Using a PARTITION BY JOIN to Fill In The Gaps in Sparse Data

Hello

Trying to convert an Oracle SQL which uses "partition" in Left outer join into SQL server as shown below.

select v.table_name 
, v.program_no 
, v.column_name 
, a.creation_timestamp 
, a.record_key 
, a.before_value 
, a.after_value 
from audit_010_v v 
left outer join audit_data a partition by (table_name, creation_timestamp, record_key) 
on a.table_name = v.table_name 
and a.column_name = v.column_name ;
  1. AUDIT_010_V : This table holds what all columns and tables name the report needs for Audit

  2. AUDIT_DATA. : This table holds all the DML operation happen on a table column, basically table and column name and BEFORE and AFTER value for that column with timestamp DDL is as follow:

     CREATE TABLE AUDIT_010_V 
     (
     TABLE_NAME nvarchar(100),
     PROGRAM_NO nvarchar(100),
     COLUMN_NAME nvarchar(100),
     SEQ_NO nvarchar(100)
     );
    
     INSERT INTO AUDIT_010_V VALUES ('MSF203','A','BRANCH_CODE','1');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','A','BANK_ACCT_NO','2');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','STMT_CLOSE_DAY','3');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','NO_OF_DAYS_PAY','4');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','INV_STMT_IND','5');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','SETTLE_DISC','6');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','SETTLE_DAYS','7');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','SET_DISC_FLAG','8');
     INSERT INTO AUDIT_010_V VALUES ('MSF203','B','FRT_DEFAULT','9');
    
    
     CREATE TABLE AUDIT_DATA 
     (
     TABLE_NAME nvarchar(100),
     COLUMN_NAME nvarchar(100),
     CREATION_TIMESTAMP nvarchar(100),
     RECORD_KEY nvarchar(100),
     BEFORE_VALUE nvarchar(100),
     AFTER_VALUE nvarchar(100)
     );
    
     INSERT INTO AUDIT_DATA VALUES
     ('MSF203','NO_OF_DAYS_PAY','09-JAN-2020 10.26.00','ARKL062784','30','60');
    
     INSERT INTO AUDIT_DATA VALUES
     ('MSF203','SET_DISC_FLAG','09-JUN-2020 11.06.00','MTNL020823',NULL,'R');
    
     INSERT INTO AUDIT_DATA VALUES
     ('MSF203','SETTLE_DAYS','09-JUN-2020 11.06.00','MTNL020823','0','20');
    
     INSERT INTO AUDIT_DATA VALUES
     ('MSF203','SETTLE_DISC','09-JUN-2020 11.06.00','MTNL020823','0','2');
    

So for a given table name and a Record Key, display all Columns List, populate Before/After a value for the one which exists in AUDIT_DATA table else leave NULL.

Also for the record where an entry doesn't exist in the AUDIT_DATA table, use CREATION_TIMESTAMP for backfill. On all the Grey cell columns there was no DML.

Output needed is as below:

I think Oracle deals better with sparse data. With SQL Server try DISTINCT and an extra JOIN.
Something like:

WITH AuditPartitions
AS
(
	SELECT DISTINCT TABLE_NAME, CREATION_TIMESTAMP, RECORD_KEY
	FROM AUDIT_DATA
)
SELECT V.TABLE_NAME 
	,V.PROGRAM_NO 
	,V.COLUMN_NAME 
	,A.CREATION_TIMESTAMP 
	,A.RECORD_KEY 
	,A.BEFORE_VALUE 
	,A.AFTER_VALUE 
FROM AUDIT_010_V V
	JOIN AuditPartitions P
		ON V.TABLE_NAME = P.TABLE_NAME
	LEFT JOIN AUDIT_DATA A
		ON V.TABLE_NAME = A.TABLE_NAME
			AND V.COLUMN_NAME = A.COLUMN_NAME
			AND P.CREATION_TIMESTAMP = A.CREATION_TIMESTAMP
			AND P.RECORD_KEY = A.RECORD_KEY
ORDER BY TABLE_NAME, COLUMN_NAME, CREATION_TIMESTAMP, RECORD_KEY;

Also, I would not make AUDIT_DATA.CREATION_TIMESTAMP nvarchar. Try something like datetime2(0) or datetime.

thanks, @Ifor for the answer, it works, but the "AUDIT_DATA " table contains lots of records hence the query takes lot of time the final return data.

Depending on you data distribution, CROSS APPLY might make a difference:

SELECT V.TABLE_NAME 
	,V.PROGRAM_NO 
	,V.COLUMN_NAME 
	,A.CREATION_TIMESTAMP 
	,A.RECORD_KEY 
	,A.BEFORE_VALUE 
	,A.AFTER_VALUE 
FROM AUDIT_010_V V
	CROSS APPLY
	(
		SELECT DISTINCT TABLE_NAME, CREATION_TIMESTAMP, RECORD_KEY
		FROM AUDIT_DATA D
		WHERE D.TABLE_NAME = V.TABLE_NAME
	) P
	LEFT JOIN AUDIT_DATA A
		ON V.TABLE_NAME = A.TABLE_NAME
			AND V.COLUMN_NAME = A.COLUMN_NAME
			AND P.CREATION_TIMESTAMP = A.CREATION_TIMESTAMP
			AND P.RECORD_KEY = A.RECORD_KEY
ORDER BY TABLE_NAME, COLUMN_NAME, CREATION_TIMESTAMP, RECORD_KEY;
1 Like

Thanks, @lfor this worked too but still very slow due to the volume of data in table AUDIT_DATA.

I suspect a lot of the slowness is due to the DISTINCT on AUDIT_DATA.

It might be worth working your way through the following article by Paul White as well as the referenced articles. Some of the ideas will only work with SQL2019 but you should be able to get some improvement with other versions of SQL.

1 Like