DML operation optimization in SQL Server

I have a question about SQL Server. I have 3 tables:

targettable
sourcetable
sourcetable2

The target table and the source tables have 35 columns, all of varchar datatype. I want to implement the incremental (SCD Type 2) concept in SQL Server using a MERGE statement.

The source table has about 3.5 crore (35 million) records.

A few conditions to implement the SCD Type 2 concept:

I need to fetch from sourcetable only those records that are not present in sourcetable2.
Columns col4, col7, col8, col9, col10, and col11 change frequently; we need to maintain both history and the latest values.
I have done these steps:

Handled null/empty values for all columns
Created a nonclustered index on col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, col11 (so records can be fetched quickly)
Applied a MERGE statement to implement the SCD Type 2 concept.
The query looks like this:

INSERT INTO Test.[dbo].[targettable]
([col1],[col2],[col3],[col4],col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,
col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33
,col34,col35)
SELECT
[col1],[col2],[col3],[col4],col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,
col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33
,col34,col35
,[Flag],[RecordValidFromDate],[RecordExpiredDate]
FROM (
merge Test.[dbo].[targettable] target
using(
select * from
(select [col1],[col2],[col3],[col4],col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,
col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33
,col34,col35
,
ROW_NUMBER()over(partition by col1 order by col5 desc,col6 desc)as rn
from Test1.[dbo].[sourcetable] a
where col2 !='820' and isnull(col3,'') !='')a

where a.rn=1 and not exists
(select 1 from Test1.[dbo].[sourcetable2] b where a.col1=b.pcol1)
) stag
on target.[col1]=stag.[col1]
when not matched then
insert ([col1],[col2],[col3],[col4],col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15,col16,
col17,col18,col19,col20,col21,col22,col23,col24,col25,col26,col27,col28,col29,col30,col31,col32,col33
,col34,col35,[Flag],[RecordValidFromDate],[RecordExpiredDate]
)values
( CASE WHEN coalesce(ltrim(rtrim(stag.[col1])),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.[col1])) END
,CASE WHEN coalesce(ltrim(rtrim(stag.col2)),'') = '' THEN '1800-01-01' ELSE ltrim(rtrim(stag.col2)) END
, CASE WHEN coalesce(ltrim(rtrim(stag.col3)),'') = '' THEN '1800-01-01' ELSE ltrim(rtrim(stag.col3)) END
, CASE WHEN coalesce(ltrim(rtrim(stag.col4)),'')= '' THEN 'NA' ELSE ltrim(rtrim(stag.col4)) END

     ------ similarly handled up to 35 columns, to handle empty and null values for all columns ------
    , CASE WHEN coalesce(ltrim(rtrim(stag.col35)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col35)) END
    ,'1',getdate(),'1800-01-01'
              )

when matched and target.flag=1 and
(CASE WHEN coalesce(ltrim(rtrim(target.col4)),'') = '' THEN 'NA' ELSE ltrim(rtrim(target.col4)) END
<>CASE WHEN coalesce(ltrim(rtrim(stag.col4)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col4)) END
or CASE WHEN coalesce(ltrim(rtrim(target.col7)),'') = '' THEN 'NA' ELSE ltrim(rtrim(target.col7)) END
<>CASE WHEN coalesce(ltrim(rtrim(stag.col7)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col7)) END
or CASE WHEN coalesce(ltrim(rtrim(target.col8)),'') = '' THEN 'NA' ELSE ltrim(rtrim(target.col8)) END
<>CASE WHEN coalesce(ltrim(rtrim(stag.col8)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col8)) END
or CASE WHEN coalesce(ltrim(rtrim(target.col9)),'') = '' THEN 'NA' ELSE ltrim(rtrim(target.col9)) END
<>CASE WHEN coalesce(ltrim(rtrim(stag.col9)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col9)) END
or CASE WHEN coalesce(ltrim(rtrim(target.col10)),'') = '' THEN 'NA' ELSE ltrim(rtrim(target.col10)) END
<>CASE WHEN coalesce(ltrim(rtrim(stag.col10)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col10)) END
or CASE WHEN coalesce(ltrim(rtrim(target.col11)),'') = '' THEN 'NA' ELSE ltrim(rtrim(target.col11)) END
<> CASE WHEN coalesce(ltrim(rtrim(stag.col11)),'') = '' THEN 'NA' ELSE ltrim(rtrim(stag.col11)) END
)
then update set target.flag='0',target.[RecordExpiredDate]=getdate()
output $ACTION ACTIONOUT,
stag.[col1],stag.[col2],stag.[col3],stag.[col4],stag.col5,stag.col6,stag.col7,stag.col8,stag.col9,stag.col10,stag.col11,stag.col12,stag.col13,stag.col14,
stag.col15,stag.col16,
stag.col17,stag.col18,stag.col19,stag.col20,stag.col21,stag.col22,stag.col23,stag.col24,stag.col25,stag.col26,stag.col27,stag.col28,stag.col29,
stag.col30,stag.col31,stag.col32,stag.col33,stag.col34,stag.col35,'1' as flag,getdate() as RecordValidFromDate,'1800-01-01' as [RecordExpiredDate]
)AS MERGE_OUT
WHERE MERGE_OUT.ACTIONOUT='UPDATE';

When I ran the above query for the first time, it executed within 3 minutes. When I ran it a 2nd time, it took more than 2 hours, and sometimes 1 hour 50 minutes.

Could you please tell me where I need to change the query, or what steps I need to implement, to get the best performance and reduce execution time in SQL Server?

Some observations:

CASE WHEN coalesce(ltrim(rtrim(stag.col35)), '') = ''

is awkward and not SARGable. Why not simply:

CASE WHEN stag.col35 = '' or stag.col35 is null

for all those CASE expressions. Note that trimming leading and trailing spaces before comparing to an empty string has no effect; that is, SPACE(any length) = ''.
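A sketch of the simplified comparison, using the column names from the query above (NULLIF collapses '' to NULL, and ISNULL supplies the same 'NA' sentinel the original used, so NULL and empty still compare as intended):

```sql
-- Equivalent to the original CASE/LTRIM/RTRIM pairs, but much shorter:
-- trailing spaces already compare equal to '' under ANSI padding rules.
WHEN MATCHED AND target.flag = 1 AND (
       ISNULL(NULLIF(target.col4,  ''), 'NA') <> ISNULL(NULLIF(stag.col4,  ''), 'NA')
    OR ISNULL(NULLIF(target.col7,  ''), 'NA') <> ISNULL(NULLIF(stag.col7,  ''), 'NA')
    OR ISNULL(NULLIF(target.col8,  ''), 'NA') <> ISNULL(NULLIF(stag.col8,  ''), 'NA')
    OR ISNULL(NULLIF(target.col9,  ''), 'NA') <> ISNULL(NULLIF(stag.col9,  ''), 'NA')
    OR ISNULL(NULLIF(target.col10, ''), 'NA') <> ISNULL(NULLIF(stag.col10, ''), 'NA')
    OR ISNULL(NULLIF(target.col11, ''), 'NA') <> ISNULL(NULLIF(stag.col11, ''), 'NA')
)
```

If leading spaces actually matter in the data, keep a single LTRIM around each column; the point is to drop the nested function calls that buy nothing.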

See here for reference

Try to avoid expressions on the left side of comparisons that include columns in the row set, since they prevent SQL Server from using any indexes that may include those columns.

Speaking of columns, which ones are indexed? Hopefully at least those in the ON clause of the MERGE statement, and preferably those in the WHERE clause as well.
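For example (the index name is hypothetical; the columns are taken from the query in the question), a covering nonclustered index on the target might look like:

```sql
-- Key on the MERGE join column; INCLUDE the compared columns and the
-- flag/date columns so the matched-row check never touches the base table.
CREATE NONCLUSTERED INDEX IX_targettable_col1
    ON Test.dbo.targettable (col1)
    INCLUDE (col4, col7, col8, col9, col10, col11, Flag, RecordExpiredDate);
```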


I know that .. in fact I've known that since I was in short trousers ... :smile:

So WHY oh WHY oh WHY have I been writing this, ALL THESE YEARS, in order to convert a parameter from "blank" to "null"? :frowning:

SELECT @MyParameter = NullIf(RTrim(@MyParameter), '')

when quite clearly this would work just fine

SELECT @MyParameter = NullIf(@MyParameter, '')

How many RTrims have I typed unnecessarily? :frowning: :frowning: :frowning:

One thing is for sure ... I'm not going to do it again!!

About the performance, could it be this:

  1. first time through, the target table is empty and the INSERT branch is executed
  2. subsequently, many rows match and the UPDATE with the non-SARGable predicates executes; each row has to be fetched and compared

I'd use CHECKSUM() to check for differences rather than a (very) long list of <> conditions.
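A minimal sketch of that idea against the same MERGE (given CHECKSUM's documented collision risk, treat this as a fast pre-filter rather than a guarantee of catching every change):

```sql
-- One hash comparison replaces the six CASE/<> pairs.
WHEN MATCHED AND target.flag = 1
     AND CHECKSUM(target.col4, target.col7, target.col8,
                  target.col9, target.col10, target.col11)
      <> CHECKSUM(stag.col4, stag.col7, stag.col8,
                  stag.col9, stag.col10, stag.col11)
THEN UPDATE SET target.flag = '0', target.RecordExpiredDate = GETDATE()
```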

+1 Amen to that!

The comments given above have not improved performance. Could you please suggest any alternative solution to further optimize the above query?

I'm a bit leery when it comes to the CHECKSUM function of SQL Server. Here's a small example of why: these two different strings return the same checksum.

 SELECT  CS1 = CHECKSUM('A352KD')
        ,CS2 = CHECKSUM('A352NT')
;

I've not tested BINARY_CHECKSUM to any great extent, simply because Books Online carries the same warning for it as for CHECKSUM:

BINARY_CHECKSUM(), computed on any row of a table, returns the same value as long as the row is not subsequently modified. BINARY_CHECKSUM() will return a different value for most, but not all, changes to the row, and can be used to detect most row modifications.

Me too.

I suppose you could use it to quickly get to the rows that have a different checksum ... but what about catching the ones that have the same checksum but have changed? You've still got to run that monster WHERE clause on all columns ... on all rows.

We have a very strictly enforced UpdateDateTime on all tables, so "more recent date/time since the last compare" reduces the number of rows to be checked, but that's the only speed-up I have ever relied on.
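As a sketch of that filter (UpdateDateTime and @LastRunTime are assumptions here, not columns or variables from the original query):

```sql
-- Only rows touched since the previous load need the full column comparison.
DECLARE @LastRunTime datetime;   -- persisted from the prior run
SELECT @LastRunTime = MAX(RecordValidFromDate) FROM Test.dbo.targettable;

SELECT *
FROM Test1.dbo.sourcetable AS a
WHERE a.UpdateDateTime > @LastRunTime;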

FWIW, I also use the SSIS SCD transformation for complicated cases. If for nothing else, just so I don't have to type a long WHERE clause.

We have a snippet of SQL to mechanically generate those - we basically only need two templates: one for comparing varchar/strings and another for numbers, dates, etc.
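One way such a generator can be sketched (purely illustrative; the real templates will differ) is to drive the two templates from INFORMATION_SCHEMA:

```sql
-- Emit one "<>" predicate per column, choosing a template by data type.
SELECT CASE
           WHEN DATA_TYPE IN ('varchar', 'nvarchar', 'char', 'nchar')
           THEN 'OR ISNULL(NULLIF(target.' + COLUMN_NAME + ', ''''), ''NA'') <> '
              + 'ISNULL(NULLIF(stag.' + COLUMN_NAME + ', ''''), ''NA'')'
           ELSE 'OR ISNULL(target.' + COLUMN_NAME + ', 0) <> '
              + 'ISNULL(stag.' + COLUMN_NAME + ', 0)'
       END AS predicate
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'targettable'
ORDER BY ORDINAL_POSITION;
```

Paste the generated predicates into the WHEN MATCHED clause instead of hand-typing 35 comparisons.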

So do I! It's called SSIS


For TYPE II SCDs, I don't actually care what, if anything, changed. If it was an INSERT, I record the data with a '99990101' end date. If someone does an UPDATE, I don't actually care if a value changed... I record the new row with a '99990101' end date and change the end date on the old row to the current date and time. No comparisons required.

Does that sometimes cause precisely duplicated rows to be recorded? You bet it does, and I thankfully have a full record of them all, so that I can tell whoever provided the data that they need to get their act together and then provide them with the proof. Better than all of that, it keeps me from having to stand up an SSIS box (sorry Gerald) :wink:
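That unconditional close-and-insert approach could be sketched as two set-based statements (the #stage table and the trimmed column list are placeholders; '99990101' marks the open-ended row):

```sql
-- 1) Close every current row that has an incoming version.
UPDATE t
   SET t.RecordExpiredDate = GETDATE()
  FROM Test.dbo.targettable AS t
  JOIN #stage AS s
    ON s.col1 = t.col1
 WHERE t.RecordExpiredDate = '99990101';

-- 2) Insert the incoming rows as the new current versions.
INSERT INTO Test.dbo.targettable (col1, col4, RecordValidFromDate, RecordExpiredDate)
SELECT s.col1, s.col4, GETDATE(), '99990101'   -- remaining columns elided
FROM #stage AS s;
```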