Why execution plan high cost on clustered index scan and how to mimize cost of clustered index scan?

ahmedbarbary · May 17, 2022, 2:28am

I work on sql server 2019

i have clustered index scan 98 percent how to minimize it please

i have execution plean have high cost on clustered index scan
as

table i have issue on it

CREATE TABLE [Parts].[FMDMaster](
  [ID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
  [ChemicalID] [int] NULL,
  [HashSubstance] [nvarchar](3500) NULL,
  [HashMass] [nvarchar](3500) NULL,
  [StrSubstance] [nvarchar](3500) NULL,
  [StrMass] [nvarchar](3500) NULL,
  [strCASNumber] [nvarchar](3500) NULL,
  [strHomogeneousMaterialName] [nvarchar](3500) NULL,
  [strHomogeneousMaterialMass] [nvarchar](3500) NULL,
  [HashstrCASNumber] [nvarchar](3500) NULL,
  [HashstrHomogeneousMaterialName] [nvarchar](3500) NULL,
  [HashstrHomogeneousMaterialMass] [nvarchar](3500) NULL,
 PRIMARY KEY CLUSTERED 
 (
  [ID] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
 UNIQUE NONCLUSTERED 
 (
  [ChemicalID] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 ) ON [PRIMARY]
    
 GO
 /****** Object:  Index [IDX_ChemicalID]    Script Date: 5/17/2022 4:20:22 AM ******/
 CREATE NONCLUSTERED INDEX [IDX_ChemicalID] ON [Parts].[FMDMaster]
 (
  [ChemicalID] ASC
 )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
 GO

ahmeds08 · May 17, 2022, 7:36am

How many records does [Parts].[FMDMaster] have?

ahmedbarbary · May 17, 2022, 7:40am

6 milions rows

robert_volk · May 17, 2022, 12:44pm

Posting query to make it easier:

merge #tmpParts h
using Parts.FMDMaster f on h.strsubstances=f.StrSubstance 
 and h.strmass=f.strmass 
 and h.strcasnumber=f.strcasnumber 
 and h.strHomogeneousMaterialName=f.strHomogeneousMaterialName 
 and h.strHomogeneousMaterialMass=f.strHomogeneousMaterialMass
when matched then update set h.chemicalid=f.chemicalid ;

Based on the query plan that was posted, the temp table #tmpParts only has 2 rows in it. To avoid the clustered index scan, you would need to index all the columns on Parts.FMDMaster you're matching on:

CREATE INDEX multiColumns on Parts.FMDMaster(strsubstance
,strmass
,strcasnumber
,strHomogeneousMaterialName
,strHomogeneousMaterialMass)

You could probably reduce the size of the index by having only one of those columns indexed, and put the other 4 columns in the INCLUDE clause. Of course, since all your columns are declared NVARCHAR(3500) you'll get index creation warnings, and if any of those columns exceed 900 bytes you'll get an actual error, so this probably won't work. And since they're all NULLable too, the JOIN won't work correctly if there are NULLs.

A few other comments:

The clustered scan taking 98% of the query cost is only a relative measure. There's nothing to minimize or reduce here, it's the most expensive part of the query and always will be. Focus on reducing the overall execution time and I/O operations.
You don't need MERGE do to this UPDATE, just JOIN the temp table.
The IDX_ChemicalID index is completely unnecessary, you already have a unique constraint on that column, which will create an index as well. IDX_ChemicalID will just waste space.
A minor point, but since ChemicalID is already declared UNIQUE, you probably don't need the ID...IDENTITY column in the table. If you drop it, then you could shift the PRIMARY KEY to ChemicalID.
Your column definitions are wrong. NVARCHAR(3500) for everything is not a table design, it's a scrapbook for random information. JOINing on NVARCHAR(3500) declared columns will never perform well, especially with millions of rows, even if you could index it. Examine the data, find the correct data types and sizes for each column, then create a table with the proper structure, then index it correctly.

Apologies if the latter sounds harsh, especially if you can't change the table definition, but there's no point in asking how to improve its performance. It's untunable in its current state.