Slow Query

tharmen · December 10, 2016, 7:22am

Hi, We are running one query which takes one hour to finish. How can i make it faster? I have attached the execution plan below. Any idea? Hash match take 26% from overall plan. I could not attached full execution plan as the plan is very length and i did not allowed to upload .sqlplan files here.i have attached the part for hash match.

SELECT DISTINCT CustAbbr
	,LeadCount + Package Package
	,Device
	,SUM(Amount) AS Amount
	,Process
	,Field001 AS Site
	,'2016' AS Year
	,'11' AS Month
FROM (
	SELECT DISTINCT SUBSTRING(h.invoiceno, 1, 2) AS Prefix
		,h.CustAbbr
		,h.Device
		,s.LEADCOUNT
		,s.Package
		,SUM(h.TotalAmount) AS Amount
		,Process AS ProcessID
		,(
			CASE 
				WHEN (
						TempColForSummary LIKE '%ASSEMBLY%'
						OR ProcessType LIKE 'AAS%'
						OR ProcessType LIKE 'ADDER%'
						)
					THEN 'ASY'
				ELSE (
						CASE 
							WHEN TempColForSummary LIKE '%TURNKEY%'
								THEN (
										CASE 
											WHEN (
													Process LIKE 'ASSEMBLY%'
													OR Process LIKE '%GRIND%'
													OR Process LIKE '%WAFER PROBE%'
													)
												THEN 'ASY'
											ELSE 'TST'
											END
										)
							ELSE 'TST'
							END
						)
				END
			) AS Process
		,MAX(ndflag) NDFlag
		,MAX(bondtype) Bondtype
		,Field001
	FROM [S-SQL-ERPRPTDB].[carsemerp].dbo.[SA_INVASY&TST_RPT] h(NOLOCK)
	LEFT JOIN [S-SQL-ERPRPTDB].[carsemerp].dbo.[SSR_FG_Integration] s(NOLOCK) ON h.MESID = s.LotID
		AND (
			Command LIKE 'TST03%'
			OR Command LIKE 'TST2.5%'
			)
		AND SysID = (
			SELECT MAX(sysid)
			FROM [S-SQL-ERPRPTDB].[carsemerp].dbo.[SSR_FG_Integration] a(NOLOCK)
			WHERE a.LotID = s.LotID
				AND a.Command = s.Command
			)
	INNER JOIN [S-SQL-ERPRPTDB].[carsemerp].dbo.[DC_BOMHeader] d(NOLOCK) ON d.BOMID = s.PartID
	INNER JOIN [S-SQL-ERPRPTDB].[carsemerp].dbo.[SA_ASP_Trend] a(NOLOCK) ON a.Device = h.device
		AND a.CustAbbr = h.CustAbbr
		AND a.LEADCOUNT = s.LeadCount
		AND a.PACKAGE = s.Package
		AND NOT EXISTS (
			SELECT TOP 1 1
			FROM [S-SQL-ERPRPTDB].[carsemerp].dbo.[SA_ASP_Trend] b
			WHERE a.CustAbbr = b.CustAbbr
				AND a.LeadCount = b.LeadCount
				AND a.Package = b.Package
				AND a.Device = b.Device
				AND b.Year < '2016'
			)
		AND IndexID = (
			SELECT MAX(IndexID)
			FROM [S-SQL-ERPRPTDB].[carsemerp].dbo.[SA_ASP_Trend] c
			WHERE c.CustAbbr = a.CustAbbr
				AND c.Device = a.Device
				AND c.LeadCount = a.LeadCount
				AND c.Package = a.Package
			)
	WHERE InvoiceDate BETWEEN '2016-10-27 18:30:00'
			AND '2016-11-24 18:29:59'
		AND h.InvoiceNo NOT LIKE '%F'
		AND h.InvoiceNo NOT LIKE 'PD%'
	GROUP BY SUBSTRING(h.invoiceno, 1, 2)
		,h.CustAbbr
		,h.Device
		,s.LEADCOUNT
		,s.Package
		,Process
		,TempColForSummary
		,ProcessType
		,Field001
	) a
WHERE a.Bondtype = 'M'
	AND NDFlag LIKE '2016%'
GROUP BY CustAbbr
	,Device
	,LeadCount
	,Package
	,Process
	,Field001

Full SQL Plan at below URL
SQL Plan Click This To view the plan

bitsmed · December 10, 2016, 9:04pm

For all kind of reasons (especially if you care about correct result), you should avoid using nolock (see here).

Help us, help you by providing:

Table descriptions of all involved tables (in the form of create statements)
Sample data (in the form of insert statements)
Expected output from the sample data you provide

Ps.: Formatting your queries is good practice

jeffw8713 · December 11, 2016, 6:22pm

Without going into full details - you have the following issues:

DISTINCT and GROUP BY in the same query - one or the other should be used not both. Using both includes another sort operation that is not needed.
All tables are being accessed across a linked server - this would be much better done as a stored procedure on the other server and executing the stored procedure across the linked server to just return the results. Doing it this way - SQL Server may end up pulling all data across then linking or even creating a cursor and pulling a single row at a time.
The outer query is using both DISTINCT and GROUP BY but isn't needed because you can get everything from the inner query by changing the columns returned and using HAVING to limit the BondType and NDFlag.

The EXISTS statements on the JOINS are also something to consider - they can definitely cause additional overheard that may not be needed.

You also need to review the LIKE expressions where they have leading wildcards. These cannot utilize an index and will cause table/index scans.

Also - be aware that your date check will be missing any invoices that have a time in the last second before your end time. That is - you specify the end time as '2016-11-24 18:29:59' and any invoices where the InvoiceDate is '2016-11-24 18:29:59.003' through '2016-11-24 18:29:59.997' will be excluded.

Kristen · December 12, 2016, 11:03am

as @jeffw8713 said, but in case helpful the best way to deal with this is:

    InvoiceDate >= '20161027 18:30:00'
AND InvoiceDate <  '20161124 18:30'

i.e. "less than" one time-unit past the end point.

Note also that you should not have "-" in your date const strings as they will parse ambiguously - and that is open to interpretation depending on,. amongst other things, the language of the currently conected user.

I believe the hyphenated format is correctly parsed, by default, if the column is the newer DATETIME2 data type, but not the older DATETIME - which makes it even more confusing ...

I wonder if the JOIN with

SomeColumn = (SELECT MAX(AnotherColumn) ...)

would be better served using a ROW_NUMBER construction, restricted to ROW_NUMBER = 1, to get all desired columns from the first (i.e. MAX) row, rather than the subs-select to match the MAX(AnotherColumn)

Might also be possible to avoid the GROUP BY on a significant number of columns using ROW_NUMBER

Dunno how SQL will parse that - TOP 1 or TOP 11 ?

Ahh ... its in an EXISTS, get rid of the TOP I have no idea what SQL will make of that, hopefully the optimiser will ignore it, but if not it may well be a performance issue. Just use "SELECT 1" or "SELECT *" (traditionally the latter to allow SQL to choose any suitable column, for the optimal index choice)

h.InvoiceNo NOT LIKE '%F'

I would set up a persisted index on a VIEW that stored the "ends with" value for the InvoiceNo. This leading-wildcard will be a performance killer. As an alternative it MIGHT be possible to COVER it with an index on other columns important to this part of the query.

Personally I would want consistency on the use of "AS" and not. Particularly something like this:

SELECT
...
    LeadCount + Package Package

I would move the big CASE for [Process] to the outer select. The inner select is already grouping on the individual components that make it up (which might of itself, be a bug?) in which case having it in the inner select may well be using CPU for rows that are subsequently rejected in the OUTER Where clause.

tharmen · December 26, 2016, 2:34am

Hi bitsmed,
Done format the SQL.

tharmen · December 26, 2016, 2:39am

Hi Kristen,
Is that mean TOP 1 and Select 1 is the same concept with same output?

tharmen · December 27, 2016, 3:40am

Hi jeffw8713,

Thanks for your reply.

Actually all the table are in local SQL and there is no tables accessed across a linked server. It is just the developer write the query user servername in front? Is this can cause issue jeff? Hope you can enlighten me.

jeffw8713 · December 27, 2016, 3:48pm

If the tables are local - then don't use 4-part naming convention...it just confuses what the query is actually doing and doesn't help anything. It is not like you are ever going to move that code to another server - and use a linked server...

Please review all of the other suggestions as those are your key performance issues.

tharmen · December 28, 2016, 2:48am

Hi Jeffw8713,

Can you help to explain further on your below point 1 and 2 suggestion?
I also have attached the full execution plan in the post. Can you help to see the plan and advise?

jeffw8713 · December 28, 2016, 6:29pm

Why do you think it is necessary to perform a DISTINCT on a query that has GROUP BY and why are you using a derived table that has both DISTINCT and GROUP BY also where the ONLY differences are the fact that you are grouping the inner query based on a Prefix that isn't even used in the outer query?

Move the checks in the outer query to the inner query using HAVING instead...and remove the DISTINCT...

Your final query should just be the inner query as a GROUP BY without the Prefix in the group...

SELECT ... FROM ... GROUP BY ... HAVING max(ndflag) Like '2016%' AND max(bondtype) = 'M'

tharmen · December 29, 2016, 1:49am

Hi Jeffw8713,

The new sql which as below still having same execution plan and still same amount time(1 hour) to execute.

SELECT 
	 h.CustAbbr
	,h.Device
	,s.LEADCOUNT
	,s.Package
	,SUM(h.TotalAmount) AS Amount
	,Process AS ProcessID
	,(
		CASE 
			WHEN (
					TempColForSummary LIKE '%ASSEMBLY%'
					OR ProcessType LIKE 'AAS%'
					OR ProcessType LIKE 'ADDER%'
					)
				THEN 'ASY'
			ELSE (
					CASE 
						WHEN TempColForSummary LIKE '%TURNKEY%'
							THEN (
									CASE 
										WHEN (
												Process LIKE 'ASSEMBLY%'
												OR Process LIKE '%GRIND%'
												OR Process LIKE '%WAFER PROBE%'
												)
											THEN 'ASY'
										ELSE 'TST'
										END
									)
						ELSE 'TST'
						END
					)
			END
		) AS Process
	,MAX(ndflag) NDFlag
	,MAX(bondtype) Bondtype
	,Field001
FROM dbo.[SA_INVASY&TST_RPT] h(NOLOCK)
LEFT JOIN dbo.[SSR_FG_Integration] s(NOLOCK) ON h.MESID = s.LotID
	AND (
		Command LIKE 'TST03%'
		OR Command LIKE 'TST2.5%'
		)
	AND SysID = (
		SELECT MAX(sysid)
		FROM dbo.[SSR_FG_Integration] a(NOLOCK)
		WHERE a.LotID = s.LotID
			AND a.Command = s.Command
		)
INNER JOIN dbo.[DC_BOMHeader] d(NOLOCK) ON d.BOMID = s.PartID
INNER JOIN dbo.[SA_ASP_Trend] a(NOLOCK) ON a.Device = h.device
	AND a.CustAbbr = h.CustAbbr
	AND a.LEADCOUNT = s.LeadCount
	AND a.PACKAGE = s.Package
	AND NOT EXISTS (
		SELECT TOP 1 1
		FROM dbo.[SA_ASP_Trend] b
		WHERE a.CustAbbr = b.CustAbbr
			AND a.LeadCount = b.LeadCount
			AND a.Package = b.Package
			AND a.Device = b.Device
			AND b.Year < '2016'
		)
	AND IndexID = (
		SELECT MAX(IndexID)
		FROM dbo.[SA_ASP_Trend] c
		WHERE c.CustAbbr = a.CustAbbr
			AND c.Device = a.Device
			AND c.LeadCount = a.LeadCount
			AND c.Package = a.Package
		)
WHERE InvoiceDate BETWEEN '2016-10-27 18:30:00'
		AND '2016-11-24 18:29:59'
	AND h.InvoiceNo NOT LIKE '%F'
	AND h.InvoiceNo NOT LIKE 'PD%'
GROUP BY h.CustAbbr
	,h.Device
	,s.LEADCOUNT
	,s.Package
	,Process
	,TempColForSummary
	,ProcessType
	,Field001
HAVING max(ndflag) LIKE '2016%'
	AND max(bondtype) = 'M'

jeffw8713 · December 29, 2016, 5:56pm

Do you really need this or the other check for a specific invoice number? Is there maybe a better way of eliminating those types of invoices?

Review the outer join to SSR_FG_Integration...it probably doesn't need to be an outer join, but more importantly it looks like a good candidate to CROSS APPLY to get the appropriate correlated rows - and also look at the other section to see if that can be done with APPLY:

 Select h.CustAbbr
      , h.Device
      , s.LEADCOUNT
      , s.Package
      , sum(h.TotalAmount)  As Amount
      , Process             As ProcessID
      , Case When TempColForSummary Like '%ASSEMBLY%'
               Or ProcessType Like 'AAS%'
               Or ProcessType Like 'ADDER%'
             Then 'ASY'
             Else Case When TempColForSummary Like '%TURNKEY%' 
                       Then Case When Process Like 'ASSEMBLY%'
                                   Or Process Like '%GRIND%'
                                   Or Process Like '%WAFER PROBE%'
                                 Then 'ASY'
                                 Else 'TST'
                            End
                       Else 'TST'
                   End
         End As Process
      , max(ndflag)            NDFlag
      , max(bondtype)          Bondtype
      , Field001
   From dbo.[SA_INVASY&TST_RPT] h(nolock)
   Cross Apply (Select Top 1
                       *
                  From dbo.SSR_FG_Integration i (nolock)
                 Where i.LotID = h.MESID
                   And (i.Command Like 'TST03%' Or i.Command Like 'TST2.5%')
                 Order By
                       i.SysID desc) As s
   Inner Join dbo.[DC_BOMHeader] d(nolock)              On d.BOMID = s.PartID
   Inner Join dbo.[SA_ASP_Trend] a(nolock)              On a.Device = h.device 
    And a.CustAbbr = h.CustAbbr 
    And a.LEADCOUNT = s.LeadCount 
    And a.PACKAGE = s.Package 
    And Not Exists (     Select Top 1 1
                                   From dbo.[SA_ASP_Trend] b
                                  Where a.CustAbbr = b.CustAbbr
                                    And a.LeadCount = b.LeadCount
                                    And a.Package = b.Package
                                    And a.Device = b.Device
                                    And b.Year < '2016'
        ) 
    And IndexID = (      Select max(IndexID)
                                   From dbo.[SA_ASP_Trend] c
                                  Where c.CustAbbr = a.CustAbbr
                                    And c.Device = a.Device
                                    And c.LeadCount = a.LeadCount
                                    And c.Package = a.Package
        )
  Where InvoiceDate Between '2016-10-27 18:30:00'
        And '2016-11-24 18:29:59'
    And h.InvoiceNo Not Like '%F'
    And h.InvoiceNo Not        Like 'PD%'
  Group By
        h.CustAbbr
      , h.Device
      , s.LEADCOUNT
      , s.Package
      , Process
      , TempColForSummary
      , ProcessType
      , Field001
 Having max(ndflag) Like '2016%'
        And max(bondtype) = 'M'

No guarantees on the CROSS APPLY - something to try and something you should look at for SA_ASP_Trend.

tharmen · January 20, 2017, 3:01am

Hi Jeff,
The CROSS APPLY still the same and it add on more cost.
Is that you mean add more index in SA_ASP_Trend table?

jeffw8713 · January 20, 2017, 7:34pm

I did say there were no guarantees - it was a thought to maybe help improve performance.

tharmen · January 22, 2017, 1:23pm

Hi Jeff,
Yes i appreciate your effort in helping.
Is that any index will help to make the query faster?
I have attached the execution plan in my post.
Are you able to see the execution plan and advice me?

jeffw8713 · January 22, 2017, 3:24pm

As long as you have that like statement that begins with a wildcard - you are going to be scanning the full table. You could try a calculated column that is the reverse of the column InvoiceNo, index that calculated column and then change the filter to use that reversed column:

And h.InvoiceNoReversed Not Like 'F%'

I don't have any other recommendations - other than what has been stated previously. You have a join to SA_ASP_Trend that has a complex join criteria. Review that criteria and verify that it needs to be setup that way - see if you can simplify that portion. It isn't clear what columns are being returned and maybe you can convert that to cross apply on a query or create an inline-table valued function to just return the data you need.