Help with Index to Cover my query

Kristen · September 1, 2015, 1:59pm

I'm trying to speed up a query in a 3rd party app. I cannot change the SQL, so Indexes is all I have available.

SELECT min(MyID), max(MyID)  
FROM SomeTable
WHERE
(
	    T_ALIAS = N'MIME'
	AND VERSION = 1
	AND (
		   TYPE = N'D'
		OR TYPE = N'0'
		OR TYPE = N'1'
		OR TYPE = N'2'
		OR TYPE = N'3'
		OR TYPE = N'4'
		OR TYPE = N'5'
		OR TYPE = N'6'
		OR TYPE = N'7'
		OR TYPE = N'8'
	)
	AND
	(
		    ENTRYWHEN >= '20150801 00:00:00' 
		AND ENTRYWHEN <= '20150831 23:59:59'
	)
)

There are basically two flavours of this query, one is for a single day, the other (this one) for a whole month.

The single-day one I have successfully created an index for (based on a Suggestion made by SSMS).

|--Stream Aggregate(
	DEFINE:(
		[Expr1002]=MIN([MyDB].[MySch].[MyTable].[MyID] as [MyTable].[MyID]), 
		[Expr1003]=MAX([MyDB].[MySch].[MyTable].[MyID] as [MyTable].[MyID])
	))
|--Index Seek(OBJECT:([MyDB].[MySch].[MyTable].[VersionAliasEntrywhen] AS [MyTable]), 
	SEEK:([MyTable].[VERSION]=(1) 
		AND [MyTable].[T_ALIAS]=N'MIME' 
		AND [MyTable].[ENTRYWHEN] >= '2015-08-31 00:00:00.000' 
		AND [MyTable].[ENTRYWHEN] <= '2015-08-31 23:59:59.000'),  
	WHERE:(CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'D' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'0' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'1' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'2' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'3' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'4' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'5' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'6' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'7' 
		OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'8'
	) ORDERED FORWARD)

The VersionAliasEntrywhen index is:

CREATE NONCLUSTERED INDEX [VersionAliasEntrywhen]
ON [MySch].[MyTable]
(
	[VERSION],[T_ALIAS],[ENTRYWHEN]
)
INCLUDE
(
	[MyID],[AUTHOR],[OPERATOR],[DEFAULT_SECURITY],[IS_SECURED],[TYPE]
)

For the query spanning a whole month I haven't been able to find an index that it will use, it just does a Clustered Index Scan. The Clustered Index is:

ClustIDX : clustered, unique located on PRIMARY, Columns : MyID, VERSION

|--Nested Loops(Inner Join)
|--Stream Aggregate(DEFINE:([Expr1002]=MIN([MyDB].[MySch].[MyTable].[MyID] as [MyTable].[MyID])))
| |--Top(TOP EXPRESSION:((1)))
|  |--Clustered Index Scan(OBJECT:([MyDB].[MySch].[MyTable].[ClustIDX] AS [MyTable]),
	WHERE:([MyDB].[MySch].[MyTable].[VERSION] as [MyTable].[VERSION]=(1) 
		AND [MyDB].[MySch].[MyTable].[ENTRYWHEN] as [MyTable].[ENTRYWHEN]>='2015-08-01 00:00:00.000' 
		AND [MyDB].[MySch].[MyTable].[ENTRYWHEN] as [MyTable].[ENTRYWHEN]<='2015-08-31 23:59:59.000' 
		AND [MyDB].[MySch].[MyTable].[T_ALIAS] as [MyTable].[T_ALIAS]=N'MIME' 
		AND (
		       CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'D' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'0' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'1' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'2' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'3' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'4' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'5' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'6' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'7' 
		    OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'8'
		)) ORDERED FORWARD)
|--Stream Aggregate(DEFINE:([Expr1003]=MAX([MyDB].[MySch].[MyTable].[MyID] as [MyTable].[MyID])))
  |--Top(TOP EXPRESSION:((1)))
   |--Clustered Index Scan(OBJECT:([MyDB].[MySch].[MyTable].[ClustIDX] AS [MyTable]),
	WHERE:([MyDB].[MySch].[MyTable].[VERSION] as [MyTable].[VERSION]=(1) 
		AND [MyDB].[MySch].[MyTable].[ENTRYWHEN] as [MyTable].[ENTRYWHEN]>='2015-08-01 00:00:00.000' 
		AND [MyDB].[MySch].[MyTable].[ENTRYWHEN] as [MyTable].[ENTRYWHEN]<='2015-08-31 23:59:59.000' 
		AND [MyDB].[MySch].[MyTable].[T_ALIAS] as [MyTable].[T_ALIAS]=N'MIME' 
		AND (
			   CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'D' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'0' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'1' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'2' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'3' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'4' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'5' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'6' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'7' 
			OR CONVERT_IMPLICIT(nchar(1),[MyDB].[MySch].[MyTable].[TYPE] as [MyTable].[TYPE],0)=N'8'
		)) ORDERED BACKWARD)

Here are some Stats to indicate how selective, or not!, the various columns are:

669,025 COUNT(*)
     37 DISTINCT T_ALIAS
428,093 T_ALIAS = 'MIME'
     71 DISTINCT VERSION
668,282 VERSION = 1
      3 DISTINCT TYPE
646,632 TYPE = 'D' or '0' - '8'
546,108 DISTINCT ENTRYWHEN
  8,030 (1.2%) ENTRYWHEN within range

Column datatypes:

MyID      float
ENTRYWHEN datetime
VERSION   int
ALIAS     nvarchar(64)
T_ALIAS   nvarchar(64)
TYPE      char(1)

SSMS has no suggestions for indexes. I've tried an index with ENTRYWHEN column first, but that was not used.

Kristen · September 1, 2015, 2:02pm

For the single-day query I get:

Table 'MyTable'. Scan count 1, logical reads 4
   CPU time = 0 ms,  elapsed time = 0 ms.

For the one-month range I get:

Table 'MyTable'. Scan count 2, logical reads 72,263
   CPU time = 687 ms,  elapsed time = 699 ms.

and elapsed time is typically around 1.2 seconds

gbritton · September 1, 2015, 2:20pm

You may not be able to change the query but you can change the query plan using plan guides

ScottPletcher · September 1, 2015, 3:49pm

Is the Version always 1? Or does its value vary also? For now I'll assume it's the same, so it won't figure into the comments below. If it turns out to be more significant, naturally that could affect index recommendations.

That clustered index looks useless (assuming MyID is an identity column). Based on general experience, it's extremely likely that the unique clustered index should be on ( ENTRYWHEN, MyID ).

The condition to test ENTRYWHEN should be coded like this, as always for date/datetime[s] types:

    ENTRYWHEN >= '20150801'
AND ENTRYWHEN <  '20150901'

Btw, get rid of the Unicode (N') on the strings. Never use Unicode strings on literals unless they're required to be used.

Kristen · September 1, 2015, 4:36pm

Never had to implement them before, because in my own code I can change the source of course. Thanks for that. I tried a forced index hint in my Test Rig and adding the preferred index for the one-day range

WITH (INDEX (VersionAliasEntrywhen))

changes the monthly query from this:

Table 'MyTable'. Scan count 2, logical reads 72,263
   CPU time = 687 ms,  elapsed time = 699 ms.

to this

Table 'MyTable'. Scan count 1, logical reads 44
CPU time = 0 ms,  elapsed time = 3 ms.

so if I can figure out how to attach a Plan Guide to this query I might just be able to revolutionise its performance!

I'm just in the process of figuring out how the raw SQL is actually sent to SQL Server - I think it might be using a PREP SProc - which should be good news in the sense that SQL will be seeing it as already parametrised.

Your suggestion also caused me to stumble over an article about using Plan Guides to template dynamic SQL. There is one particular query which contains an IN() list with (say) 50 items in it. It always has 50 items (and then a second/subsequent query follows with the next 50) and of course each one gets its own Query Plan. I imagine that the "lifetime" of these only-used-once query plans will be short, but its still a whole load of Cache Buffer Management Effort, so if I can force that query to parametrise it might cut down on Cache Buffer memory churn ...

I'll report back - thanks for the idea though, I might even have to buy a book so I can become an expert!!

(there are some STATS in my first post)

[VERSION] has 71 Distinct Values, and there are 668,282 rows (out of 669,025) where VERSION = 1

It seems useless to me (in terms of selectiveness) and yet the index recommended by SSMS was VERSION, T_ALIAS ,ENTRYWHEN.

I had already tried my own non-clustered index, with ENTRYWHEN first (and covering all necessary columns). My index was used by the query too, but SSMS's recommendation was faster (few reads etc.) - I was surprised it selected it given the lack of selectiveness of BOTH VERSION and T_ALIAS, but I suppose VERSION + T_ALIAS and then ENTRYWHEN was going to find 100% of the matched index entries were then used in the query. (Actually not quite true, as TYPE was an INCLUDE column, but that has stacks of ORs in the WHERE)

I haven't rushed to change the Clustered Index as I wanted to first check if ENTRYWHEN is updated at all, and what other FKeys / JOINs are commonplace. But I agree that looks likely.

Nothing I can do about that - its a 3rd party app I was horrified that it uses

ENTRYWHEN <= '20150831 23:59:59'

as clearly it will be forever ignoring anything from 23:59:59 to 00:00:00

Can't change that either ... but I wasn't aware that it was bad?? BUT the underlying column is itself CHAR so is being implicitly cast to NChar to THEN be compared with the NChar Constant How completely and utterly rubbish is that use of N'MyConstant' syntax

This is a big, household-name, APP and I, for one, had expected a far higher level of coding skill in its construction. Very disappointing. It wasn't cheap to buy either ...

Kristen · September 1, 2015, 4:40pm

P.S. Another thing I found "odd", unless you disagree?, is that the APP is using a FLOAT for the IDENTITY column

Might be a side-effect of multi-platform support perhaps? I imagine that this APP also ships on Oracle etc.

ScottPletcher · September 1, 2015, 4:58pm

Yea, using float is just awful for an identity/counter value.

Yes, a custom covering index -- which is effectively a custom table -- will often have fewer overall reads than even the best clustered index. But, you don't want to have to create covering indexes for virtually every query for them to perform adequately. That's the importance of getting the best clustered index. The clustered index buffers also can be shared among all queries, covering indexes cannot.