Indexes ineffective on where clauses

LockeGarmin · March 22, 2017, 5:05pm

I started to look more into indexing after reading this forum post and watching the videos. I set up some test tables with differing indexes and ran the query execution plan and was really surprised to find that indexes don't seem to have any effect when you're running items in the where clause. The main point of my experiment was to test indexes with 'included' data (Test3), but what really surprised me was the table Test4 being just as effective as the other Test tables. Shouldn't my query with Test4 at least be somewhat faster with the index on Name? Maybe I'm worrying too much about the execution plan percentages? I'll watch those videos again in case I missed something that would clear this up but if anyone has any other insight in creating or using indexes I would be love to hear what I'm doing wrong. Below are the tests I referenced in my question. Thanks!

IF OBJECT_ID('Test1', 'U') IS NOT NULL DROP TABLE Test1;
CREATE TABLE Test1
(
    ID INT IDENTITY PRIMARY KEY,
    Name VARCHAR(32),
	Number INT,
);
GO

IF OBJECT_ID('Test2', 'U') IS NOT NULL DROP TABLE Test2;
CREATE TABLE Test2
(
    ID INT IDENTITY,
    Name VARCHAR(32),
	Number INT,
);
GO

IF OBJECT_ID('Test3', 'U') IS NOT NULL DROP TABLE Test3;
CREATE TABLE Test3
(
    ID INT IDENTITY,
    Name VARCHAR(32),
	Number INT,
);
CREATE INDEX IX_ID_Name_Number ON Test3 (ID) INCLUDE (Name, Number);
GO

IF OBJECT_ID('Test4', 'U') IS NOT NULL DROP TABLE Test4;
CREATE TABLE Test4
(
    ID INT IDENTITY,
    Name VARCHAR(32),
	Number INT,
);
GO

CREATE INDEX IX_Name ON Test4 (Name);
GO

SELECT Name FROM Test1 WHERE Name IN ('Test1', 'Test2', 'Test3'); --25% query cost (Clustered Index Scan, Select)
SELECT Name FROM Test2 WHERE Name IN ('Test1', 'Test2', 'Test3'); --25% query cost (Table Scan, Select)
SELECT Name FROM Test3 WHERE Name IN ('Test1', 'Test2', 'Test3'); --25% query cost (Index Scan, Select)
SELECT Name FROM Test4 WHERE Name IN ('Test1', 'Test2', 'Test3'); --25% query cost (Index Seek, Select)

JamesK · March 22, 2017, 5:29pm

Insert some data into the tables and try again. Be sure to insert sufficiently large number of records; if there are only a handful of records, query optimizer might decide that a table scan is better.

LockeGarmin · March 22, 2017, 5:42pm

Ah, you're correct. I inserted 100,000 records (Name and number field would both contain several duplicates) and sure enough Test4 dropped to 0% while the other three were tied at 33%. Good to know the Include (Name) on the index won't help my performance on querying the name field. I didn't realize it would still perform a table scan even though it claimed to do an index seek in the execution plan but still show the query cost from the table scan point of view, but now I do!

Thanks!

ScottPletcher · March 22, 2017, 9:31pm

If you (nearly) always lookup on the same column(s), then cluster the table on those columns or a representative id for those columns, not on identity!.

The "identity clustering" myth is the most damaging to good performance. Yes, varchar(32) is rather long, so you'll definitely consider using an id to substitute for it, but where the id has meaning in the other tables. For example:

CREATE TABLE names
(
Name_Id int IDENTITY(1, 1) NOT NULL UNIQUE,
Name varchar(32) NOT NULL PRIMARY KEY
)

CREATE TABLE Test1
(
Name_Id int /fk-->names(Name_Id)/,
Id int IDENTITY(1, 1) NOT NULL,
Number INT,
PRIMARY KEY ( Name_id, Id )
);

SELECT * FROM Test1 WHERE Name_Id IN (SELECT Name_Id FROM Names WHERE Name IN ('Test1', 'Test2', 'Test3')

That should always give you clustered index seeks and forward reads, and you don't have to constantly build nonclustered indexes to make queries perform reasonably well.

Kristen · March 23, 2017, 8:29am

Did you check which index? I've been caught out seeing (at a quick glance) an Index Action in the Query Plan and then, on closer inspection, seeing that it was actually using the Clustered Index and NOT the index I was hoping for - so was, in effect, a Scan

LockeGarmin · April 3, 2017, 12:28pm

Thanks for the tips Scott! I probably should have provided more context that the queries I posted were just an exercise in figuring out the benefit of INCLUDE (column_name(s)).

Thanks for pointing out the clustering myth too. It wasn't until a couple months ago that I believed a clustering index was an index with multiple columns and not how the records are ordered on disk (which had me rather confused why SQL Server only let me create 1 per table).

Do I understand you correctly that you would only use a name table with name_id if it's going to be used in other tables as well, and otherwise just cluster on the varchar name field?

LockeGarmin · April 3, 2017, 12:32pm

Thanks Kristen! In this case each of the tables only had a single index, but they weren't clustering indexes so your experience does help me see why I would see table scans instead of indexes being used.

ScottPletcher · April 3, 2017, 5:25pm

Not necessarily. If the name were subject to change with any amount of frequency, then I would encode the name as a number for sure. Or if the table had a large number of rows, to reduce the size. Plus if there's a reasonable expectation that you could add a related table any time soon, you'd want to encode. But for a relatively small table, you could just keep it simple and use the actual name.

The really big thins is under no conditions would you want to use an identity column when you (almost) always look up on name. EIther use the name itself or and encoded numeric value representing the name.