Should i used date limit on a clustered index table?Also 2c cindex?

sapator · March 30, 2017, 7:39pm

Hi , yes it's 90000, 2, 90000 (give or take 90.000).
So do i need to change the index?That's the confusing point.

Should the index with the 2 values specified first or second?

As, (and correct me if I am wrong) Kristen and Scott says to leave it as is and Bustaz says to change it.

Thanks.

BustazKool · April 3, 2017, 3:08pm

Okay, I am officially confused.There is more than one column. Queries are always using both columns. The first column has almost no selectivity. You are still advocating to leave the least selective column first.
I'm willing to be educated but first I have to be un-confused. I'm sure that I'm missing something, perhaps something obvious, but I am at a loss.

ScottPletcher · April 3, 2017, 5:19pm

Sorry, you're quite right. The issue is that there are two conflicting considerations.

(1) Higher initial selectivity on the index. This is quite valid, but it's a general consideration, a "default" rule if you will.
(2) Avoid fragmentation of the clus index. This is valid too, and in this case I believe it overrides (1), since both columns will (almost) always be specified anyway and because c2 appears to be relatively random.

Index tuning often involves a number of judgment calls. I'd accept another DBA preferring c2 first to c1. Honestly you'll probably be OK with either. As long as you do not fall back on an identity column instead, that would be a big mistake here.

sapator · April 4, 2017, 7:07am

Hi.
Can you explain the identity column a little?
I don't quite understand what and how could that be a problem.
Would it be a problem if i use an identity column first and then the 2 columns and put all 3 in an index with the order or:
Identity - C1 - C2 ? would that cause issues or fragmentation or wrong execution plan calls?
This is theoretically since I do not have an Identity column but is good to know.
Thanks.

P.S. also , if an index has random values, is it better to select (on a 2 columns index) the value that does not have random values, even if the values are a lot less (like C1)?

Kristen · April 4, 2017, 7:55am

If I have understood correctly:

You are not selecting based on ID at all, all your selects on the table are on C1, C2 ... so therefore do not include ID in your Clustered Index.

sapator · April 4, 2017, 8:24am

Hi.
Yes you are correct.
I was asking theoretically on how Scott suggested that an identity could be a problem.So I didn't quite understand where a problem may arise.

I don't have an identity column here but it's good to know the conditions it may be helpful or a problem.

Thanks.

ScottPletcher · April 4, 2017, 2:17pm

The identity column would be last in the index keys simply to prevent duplicates and thus allow the index to be defined as UNIQUE. SQL very strongly prefers UNIQUE indexes, and it can prevent ghost records from being left in the table.

So ( c1, c2, id ), if ( c1, c2 ) by themselves are often not unique. If those two are unique 99% of the time by themselves, then you can leave off the identity. If only 50% of the time, then add the id.

sapator · April 4, 2017, 2:38pm

Hi.
There would be about 95% unique. Currently they are 99,99% unique but we are using only the C1 "A" data.
This is central bank data and what non unique records we are getting, depends on the bank making a mistake (that C1 bank "B" does a few).
So C1 "A" is 99,99% correct C1 "B" is about 95% correct.

The problem was that C1 "A" and C1 "B" banks could have the same records so we where force to distinguish based on Bank and Record ID .

The C2 data is coming from the banks and is quite straightforward ( i guess they are using some kind of identity in their system db's but it could not be straightforward as mistakes happens.Probably multiple data bases mixed in a main one ) .

So thanks a lot all for all the info

ScottPletcher · April 4, 2017, 3:03pm

OK, I think I understand better now. Yeah, I would stick with just ( C1, C2 ). Identity would just cloud up the natural keys in this case. And congrats again for using those as the key in the first place, far too many people just "default" to an identity column and just won't let go of that no matter how much it hurts their overall performance!

sapator · April 5, 2017, 7:59am

K.
Thanks for all the info.
Very useful!