UniqueIdentifier Column in Every Table

We have some developers who are not well-versed in SQL Server nor in table design. I find that most tables contain a column with the uniqueidentifier data type. This annoys me because I know there are other ways to make every row unique (in most cases). I think it is inefficient but I'm open to other opinions before I talk to them.
Any help? Thanks

An identity column can also do that.

2 Likes

In broad terms they would be better off with IDENTITY than UNIQUEIDENTIFIER (GUID).

A GUID is large, so fewer entries per index page, so more index pages, slower queries.

GUID is random, so inserts cause page splits. You can defined a default of Sequential GUID for the table, which ensures ascending values, which is much better. However, that can only be allocated using DEFAULT and we have circumsances where we want to use NewID() to allocate a value BEFORE the row is created (and I don't know of a way to do that with Sequential GUIDs)

If you need to merge data from multiple companies / servers / databases or salesmen out on the road with laptops generating record IDs whilst not connected to the main server then GUIDs are helpful, otherwise I would avoid them.

The use of Natural Keys (e.g. country codes "US", "GB") instead of IDENTITY / GUID numbers is a separate debate. Pros and Cons and Differenet Opinions on that one. We use IDENTITY on ALL tables (and Unqiue Index), even if they have other more suitable candidates for Primary Key and/or Clustered Index because we find referencing rows by a single one-part, skinny, key solves all sorts of problems.

Put a GUID (including hyphens) in an Email (e.g. as a web URL link) and the email may be word-wrapped such that the user clicking on it will only get half the value presented to the web page, and then Clicks Off when they get an error page - lost customer / order / sale!!

Multi-part keys are similarly more complex to program for in many situations, and we find having an IDENTITY available simplifies the code in such circumstances. A GUID would do for that purpose, just we don't see the point (UNLESS we have to merge with other Databases / Servers etc.)

3 Likes

Thank you Kristen and ahmeds08.

Yes, I typically used an incrementing identity column when I create a table. The uniqueidentifier takes much more storage space and is inefficient as you know.

I usually make create a Primary Key on the RecID (identity) column. Is that
your approach too?
Thanks.

As a full-time DBA / data designer for almost 30 years:
The identity "crutch" is the single biggest factor in terrible table and index design. While there are legitimate uses for identity, not all tables need an identity or should have one, period! In particular, and for example, intersection tables should almost always be clustered on the parent key columns, and log tables should almost always be clustered on datetime rather than identity.

Also, the identity column should never be the "default" clustered index ... because there should be no such thing as a "default" clustered index. The clustered index is, by far, the single most important performance factor for any table. Therefore, it should always be very carefully chosen, not based on just slapping an identity column on the table ... grrrr.

I have to spend hundreds of hours every year undoing this garbage as I tune databases. It gets frustrating year after year.

2 Likes

I didn't see anything in previous posts about using the IDENTITY or GUID as the PKey or Clustered Index candidate ... maybe I missed something? or maybe that was the O/P's question, in which case I didn't realise that

I'll give you the traditional consultant answer - it depends. Others have presented excellent reasons not to and to use a GUID. Never usue one as a clustered index - unless that's the method by which the data is most frequently accessed. To reflexively add any non-business key (identity, sequence, GUID, sequential GUID) to data should ALWAYS be examined carefully in light of the access patterns. Key width plays an extemely import in a clustered key. You may want to consider a GIUD for uniqueness and an identity for a clustered key. Clustered keys don't have to be unique. I've created tables with a PK that's unique but not clustered and a clustered key thats not unique. Performance was fantastic and page splitting was mitigated by a fill factor.

I agree, Scott. I like to create the clustered IX on the column(s) that make each row unique. I've seen so many developers add a RecID or uniqueidentifier
to make each row unique. Well, it does that but you can still have duplicate business data with that strategy.
I do like an identity column for RecNbr because it makes it so easy to find a specific record you are looking for. "Hey Sara, take a look at record 9087 in TableB."

I'm not a fan of using GUIDs everywhere. Practically, you'll get lots of situations where you want to select from a specific record or update a particular record. So much easier with an Identity Column that's a few digits.

In terms of choosing a PK that's several columns rather than a single-column Identity, a few concerns:

  • Often the business changes over time. You may start out with requirements that suggest the uniqueness on the table is C1, C2. Then later something changes in the business to make it C1, C2, C3. If you had chosen a PK of C1, C2, now you have a mess (potentially a big mess) on your hands. But if you had chosen an Identity column as the PK, changing your unique constraint to C1, C2, C3 is very easily done.

  • It is easier to code when you use a single column as the PK (of course you can still create uniqueness with a constraint). It's not a huge saving, but you just need to write one column rather than several, when coding joins.

Not that this is in anyway germane to the conversation but....
Clustered keys actually are unique. Under the covers, SQL adds a uniqueifier (is that a word?) to the key, if it needs to, in order to make it so. IIRC, its an eight byte value.

This is a true statement. Sometimes I don't state it all the way though to the physical implementation. If you can have a clustered key that's unique based on the business data - then I'd use that. Otherwise SQL Server will uniqueify the value. I couldn't help but build on to your word :wink: