Guid Sorting

Hi, does anyone know if there is a difference between sql versions, say 2012 to 2017 when sorting Uniqueidentifiers? Strangely, in Sql 2012 a uniqueIdentifier column is sorted and the same column in 2017 is not. Though data is different but their uniqueIdentifier source is the same, it is generated from .net using comb guid which is supposed to generate sequential Id's.

Also restored 2012 backup on 2017 but it was still sorted. So if this was due to Sql 2017 sorting in a different manner, this restored DB would not have sorted it correctly, which makes me think if this is a DB related thing? I am still trying to find an answer for this but has anyone seen something like this before?

Heh... if you're using anything that produces a sequential GUID, you have bigger problems than you think. In the presence of "ExpAnsive Updates", they fragment just as bad as IDENTITY columns except they're bigger.

Please see the following video. It destroys more than 2 decades of myths about Random GUIDs and Index Maintenance in general. In the process, it also beats the tar about the myth of "sequentially keyed" indexes being some sort of holy-grail when it comes to fragmentation. They're horrible when it comes to hot spots and fragmentation in the hot spot due to fragmentation.

And sorry about the loud ads af 15:00, 30:15, 45:20. I had no say so in those. They WILL lift your headset so be ready.

Did you specify an ORDER BY in the query? If not, then rows happening to come out in order is coincidental, not guaranteed.

That is, by relational theory, unless you explicitly put an ORDER BY in the query, rows can be returned in any order. In SQL 2012, it's possible whatever index was using to satisfy the query originally happened to be in guid order. But that in SQL 2017 a different index is being used (or that SQL not reading the index in "ordered" mode).

Again, the cold and hard rule is: Unless you explicitly specify ORDER BY in a query, you should NOT count on getting the rows in any specific order, no matter how many times you may have gotten them that way in the past.

1 Like

Thanks Jeff, I was not aware of that at all, this is indeed an eye-opener.

But it still does not answer my question. Why would the sort order be different on 2012 & 2017, am still trying to find an answer for this.

Thanks Scott and yes I totally agree about the Order By and sorry I missed mentioning this that I do have an Order By in the query.

Here are the queries am using. Executing them on 2012 and 2017 returns data in a sorted order in the former but:not in the latter, so the "matching query" returns all the data only in 2012 :


IF OBJECT_ID('TEMPDB..#TMP_CreatedDate') IS NOT NULL 
DROP TABLE #TMP_CreatedDate

IF OBJECT_ID('TEMPDB..#TMP_ID') IS NOT NULL 
DROP TABLE #TMP_ID

--1. Sort by CreatedDate which is GetDate
SELECT ROW_NUMBER()OVER(ORDER BY CREATEDDATE) AS ROWNUM, ID,CreatedDate
INTO #TMP_CreatedDate
FROM MyTable
ORDER BY CreatedDate;
 
--2. Sort by ID which is UniqueIdentifier
SELECT ROW_NUMBER()OVER(ORDER BY id) AS ROWNUM, ID,CreatedDate
INTO #TMP_ID
FROM MyTable
ORDER BY id;

----OR---ANOTHER WAY-----

--1. Sort by CreatedDate which is GetDate
SELECT IDENTITY(INT,1,1) AS ROWNUM, ID,CreatedDate
INTO #TMP_CreatedDate
FROM MyTable
ORDER BY CreatedDate;

--2. Sort by ID which is UniqueIdentifier
SELECT IDENTITY(INT,1,1) AS ROWNUM, ID,CreatedDate
INTO #TMP_ID
FROM MyTable
ORDER BY id;
-------------------------

IF OBJECT_ID('TEMPDB..#TMP') IS NOT NULL 
DROP TABLE #TMP

--GET MATCHING ROWS i.e. CREATED DATE AND ID's ARE IN THE SAME ORDER
--This returns all the data in Sql 2012 but not in Sql 2017
SELECT T1.ROWNUM AS T1_ROWNUM,T2.ROWNUM AS T2_ROWNUM,T1.ID AS T1_id,T2.ID AS T2_ID,T1.CreatedDate AS T1_CreatedDate,T2.CreatedDate AS T2_CreatedDate
	INTO #TMP
FROM #TMP_CreatedDate T1 INNER JOIN #TMP_ID T2 ON T1.ROWNUM=T2.ROWNUM
WHERE T1.ID = T2.ID

--GET NON-MATCHING ROWS i.e. CREATED DATE AND ID's WERE NOT IN THE SAME ORDER
--This returns data in Sql 2017 only
SELECT T1.ROWNUM AS T1_ROWNUM,T2.ROWNUM AS T2_ROWNUM,T1.ID AS T1_id,T2.ID AS T2_ID,T1.CreatedDate AS T1_CreatedDate,T2.CreatedDate AS T2_CreatedDate
	INTO #TMP
FROM #TMP_CreatedDate T1 INNER JOIN #TMP_ID T2 ON T1.ROWNUM=T2.ROWNUM
WHERE T1.ID <> T2.ID

The order of rows written from a "SELECT ... INTO" query creating a new table is not guaranteed by SQL Server, even if an ORDER BY is specified.

If you specify an ORDER BY when SELECTing from that table, then you would get a guaranteed order.

1 Like

Ok that's a good point, so I will try to do an Insert..Select style query with an order by and see how iit goes.

But it still beats me, even then that would be a massive coincidence to get millions of rows sorted in one version as against another and that too on multiple SQL 2012 serves not just one. I am now also wondering if we have exhausted all possible queries to find if data is populated in an order and nothing changed between SQL versions then it certainly has to be how .net is generating this guid, it might be different in both cases....but then again it's the same code running on both the machines.

I will try the insert..select style and post my results

If data is being inserted into a table with an identity column, then SQL ensures that the identity values are assigned in the same order as the ORDER BY clause on the INSERT, but it still does not guarantee that they will be inserted strictly in order.

1 Like

You can help it a whole lot, though.... Set MAXDOP to "1" while doing the insert WITH (TABLOCK) and the ORDER BY you mention

This still does not guarantee any order on the table - since a table, by definition, is unordered. Now if you are referring to setting a clustered index on that sequential ID - and forcing a non-parallel query to insert into that clustered index, then that only reduces the chances of fragmentation and still doesn't guarantee any order on the table.

Back on 2008 R2 - there was a known 'bug' that many people utilized. That issue was the fact that you could insert data into a table variable using an ORDER BY and a following select would return that data in the same order. This bug was fixed and many queries 'broke' because the results were no longer in the expected order.

The only way to guarantee the order is to use an ORDER BY on the query that returns the data to the client. No other method will guarantee the order of the rows returned.

2 Likes

Interesting. I agree that a HEAP is ultimately un-ordered but I don't agree that a table (difference being is that it's a Clustered Table having a Clustered Index) has no order. I DO agree that there's usually no guarantee that relying on the system to actually follow that designated order during a SELECT unless there is an ORDER BY but tables are actually ordered by pages and, within pages, the order is present in the slot-array..

A table has no inherent order, whether clustered or not. Relational theory dictates that it be so.

At any rate, what would be the point of trying to force rows to be inserted physically in a very specific order? As long as the logical order is correct -- for example, matching an existing clustered index, for efficiency -- then the exact final physical location of the rows is irrelevant.

I have to agree to disagree especially since what we call relational databases don't meet all of Codd's rules. They're just really clever file systems. And it's the logical order that I'm talking about. I agree that the physical order of pages frequently doesn't matter and most people can't use the physical order anyway.

People keep saying that tables are "unordered". If you're talking about pages, I agree but logically, they're ALWAYS ordered. If they're not, it's because the CI is corrupted. And, if they weren't ordered, you'd never have fragmentation. That includes NCI's, as well.

That's not entirely true. As you well know, you can have fragmentation simply because data expands, even in a heap where SQL truly doesn't need to keep any order to the data.

Ironically, you've just made my point. The data MUST be in the correct logical order and, in order to do that, it has to do page splits when data expands. :wink:

No. As stated, even in a heap, SQL often has to move data simply because the 8060 byte limit is reached, having nothing to do with logical order. :smiley:

Totally agreed on heaps but we're not talking about heaps here... we're talking about "tables", which is short for "Clustered Tables", which have a Clustered Index.