SQL Coding Standards - Join Syntax

Kristen · July 31, 2015, 2:57pm

Leaving aside the old-style outer join syntax of

FROM Table1 AS T1, Table2 AS T2
WHERE T1.Col1 *= T2.Col2

which has mostly died-a-death ...

I always code the Target Table ()T2 in this example) columns on the LEFT as follows:

FROM Table1 AS T1
    JOIN Table2 AS T2
         ON T2.Col1 = T1.Col1
        AND T2.Col2 = T1.Col2

the reason being that I am trying to define what data I want from Table2 and, as such, I want to make sure that all the criteria for Table2 have been supplied (i.e. I don't want any ambiguously matched rows nor cartesian joins)

Usually the columns involved will be the Primary Key columns from Table2, and with this layout I can easily see them aligned vertically on the left

New-hires who start work here think it weird, and not what they are used to, but after a short while have always commented that they find it more reliable ("defensive").

My Joins are laid out as shown above - indented and with each condition on a separate line. (We strive to have single-column primary keys, which helps to make JOINs more straightforward, but its not always possible). Experience has shown that this reduces the chance of associating incorrect columns in the JOIN and reducing the chance of errors during code maintenance - although I'd be the first to admit that "Consistent" is the most important factor in reducing errors during code maintenance.

Of course there are always exceptions! and "what about when"s

I prefer

FROM Table1 AS T1
    JOIN Table2 AS T2
         ON T2.Col1 = T1.Col1
        AND T2.Col2 = 'xxx'

rather than

FROM Table1 AS T1
    JOIN Table2 AS T2
         ON T2.Col1 = T1.Col1
WHERE T2.Col2 = 'xxx'

because I can easily (even if only for debugging) change JOIN to LEFT OUTER JOIN (the WHERE T2.Col2 = 'xxx' would, otherwise, force an inner join)

One particular exception that I don't have a good answer for is if the main Table1 contains an attribute which governs whether that row joins to Table2 or Table3. I would code that as follows:

FROM Table1 AS T1
    LEFT OUTER JOIN Table2 AS T2
         ON T2.Col1 = T1.Col1
        AND T1.AttributeCol = 'T2'
    LEFT OUTER JOIN Table3 AS T3
         ON T3.Col1 = T1.Col1
        AND T1.AttributeCol = 'T3'

but it is not pretty

Another wrong-way-round scenario (which similarly stinks a bit!) would be

FROM Table1 AS T1
    JOIN Table2 AS T2
         ON T2.Col1 = T1.Col1
        AND T1.Col2 LIKE '%' + T2.Col2 + '%'

can't say I use that often, thankfully but it certainly does happen.

I also tend to use an Outer Join instead of a NOT EXISTS

INSERT INTO Table2
(
    Col1, Col2, ...
)
SELECT Col1, Col2, ...
FROM Table1 AS T1
    LEFT OUTER JOIN Table2 AS T2
         ON T2.Col1 = T1.Col1
WHERE T2.Col1 IS NULL

this is partly because I can duplicate the code for an UPDATE (for an existing row) in blocks of code that need to do UpSert type operations. These might now more reasonably be replaced with MERGE statements though.

ScottPletcher · July 31, 2015, 4:41pm

We strive to have single-column primary keys

Data requirements alone should control primary key selection.

Kristen · July 31, 2015, 5:30pm

I agree, but I don't know what you'll make of the distinction I have (in some cases) between Primary Key and Unique Clustered Index.

I like to have a Unique Single-part key in a table. Even if it is unused in the APP we find it useful for aligning data with "other systems" and for "sorting out SNAFUs". Its almost always an IDENTITY, and it is intended that it doesn't change - ceraintly it is not available to the User for change.

We use it in DBA Magic Scripts for sorting things out, and so that the DBA Magic Script can easily find the ideal Single Part Unique Key we usually make it the Primary Key. In many cases the table just needs an IDENTITY/ ID type column (for example the typical Employee table example) so Primary Key and Unique Clustered Index are both the same.

Where this is not the case (as you said: "[where] Data requirements alone should control primary key selection" e.g. a child table) we add an IDENTITY and make that the Primary Key (Unique, Non-Clustered index) and create the Unique Clustered Index as Parent-ID, ChildItemOrID (or whatever the data indicates that should be).

So for us the Data defines the Unique Clustered Index, and not the Primary Key. Maybe this is bad though??

The reason for this is laziness: our DBA Magic Script can just look for the Primary Key rather than having to try to find a suitable Unique Non-Clustered Index (we want to use the non-changeable INT one, rather than User's Unique Sorting And Quick Lookup Code one ... which almost certain can & does change)

ScottPletcher · July 31, 2015, 5:33pm

You're absolutely right on the clustering key, that is most critical. But "always" adding an identity as a PK is indeed ultimately just laziness. It should always be evaluated in the context of that table, particularly with high insert rates. You're serializing INSERTs to the table and forcing pre-processing in some cases to assign the identity values. You should know that it's worth all that before you do it.

Kristen · July 31, 2015, 5:58pm

Is your concern the inclusion of a (not technically necessary to the data) IDENTITY (with a Unique Non-Clustered index) - i.e. because of serializing INSERTs to the table - or the fact that I'm using it as the PKey [when I am actually using a correct / better candidate for the Unique Clustered Index]?

Our OLTP stuff probably doesn't have high enough insert rates to trouble the seriallization issue, but its a good point and one I had not thought of.

Our ability to sort out a data crisis, quickly, because we can deploy a DBA Magic Script has a value to us. I think its is a huge value but if we had insert contention issue the Business Managers would disagree (until they had a data crisis that took us some time to sort out - then they'd behave like Politicians saying that they want Left to be the new Right )