SQL Coding Standard - Things On The Right End Of The Line

Kristen · July 31, 2015, 2:01pm

I don't like Things On The Right End Of The Line because they are often off the edge of the screen and out of sight.

Be interested to know, for folk that do that, whether it is "always done it that way" or "deliberately do it that way BECAUSE"

Here's what I do, compare to the Right-End way:

SELECT [MyAliasName1] = MyColumnName1,
       [MyAliasName2] = CASE ... some really long conditional code ... ,

more common Right-End way:

SELECT MyColumnName1 AS [MyAliasName1],
       CASE ... some really long conditional code ... AS [MyAliasName2],

to my way of thinking the Alias Name can be off the right hand edge of the screen and not spotted / taken into consideration

By the by we ALWAYS use "AS" for aliases - I'd love to have a "Fussy"/EXPLICIT flag/setting in SQL that made the parser moan about the absence of things like that. All too easy to miss a comma, for example:

SELECT MyColumn1    -- Forgot to put a comma after this one
       MyColumn2

such that MyColumn2 becomes the alias for MyColumn1. (I think the way to fix that is to Alias all tables and prefix ALL columns with the Alias Name. To my mind that makes the code look very busy, so we only use Table Alias names where there is ambiguity - i..e the presence of them indicates something out of the ordinary. This is helped, for us, because our Column names are unique within the database, not just the table - so basically we only need them when a table is used more than once.

We always use AS if we use an Alias Name for a Table

FROM MyTable1 AS T1
     JOIN MyTable2 AS T2
         ON ...

another area where I commonly see Right-End coding is the JOIN

FROM MyTable1 AS T1 JOIN
     MyTable2 AS T2 ON T2.SomeID = T1.SomeID AND ... length join conditions ... LEFT OUTER JOIN
     MyTable3 AS T3 ON

to my mind that, too, is risky. The fact that T3 is Outer Joined can be off screen and not noticed. It might be an APPLY or a JOIN ...

In fact we don't do Right-End with Parenthesis, Brackets, BEGIN/END etc.

IF @Parameter = 1
BEGIN
    SELECT Col1, Col2, ...
    FROM
    (
        SELECT Col1, Col2, ...
        FROM   MyTable1
        WHERE ...
        UNION ALL
        SELECT Col1, Col2, ...
        FROM   MyTable2
        WHERE  ...
    ) AS X
        JOIN ...
    ORDER BY ...
END
ELSE
BEGIN
    ....
END

in this example the BEGIN, END and ELSE all line up. I don't have the BEGIN at the Right End out of sight. I know that SQL does't care what spacing / alignment I use, but it is common in other languages to have a "Lint" tool which WILL moan about Logic not matching code-alignment which is helpful to find "run on" code.

(We ALWAYS use BEGIN / END, even for a single-statement. Odds are good that someone will stick a DEBUG statement in there, or more/new code, and not spot that the single-statement has now become multi-statement)

The "(" and ")" also line up, in particular I do not do:

IF @Parameter = 1 BEGIN
    FROM (
        SELECT Col1, Col2, ...
        FROM   ...
    ) AS X
END

What I am after is that at-a-glance I can spot things that are wrong, before I ever try to execute the code, let along put it in production - often referred to as "defensive programming"

I'd be interested to hear from folk that Right-End code such things, and why you think that is less error-prone / more defensive

JamesK · July 31, 2015, 2:21pm

It would be too lengthy for me to write all the reasons for my way of formatting (I don't have your talents for detailed replies, Kristen ) but below is an example of my style of formatting.

Some of the main criteria are:
a) The logical pieces of the query (SELECT, FROM, etc.) are formatted so that they can be identified at a glance.
b) The joins are done such that I can see how many tables are joined, and what kind of joins they have
c) I have no hesitation in inserting line breaks, so my lines hardly ever get to be beyond 80 or 100 characters.
d) If I have enough time, or if I am writing something that I need to be extra careful about, I will try to line-up the multiple join conditions and their equal signs and so on (to make it easier to read).

SELECT
	a.Col1,
	b.Col2,
	MAX(b.Col3) AS Highest,
	SUM(c.Col3) AS TotalAmount
FROM
	TableA a
	INNER JOIN TableB b ON
		a.Col1		= b.ColB
		AND a.ColC	= b.ColD
	LEFT JOIN TableC c ON
		c.ColA		= b.Col1
WHERE
	a.ColD		= 17
	AND b.Co2	> 7
GROUP BY
	a.Col1,
	b.Col2
HAVING
	MAX(b.Col3) > 10
ORDER BY
	b.Col2,
	c.Col3;

Kristen · July 31, 2015, 2:30pm

Question: If you would do that some of the time why not all the time? (So that it is consistent)

It may be a yester-year thing, and thus more appropriate to an old-codger like me, but when I learnt my trade consistent-layout was regarded as a critical element of coding style, as familiarity with that consistent style then meant that something non-conforming stuck out like a sore thumb

The code that I see that is "just written", and not "consistently laid out", is without exception the most error-prone coder that I encounter. No care for whether the capitalisation of a table/column name is correct and, perhaps more worryingly, not consistent from line to line where that table / column name is reused with different capitalisation.

Perhaps the world has moved on to a place where it doesn't matter? or maybe the older ways were actually more defensive (and thus, to my mind, more effective)?

JamesK · July 31, 2015, 2:33pm

That would be ideal, but I think of the lining up of the "=" symbols as icing on the cake. For short queries and such, or for one-time/throw away queries, I feel like it doesn't add much value.

Kristen · July 31, 2015, 2:43pm

I do it for throw-aways too (which, I agree on the fact of it, is unnecessary) because it is second nature to me now ... I get bugs in Throw Away code too, and maybe by being consistent & defensive on those too I avoid some/many?? as-a-result bugs?

... and then if the throw-away becomes Production Code (which is about 101% of the time I would say, at a rough count ) its already Oven-Ready for when I put my Coding Standards Review Tsar's hat on

ScottPletcher · July 31, 2015, 4:48pm

This is tricky. Some of the issues with lining up "=", etc., are that:

more eye movement, as the eye has to scan further
it doesn't stay that way through maintenance, in particular, one long column name becomes a nightmare of realigning everything ... what a royal pita!, viz.:

SELECT
a.Col1,
b.Col2,
MAX(b.Col3) AS Highest,
SUM(c.Col3) AS TotalAmount
FROM
TableA a
INNER JOIN TableB b ON
a.Col1 = b.ColB
AND a.ColC = b.ColD
AND a.Total_Invoiced_Amount >= 0.00
LEFT JOIN TableC c ON
c.ColA = b.Col1

JamesK · July 31, 2015, 4:53pm

This so very true.

I don't do formatting manually, I use the Red Gate SQL Prompt to do the formatting. But SQL Prompt does not do the equal sign alignment. I do that when I am reading carefully through the code. Having to do all that alignment sort of slows me down and makes me pay attention to the code. That is part of the reason I do that. (Yeah, yeah, I know, I am weird )

Kristen · July 31, 2015, 5:18pm

Here's how I see it:

I am interest in b.ColB and b.LongColumD [name changed to deliberately exaggerate alignment] to check that all appropriate columns in TableB have been included, therefore I put the Target Table's (TableB in this case) columns on the left as they are then naturally aligned [I deliberately add a space in front of "ON")

SELECT
	a.Col1,
	b.Col2,
	MAX(b.Col3) AS Highest,
	SUM(c.Col3) AS TotalAmount
FROM
	TableA AS a
	INNER JOIN TableB AS b
		 ON b.ColB = a.Col1
		AND b.LongColumD = a.ColC
                AND a.Total_Invoiced_Amount >= 0.00   --- I would put this in the WHERE - not directly relevant to this INNER JOIN (but it might be for an OUTER JOIN)
	LEFT JOIN TableC AS c 
		 ON c.ColA = b.Col1

My take is that I am interested in the comparison of B.ColB = A.Col1 and also of b.LongColumD = a.ColC ... I'm not interested (vertically) in A.Col1 and a.ColC

I would not, normally, align the right part of the WHERE either for same reason that I am reading left-to-right so instead of JamesK's example

WHERE
	a.ColD		= 17
	AND b.Co2	> 7

I would do

WHERE       a.ColD = 17
	AND b.LongCol2 > 7
...
WHERE
	    a.ColD = 17
	AND b.LongCol2 > 7

(again, spaces added so that WHERE line (with or without a line break) and first AND (or OR) line up

However, there are times when elements are related where I may align them to make them easier to compare (vertically)

WHERE
	    a.LowerLimit         >= @MyStartValue
	AND b.VeryLongUpperLimit <  @MyStartValue
...
WHERE
	    a.LowerLimit         >= '20150101'
	AND b.VeryLongUpperLimit <  '20151231'

I only consider this important / use this style when Const Values are used or ParameterNames are different but need comparing - to make it easier to compare them (vertically) to check that they are a "matched pair"

In particular I always have the AND / OR on the left, so I never do this:

WHERE       a.ColD = 17 AND
	    b.LongCol2 > 7

because, for the reasons I mentioned above, I feel that it can get "lost" at the right side of the screen on a long line. Consider this for example:

WHERE       a.ColD = 17 AND
	    b.VeryVeryVeryVeryVeryLongCoOrCalculation > 7 OR
	    b.Col2 < 5

I'm pretty sure that parenthesis would be desirable in this case but I'm likely to miss spotting it compared to left-sided formatting:

WHERE       a.ColD = 17
	AND b.VeryVeryVeryVeryVeryLongCoOrCalculation > 7
	 OR b.Col2 < 5

ScottPletcher · July 31, 2015, 5:31pm

See, here's again where it gets tricky.

However, there are times when elements are related where I may align them to make them easier to compare (vertically)

I thought this about consistency. How can you have "consistency" when each person is deciding when something should be aligned and when it shouldn't? Particularly as code gets maintained by different people?

I don't have all the answers on this, except to say don't get too carried and specific with aligning/formatting, or you can spend far more time on just that then it's ever worth.

 AND a.Total_Invoiced_Amount >= 0.00   --- I would put this in the WHERE - not directly relevant to this INNER JOIN (but it might be for an OUTER JOIN)

WHEREs are evaluated after JOINs. It can be vastly more efficiently to filter rows in the WHERE clause. For example, imagine that the table contains 10s of Ms of rows but that only the current day's rows, say 10K, match this condition.

Kristen · July 31, 2015, 5:50pm

(I presume you mean in the JOIN)

Very interesting, never considered / realised that, thanks.

I'll have a look at some Huge Data queries that we have to see if this might make an impact.

Kristen · July 31, 2015, 5:52pm

We have guidelines for such things, but you are right that its imprecise and Rule One should be "Consistent".

Code Review picks such things up, and is used as a Training Aid, but ... its still subjective.

Apart from Rule One Consistency my other main aim is Defensive Programming - so something that makes the code clear and/or prevents a mistake during Code Maintenance, is also worthwhile IMO.

JamesK · July 31, 2015, 6:17pm

In my case, currently I am the one and only guy who normally works with and develops database code.

In my previous job, we had more than one person, and we tried to get consistency using a code formatting tool. Even so, I could take one look at the code and tell who wrote it because of formatting differences. But a new person would have found it to be consistent, for the most part. Code formatting tools allow you to customize the formatting, so we had to agree to set up our code formatting tools the same way for all developers.

ScottPletcher · July 31, 2015, 6:25pm

Yeah, with the prevalence of free T-SQL formatters on the web, I don't consider any reasonable formatting to be too big a problem anyway. Maybe it's just me. I'm far more interested in performance rules/guidelines, such as never use ISNULL() in a WHERE or JOIN, than I am in the actual formatting. Formatting can be easily and automatically converted, properly coding for performance cannot.

Kristen · July 31, 2015, 6:29pm

I don't have anyone above (or alongside) me to peer review much of my code.

We store our code in a Version Control repository (SVN in our case) and when I check my code in I use a DIFF tool on all changes so I can compare and review my own code.

I find that re-visiting my code after a day or so, when it is checked in, allows me to spot things not noticed when the code was written. Its not often that I spot something, but it does give me a bit of a safety net / improve consistency, in the situation where no one else peer reviews my code.

Just a though ... dunno if it would work for other one-man-band shops?

JamesK · July 31, 2015, 6:43pm

Funny you should say that! I ALWAYS do that, especially when the output of the query is something critical (i.e., something that can affect dollars and cents the company makes). I do that review not so much for code formatting or style, but for the logic and accuracy of the computations. But as I am reviewing the logic, I find myself adjusting the formatting slightly.

JimSnyder · July 31, 2015, 7:49pm

I consider readability and maintainability to be next to godliness. The coders I am following do everything in a run-on block, and I have to parse their code (days lost?) just to figure out what they did. Like Kristen, I always explicitly declare and define everything, including, but not limited to aliases.

I am less concerned about the BEGIN END block delimiters unless using cursors of ''if' logic. Despite that, indentation is crucial to my readability of code, and braces, brackets and parentheses are always matched on the same indentation level. The exception is stand-alone functions, like ISNULL(field.name) always being on the same line. I waffle a bit when it gets to two functions, like RTRIM(ISNULL(field.name), ...), but if it is pushing the right edge, I nest them with matching parentheses.

I have the disadvantage of working remotely on a site with few permissions, so when I finish a stored procedure/function, I have to email the SQL code to someone who can build the code and run it. Because of this, I at least sleep on the code and look at it the next morning. I nearly always see something I am already starting to forget that I had not commented, or something I forgot to uncomment before sending.

I like the SQL commands to be separate from the logic. As in Kristen's example,

    WHERE       a.ColD = 17
	AND b.LongCol2 > 7
    ...
    WHERE
	    a.ColD = 17
	AND b.LongCol2 > 7

Kristen · August 1, 2015, 9:19am

Nice to hear I'm not the only one!!

Anyone wedded to Right End Of Line for BEGIN, AND and JOIN etc.? Would be good to hear your views ... I see that style in a lot of code so it must be popular.

I had a colleague who used to use that style and when I asked him why he said it was because his C coding style was

IF (A == B) {
    ... some code ...
}

and he saw things like BEGIN / END in the same vein, so stuck the BEGIN at the end of the line. My C coding style (back in the days ... !!) was always

IF (A == B)
{
    ... some code ...
}

and chatting to folk at that time (70's & 80's) about why they formatting the Open-Brace "{" at the "Right-hand End" the consensus was "its fewer lines of code". Back then we used to work a lot from printouts and a bit earlier than that on Teletype, and length-of-code was an issue. Once VDUs, then Monitors, became common, and size-of-code grew exponentially, and probably the cost of labour too, the need and understanding of Defensive Programming grew and exponents of using formatting / code styling to combat bugs sprung up.

I read (90's IIRC) plenty of books that talked about ways of decreasing chance-of-bugs. For me it started with "Code Complete: A Practical Handbook of Software Construction" (Published 1993, revised edition 2004) to which Microsoft Press added a number of similar titles in the years that followed such as "Software Project Survival Guide (Developer Best Practices)" and "Writing Solid Code: Microsoft Techniques for Developing Bug-free C. Programs".I also particularly enjoyed "No Bugs!: Writing Error-free Code in C and C++", but it is very specific to the sorts of things that go/went wrong when writing C-Code
http://www.amazon.co.uk/Code-Complete-Practical-Handbook-Construction/dp/1556154844/

By the by, we don't use ISNULL() - apart from the fact that we find the name counter-intuitive we need to use COALESCE() on the rare occasions that we have more than two parameters (so standardise on using it all the time thus easy to "add one" if necessary) but, more importantly, ISNULL has slightly weird behaviour when the parameters are subtlety different types and (from memory) takes its cue from the 2nd parameter rather than the first (which is how COALESCE() works and, I think, more intuitive / as-expected).

JimSnyder · August 3, 2015, 2:56pm

You are not alone.

I am 60, and started programming in 1995 in 'C' on UNIX. Like you, my style quickly became the matching braces due to getting lost in my own code. I have never looked back.

The open brace at the right was a publishing gimmick to save paper. procedure() { took only one line instead of the 'wasted' '{' by itself making two, and publishers considered space = profit. Programmers took books literally and a bad style was born.

It has been since the turn of the century since I used T-SQL, and it was pre-SQL92 standard. At the time, ISNULL was what my coworkers used and complained bitterly about. I just haven't made COALESCE() part of my toolset yet. I have been getting back up to speed on basics like temp tables, CASE statements, and optimizing performance and have a ways to go yet. I have picked up inline functions, table queries, pivots, and WHERE subqueries so far. I have written a dozen stored procedures and maybe five functions so far.

I am happy to help, and expect to be expanded on / corrected by those more experienced (or fresher). I am also happy to contribute opinions (clearly labeled as such).

Kristen · August 3, 2015, 4:32pm

Maybe its just us old codgers then .... churning out quality code with imperceptible numbers of bugs. We are available for consultancy hire of course ...

We embraced them when they first came up - perhaps all too readily. We had critical performance problems with one at the heart of our code, because it recompiled (and locked whilst doing so) in a highly unacceptable way (and for no good reason, although I except that the Optimizer wasn't, at the time and possibly still today, able to deduce out).

Similarly with the use of @TempTables instead of #TempOnes - we've had dreadful performance with them and, in both cases, the advance in syntax was ahead of any detailed means of debugging the performance of them - or maybe it was just the lack of really in-depth familiarity in the community to be able to figure out what was going wrong.

Technology does seem to be like that, to me. Two steps forward, and one back - usually in an unexpected direction.

JimSnyder · August 4, 2015, 1:17pm

From common sense, reading carefully, and experimenting, @TempTables run faster on small amounts of data or data that doesn't need an index. Since the stored procedures I am building are hitting tables with hundreds of columns and nearly a million rows, I haven't repeated my first attempt at fixing someone else's code that used table variables (@TempTables). I replaced their tables with #TempTables and dropped the run time from around half an hour to minutes.

Although I have used inline functions, it was at a place where an inline function replaced several temp tables and I saw no change in efficiency. Again, careful reading led me to apply it only where it was better. The chuckle from this is that the developers' code I am fixing comes from a pair of developers that have and use profiling (I can't due to being remote) and access to the DML. They work together at times and finish things quicker than I can figuring out the DB on my own.

At first, I kicked myself for being so slow, but then realized two other coworkers had tried before me and had thrown their hands up and quit. Being old also made me persistent, knowing I would find what I needed and could solve the puzzle.

Technology often takes one step forward and several steps backward. A vendor supplying a bill paying app released a new version with a glossier interface and requiring only one account instead of an admin and a users account. Lost in the process was the ability to search by patient name, the security did not work, and other functions just disappeared. Since I support the app, I can tell you that 'password resets' went from about one a month to a half dozen a day, and complaints from users went through the roof. I would shrug it off as just one app, but common apps like SnagIt, WinZip, etc suffer the same problems.

What I think I am seeing is this strange morphing of employers from hiring skilled workers and training them on new tools to hiring cheap employees who 'have' the exact set of tool skills that they want. I hate to blame it on the economy as I think that is a crutch, but if anyone is seeing that any differently, I would like to hear how they perceive it.