IS NULL vs =NULL

Hello

How can the two WHERE clause snippets below return different results?

and ([A]='text1' or [A] is null or [B]='text2' or [B] is null)
vs
and ([A]='text1' or [A]=null or [B]='text2' or [B]=null)

Thanks!

Because they are different where clauses...

NULL is an unknown value, and therefore a NULL value can never be equal to another value, even if that other value is also NULL. NULL = NULL evaluates to UNKNOWN, not TRUE, which eliminates that row from the results.

[A] IS NULL does evaluate to true when [A] is a NULL value.
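A quick way to see the difference for yourself is a small throwaway repro like the one below (the table and values are made up, and this assumes the default SET ANSI_NULLS ON behavior):

DECLARE @Demo TABLE ([A] varchar(10) NULL);
INSERT INTO @Demo ([A]) VALUES ('text1'), (NULL);

SELECT * FROM @Demo WHERE [A] = NULL;  -- 0 rows: NULL = NULL evaluates to UNKNOWN, not TRUE
SELECT * FROM @Demo WHERE [A] IS NULL; -- 1 row: IS NULL is the correct test for a NULL value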


Please see the following links for the documentation on this highly contentious subject...

I say "highly contentious" because for front-end code, these actually default to off and front-end developers are used to being able to compare to NULL directly. The ANSI spec says otherwise and is a bit "purest", IMHO. By definition, NULL means "Unknown". The contention is, how can you compare to something that is "Unknown" and they made special functions (a long time ago) to do such comparisons to "unknown".

The really bad part is that a lot of people over the decades have used NULL the same way they would use "Nothing" instead of "Unknown". And that's where the biggest contention is.

The reminder phrase is that "Null is not Nothing" meaning that since NULL means "Unknown", you don't know if it's actually "Nothing" or not.

This has gotten so crazy that people have actually proposed two different types of NULLs... one for "Unknown" (the true but inconvenient meaning of NULL) and one for "Nothing" (the untrue but extremely convenient meaning that people have adopted).

I have no preference. I've used NULL for both meanings in the past but am mindful of all the warnings about ANSI_NULLS ON becoming the only choice in the future. The cool part is that I've always used the ANSI methods and related functions for handling NULL, no matter whether I was using it as a true "Unknown" or to mean "Nothing", because all the ANSI NULL stuff being enabled has been the default in SQL Server since forever.
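By "ANSI methods and related functions", I mean constructs along these lines (a rough sketch; the variable and the '9999' fallback are just made-up examples):

DECLARE @EndDate datetime = NULL;

SELECT COALESCE(@EndDate, '9999') AS EndDateOrDefault; -- ANSI standard: returns the first non-NULL argument
SELECT ISNULL(@EndDate, '9999')   AS EndDateOrDefault; -- T-SQL's two-argument cousin of COALESCE
SELECT CASE WHEN @EndDate IS NULL THEN 'Unknown' ELSE 'Known' END AS EndDateStatus; -- the ANSI IS NULL test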

An additional fact is that I'm really careful and try to avoid the use of NULL to begin with, especially when it comes to columns that are supposed to contain things like "EndDate" for a range of dates (as just one of many examples). If that's unknown or hasn't happened yet, I use the default date of '9999', which is the same as '01 Jan 9999'. The reasons why I use that for a default end date are many...

  1. It's wicked easy to type and takes up less coding real-estate than 9999-12-31.
  2. It's wicked easy to remember and standardize on.
  3. Computationally, it allows mathematical "head room" for comparisons like the old < EndDate+1 formulas.
  4. It means that my WHERE clauses don't need WHERE (SomeDateCol < @EndDate+1 OR SomeDateCol IS NULL). Instead, I can just use WHERE SomeDateCol < @EndDate+1 (see the sketch after this list), which is not only easier to type but also leads to a bit of a performance improvement in code.
  5. It's a common technique to use "all nines" for a maximum possible value.
  6. It ALWAYS works correctly.
  7. And with the advent of Row and Page compression, it prevents what is supposed to be a "fixed width" date column from being compressed to a smaller size and then, when it's later updated to a real value, becoming a source of "ExpAnsive" updates that cause page splits and the resulting fragmentation.
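Here's a minimal sketch of the general idea behind items 3 and 4. The table, column, and variable names are made up, and this shows one common "rows in effect as of a date" form of the pattern:

-- Hypothetical table: the end of each date range defaults to '9999' (01 Jan 9999) instead of NULL.
CREATE TABLE dbo.SomeTable
(
     SomeID    int      NOT NULL
    ,StartDate datetime NOT NULL
    ,EndDate   datetime NOT NULL CONSTRAINT DF_SomeTable_EndDate DEFAULT '9999'
);

DECLARE @AsOfDate datetime = '2020'; -- hypothetical "as of" date

-- With a NULLable EndDate, finding the rows in effect on @AsOfDate needs the extra IS NULL test:
-- WHERE StartDate < @AsOfDate+1 AND (EndDate > @AsOfDate OR EndDate IS NULL)

-- With the '9999' default, the open-ended rows pass the comparison on their own:
SELECT SomeID, StartDate, EndDate
  FROM dbo.SomeTable
 WHERE StartDate < @AsOfDate+1
   AND EndDate   > @AsOfDate;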