MIN aggregate function to a BIT field

That would raise the same kind of non-SARGable hell as having NULL EndDates in a table, IMHO. I suppose there could be a reason to have "unknown" conditions for a bit but I can't think of one requiring such tri-state logic in a properly designed database.

Shifting gears to ask the general question of the public... if anyone has a good idea of when tri-state bits (nullable bits) would be appropriate (as in good design), I'd sure like to hear what it is.

Simple example:

You have a database for an obstetrician. The patient table has a bit for the sex of the baby: 0 = male, 1 = female, NULL = unknown (too early in the pregnancy to tell).

Note that this is extremely common in type 2 SCDs (slowly changing dimensions).

A NULL EndDate wouldn't necessarily make the WHERE condition non-SARGable, unless one made the egregious error of wrapping the column in ISNULL() in the WHERE clause, which should never be done anyway.
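For illustration (table name, column name, and dates are hypothetical), the difference is between wrapping the indexed column in a function and leaving it bare:

```sql
-- Non-SARGable: ISNULL() wraps the indexed column, so the optimizer
-- can't use an index seek and typically falls back to a scan.
SELECT COUNT(*)
FROM dbo.SomeTable
WHERE ISNULL(end_date, '99991231') >= '20240101';

-- SARGable: the bare column plus "OR ... IS NULL" can still be
-- satisfied with index seeks, as the script below demonstrates.
SELECT COUNT(*)
FROM dbo.SomeTable
WHERE (end_date >= '20240101' OR end_date IS NULL);
```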

I can't post an in-line tally table, so I had to use a physical one, but it still demonstrates the "seekability" of "OR IS NULL":

INSERT INTO #t1 ( start_date, end_date, description )
SELECT DATEADD(SECOND, -tally * 5, GETDATE()) AS start_date,
    CASE WHEN tally % 10 = 0 THEN NULL ELSE DATEADD(SECOND, -tally * 5 - 5, GETDATE()) END AS end_date,
    'Tally value was ' + CAST(tally AS varchar(10)) AS description
FROM dbo.tally
WHERE tally BETWEEN 1 AND 1000000

SELECT COUNT(*) FROM #t1

SET STATISTICS IO ON

SELECT COUNT(*)
FROM #t1

--as expected, this is a nice index seek
SELECT COUNT(*)
FROM #t1
WHERE start_date >= DATEADD(MINUTE, -10, GETDATE()) AND
    (end_date <= DATEADD(MINUTE, -9, GETDATE()))

--this is also an index seek
SELECT COUNT(*)
FROM #t1
WHERE start_date >= DATEADD(MINUTE, -10, GETDATE()) AND
    (end_date <= DATEADD(MINUTE, -9, GETDATE()) OR end_date IS NULL)

SET STATISTICS IO OFF

DROP TABLE #t1

I say it should still not be nullable, and it certainly should not be the bit data type. There is such a thing as a hermaphrodite, and the default should be something other than NULL for "Not yet tested".

You're not wrong there. That's precisely one of the uses of NULLs that I try to avoid because of the NULL logic required when you want to know if something will either expire in the future or has no expiration date. Yes, I do agree that the required NULL logic isn't difficult but it's still extra logic.

No problem. What are you using for an index on #T1?

Granted, however it is pretty easy to cook up other simple examples. Schrödinger's cat, e.g.: its condition is unknown (NULL) until we open the box. After that it's definitely alive or dead. (Not aware of other possibilities!)

Sorry for not showing that. It's ( start_date, end_date )
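For anyone reproducing the script above, here's a sketch of the #t1 definition it implies (the data types and lengths are my assumption; the index is the one stated here):

```sql
CREATE TABLE #t1 (
    start_date  datetime    NOT NULL,
    end_date    datetime    NULL,   -- NULL = row still "open"
    description varchar(50) NOT NULL
);

CREATE INDEX IX_t1 ON #t1 (start_date, end_date);
```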

I was going to open up the whole NULL column conundrum but decided that it was a bit (no pun) off topic from the OP. BUT... since the topic has been broached...and without being fanatically religious about it....

No column should be nullable! At the very least, you should have a VERY good reason for allowing NULLs. In almost every situation, you can find an acceptable default value to use in lieu of null (EndDate = Positive infinity, for example). It eliminates those nasty "OR MyColumn is Null" coding constructs that impede performance.
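As a minimal sketch of the sentinel-default approach (table, column, and constraint names are illustrative):

```sql
CREATE TABLE dbo.Membership (
    membership_id int IDENTITY(1,1) PRIMARY KEY,
    start_date    datetime NOT NULL,
    end_date      datetime NOT NULL
        CONSTRAINT DF_Membership_EndDate
        DEFAULT ('99991231')   -- "positive infinity" instead of NULL
);

-- Finding currently-open rows then needs no NULL logic at all:
-- SELECT ... FROM dbo.Membership WHERE end_date > GETDATE();
```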

Can you find an esoteric condition where nullability is the best approach? Sure, granted. But most of the time an attribute is nullable because no one took the time to ask if there was a better way to go. (IMHO)

One reason: If you have a lot of columns where the contents are optional or unknown, making them nullable also lets you use SPARSE and save some space.
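A sketch of what that looks like (names are illustrative); note that SPARSE requires the column to be nullable, which is the point being made:

```sql
CREATE TABLE dbo.PatientExtra (
    patient_id    int NOT NULL PRIMARY KEY,
    blood_type    char(3)      SPARSE NULL,  -- rarely populated
    allergy_notes varchar(200) SPARSE NULL   -- optional/unknown
);
```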

Personally, I don't like a default value for a column where the values are truly unknown and may never be.

Wouldn't it be better to include a properly designed sister table to handle the eventuality of non-NULL values? Even if you do have a good reason to store NULL values, if the column is a fixed-length data type, it's going to occupy that data type's space on the row whether you have a NULL or not. Forgetting about backups and the space it may occupy in cache, it also means that the number of rows per page is reduced, and that has an effect on the performance of queries that may brush against the clustered index.

Yes, I realize that it's convenient to not have to do additional joins, but that may justify the use of properly built views to present the data as a whole.
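A sketch of the sister-table idea (all names are illustrative): the base table has no nullable column, a row appears in the sister table only once the value is known, and a view presents the data as a whole.

```sql
-- Base table: no nullable column
CREATE TABLE dbo.Contract (
    contract_id int PRIMARY KEY,
    start_date  datetime NOT NULL
);

-- Sister table: a row exists only once the end date is known
CREATE TABLE dbo.ContractEnd (
    contract_id int PRIMARY KEY
        REFERENCES dbo.Contract (contract_id),
    end_date    datetime NOT NULL
);
GO
-- View presents the data as a whole
CREATE VIEW dbo.ContractView AS
SELECT c.contract_id, c.start_date, e.end_date
FROM dbo.Contract c
LEFT JOIN dbo.ContractEnd e ON e.contract_id = c.contract_id;
```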

Even if SPARSE? I thought that was the reason for SPARSE??

Absolutely correct. But, like all such tricks, it comes at a cost. Here's a quote from BOL.

Sparse columns are ordinary columns that have an optimized storage for null values. Sparse columns reduce the space requirements for null values at the cost of more overhead to retrieve nonnull values. Consider using sparse columns when the space saved is at least 20 percent to 40 percent.

Admittedly, I've not done a test on how much more overhead that would entail or whether it would be cheaper than a sister table, etc. I do try to design tables where a column won't suffer from NULLs never mind getting to the point of where I'd have to use something like SPARSE.

I'm also a bit concerned about such code. How long will it actually be available? Heh... don't laugh at that fear. Everyone at my old shop was raving about VARDECIMAL when it first came out. How long did that last before they deprecated it?

I believe compressed rows also do not store fixed-length columns if they are NULL, if at all possible.

It's easy enough to envision valid uses for NULL values. Tournament pairings/results, for example. For NCAA basketball, the locations and datetimes of the later rounds are known at the start -- and presumably/potentially in a table -- but the teams that will be playing in those later rounds are literally unknown until the earlier rounds are completed.

Wouldn't it be more properly normalized to have a separate table that matches the teams to the rounds instead of having the teams in the same table as the rounds?

I think that's what Scott meant. A table with rounds and foreign keys to the teams. However, the pairings are unknown until later.

Yes, it would be a separate table, but if you don't pre-populate all entries, then you have to combine results from more than one table when showing both completed and uncompleted rounds. When I've done this for bridge matches, I admit I've just de-normalized it all into one table with the results NULL until known later, when you just UPDATE the results rather than having to INSERT later.
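A sketch of that denormalized design (names are illustrative): venue and schedule are known up front, the team and result columns stay NULL and get UPDATEd as the bracket resolves.

```sql
CREATE TABLE dbo.Pairing (
    round_no     tinyint     NOT NULL,
    slot_no      tinyint     NOT NULL,
    venue        varchar(50) NOT NULL,   -- known at the start
    scheduled_at datetime    NOT NULL,   -- known at the start
    team1_id     int NULL,  -- NULL until earlier rounds finish
    team2_id     int NULL,
    winner_id    int NULL,  -- NULL until the match is played
    PRIMARY KEY (round_no, slot_no)
);

-- Later rounds are filled in by UPDATE rather than INSERT:
-- UPDATE dbo.Pairing SET team1_id = @winner
-- WHERE round_no = 2 AND slot_no = 1;
```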

I don't mind designing tables where a column exploits the advantages of NULL values, though. VARDECIMAL? Yeah, that was a weird one. At the moment though, the best place to look for deprecated features is Azure SQL database since it shares the same code base as on-prem SQL Server and is quite a bit ahead in the dev cycle. Basically, anything that will be deprecated will likely happen in Azure first. Will SPARSE go away? Who knows? It's certainly not something I'm going to worry about though.

Yeah, I can't see not using features just because they might go away. Sure, avoid things MS has already explicitly deprecated if at all possible, but I wouldn't want to forgo the best solution available at the time just because it could possibly be deprecated.