WHERE clause of an UPDATE statement

I agree on the data type; however, I am taking the data from a third party. Not only is the data type wrong for the time, they don't even supply a year for the dates; it is assumed to be the current year.

Getting the third-party provider to modify their schema is not likely, so I have to combine what they give me and manage my own schema.

Why is SQL trying to combine a null date, or a time that isn't specifically included by the WHERE clause? It shouldn't matter whether this happens in the compiler, pre-compiler, post-compiler, or a direct engine call; it should never be looking at records beyond the WHERE clause.
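To make the question concrete, here's a cut-down sketch of the kind of statement I mean (table and column names are invented; assumes SQL Server 2012+ for TRY_CONVERT):

```sql
-- Hypothetical staging table: the third-party feed stores the date as text,
-- with no year, and some rows legitimately have no date at all.
CREATE TABLE dbo.Feed (
    RowType  char(1)     NOT NULL,  -- 'D' rows carry a date, 'X' rows do not
    DateText varchar(10) NULL       -- e.g. '06/15' (no year) or NULL
);

-- This can fail with a conversion error even though the WHERE clause
-- "excludes" the bad rows: the engine is free to evaluate CONVERT
-- before (or alongside) the RowType filter.
SELECT CONVERT(date, DateText + '/2024')  -- date format assumes us_english
FROM   dbo.Feed
WHERE  RowType = 'D';

-- Safer version: TRY_CONVERT returns NULL instead of raising an error,
-- so the evaluation order no longer matters.
SELECT TRY_CONVERT(date, DateText + '/2024')
FROM   dbo.Feed
WHERE  RowType = 'D';
```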

I don't understand why one would have to duplicate a piece of logic in two places to make this work. The bigger concern is that this happens on SELECT statements as well, not just UPDATEs. I don't want to conflate the issue with other, more detailed queries and views; I just need to understand what is happening here so that I can carry it over to those more complex examples.

EDIT:
Why does it only happen on large datasets? When I break it down into 30-day increments it works fine. I can't run queries that depend on knowing the record count before execution; that would be insane.

EDIT2:
It can't be looking at the data types and assuming an issue, because it works with smaller datasets.

It does matter where it happens. At compile time it doesn't have the data, so it has to consider the data types involved.

It is not looking at any rows beyond the WHERE clause.

With large datasets you are more likely to see parallelism in the plan (I think I asked about that before), which means more things happen asynchronously, which is good.
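One quick way to test whether the parallel plan shape is the difference (a sketch; the table and column names are made up) is to force a serial plan on the big dataset and see whether the error goes away:

```sql
-- Force a serial plan with a query hint. If the statement succeeds here
-- but fails without the hint, the failure is tied to the parallel plan
-- chosen for the larger dataset, not to the data itself.
UPDATE t
SET    t.EventDate = TRY_CONVERT(date, f.DateText)
FROM   dbo.Target AS t
JOIN   dbo.Feed   AS f ON f.Id = t.FeedId
OPTION (MAXDOP 1);
```

This is a diagnostic step, not a fix: leaving MAXDOP 1 on a large update usually costs you the performance the parallel plan was buying.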

Yes, it can, and it does. "Works in test" just means there's not enough data to exercise the query properly.

So how does SQL determine that a smaller dataset doesn't have an error and a larger one does? I need to stop that behavior, since I know the data I supplied is correct and works 100% of the time. I could shave some time off SQL's evaluation of the data, as that evaluation isn't necessary.

Did you look at the parallelism? It may very well be gathering a number of parallel streams, then looking at the data conversion, seeing incompatible types, and balking. That's why I suggested converting to compatible types (in the CROSS APPLY clauses) in the first place.
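A sketch of what I mean by converting inside a CROSS APPLY (names are invented): do the conversion once, in one controlled place, and then filter and join on the already-converted value so the rest of the plan only ever sees a compatible type.

```sql
-- Convert the raw text to a proper date in one place, then filter on the
-- converted value. TRY_CONVERT yields NULL for unconvertible input rather
-- than raising an error, whatever order the engine evaluates things in.
SELECT f.RowType, c.EventDate
FROM   dbo.Feed AS f
CROSS APPLY (SELECT TRY_CONVERT(date, f.DateText) AS EventDate) AS c
WHERE  c.EventDate IS NOT NULL;
```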

Read the execution plan carefully.

I'll brush up on parallelism; it's not a strong suit of mine. I understand the concept and the intent behind it (not unlike threading an application), but I am nowhere near an expert on it. That may in fact be the piece I am missing.

BTW, thanks for the exchange of ideas and thoughts; I don't want to come off as ungrateful. I really appreciate the dialog today.

I have NOT read the whole thread, so my comments may be useless.

If the NOT IN list contains NULL, then all bets are off.
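For anyone following along, a minimal illustration of why a NULL in the list breaks NOT IN (standard three-valued logic, not specific to any one query in this thread):

```sql
-- n NOT IN (2, NULL) expands to n <> 2 AND n <> NULL.
-- n <> NULL evaluates to UNKNOWN, so no row can ever pass the predicate.
SELECT v.n
FROM   (VALUES (1), (2)) AS v(n)
WHERE  v.n NOT IN (2, NULL);
-- returns no rows at all, not just "everything except 2"

-- NOT EXISTS treats NULLs the way people usually expect:
SELECT v.n
FROM   (VALUES (1), (2)) AS v(n)
WHERE  NOT EXISTS (
    SELECT 1
    FROM   (VALUES (2), (NULL)) AS x(m)
    WHERE  x.m = v.n
);
-- returns n = 1
```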

From my earlier reading of the thread it didn't look like a NOT IN was being used, let alone one whose list contained NULL, so I've only picked up on these couple of points in case they're relevant.