Where clause of an update statement

select *
from (
    select '0000' as locationtime union all
    select 'a000' union all
    select '2359' union all
    select '2400' union all
    select '1399'    
) as t
where 1 = case when t.locationtime like '%[^0-9]%' then 0
               when t.locationtime / 100 between 00 and 23
                and t.locationtime % 100 between 00 and 59 then 1
               else 0
          end

That might fail if locationtime is not numeric; SQL does not short-circuit WHERE clauses.
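For illustration, a minimal repro of the problem (the inline VALUES list is just made-up sample data). With a bare AND there is no guaranteed evaluation order, so the conversion can run before the LIKE test and blow up on 'a000':

select *
from (values ('0000'), ('a000'), ('2359')) as t (locationtime)
where t.locationtime not like '%[^0-9]%'            -- no guarantee this runs first
  and cast(t.locationtime as int) between 0 and 2359 -- may fail on 'a000'

The CASE form above avoids that, because CASE evaluates its WHEN branches in order: the conversion is only attempted once the LIKE test has already rejected non-numeric values.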

I'll have a distinct list of times here in a few minutes.

I think time is the red herring here though. The case logic that I added earlier only dealt with the date side of the equation, not the time side. The logic in the case statement only filters out the NULL dates from evaluation.

Which, if true, is odd, since the WHERE DATE >= AND DATE < filter should have done the same thing.

This is with no filters on the query other than to remove the 1,440 known valid times.

locationtime
12
008
075
141
160
080
225
0
130
123
222
0
1
083
094
204
193

0
179
001
121
110
17
235
1
140
000
100

This one was filtered only to the year 2013.
Both queries used a NOT IN (<list of 1,440 valid times>).
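As an aside, those 1,440 codes don't have to be hand-typed; they can be generated. A sketch using a row-number tally against sys.all_objects:

-- 24 hours x 60 minutes = 1,440 valid 'hhmm' strings
SELECT RIGHT('0' + CAST(h.n AS VARCHAR(2)), 2)
     + RIGHT('0' + CAST(m.n AS VARCHAR(2)), 2) AS validtime
FROM (SELECT TOP (24) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
      FROM sys.all_objects) AS h
CROSS JOIN
     (SELECT TOP (60) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
      FROM sys.all_objects) AS m

That result set can feed the NOT IN, or better, a NOT EXISTS against it.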

Again, I don't think time is the issue; it should only be the ones listed. Then again, I am talking about dates sneaking in that aren't between the filtered range... #STYMIED

locationtime
12
0
222
204
193

121
17
140
100

Sorry, I have to go back to my first statement. The addition of the case statement on the update to eliminate the invalid (NULL) dates from the statement proved successful.

To me that means one of a few things:

  1. The where clause is not relevant to the update statement, as it is being ignored.
  2. The where clause is not a hard bounding limit; unless you put the CASE statement in, SQL doesn't believe you are serious.
  3. SQL produces a table scan or a pre-scan when a recordset is beyond a certain size, and does this at a table level without regard to the WHERE clause the user has provided.

I mean this seems to be a pretty basic update statement on a table that isn't huge by SQL standards. The table is 45m rows but less than 100 columns wide. Why the oddity???

Here is a thought/What if:

where locationdate >= '1/1/2013'
and locationdate <= '12/1/2013'

vs.

where (locationdate >= '1/1/2013'
and locationdate <= '12/1/2013'
)

vs.

where locationdate BETWEEN '1/1/2013' and '12/1/2013'

Would any of these make a difference???
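As a quick sanity check: all three forms express the same predicate, since BETWEEN is just shorthand for the inclusive >= AND <= pair, and parentheses around a single AND change nothing. A sketch (the string literals assume US mdy date-format settings):

DECLARE @d DATE = '2013-06-15'

SELECT CASE WHEN @d >= '1/1/2013' AND @d <= '12/1/2013' THEN 1 ELSE 0 END AS plain,
       CASE WHEN (@d >= '1/1/2013' AND @d <= '12/1/2013') THEN 1 ELSE 0 END AS parenthesized,
       CASE WHEN @d BETWEEN '1/1/2013' AND '12/1/2013' THEN 1 ELSE 0 END AS between_form

All three columns return 1. One thing worth noticing in passing: every upper bound here stops at December 1, not December 31.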

  1. The where clause is relevant and not being ignored.
  2. The where clause filters the rows being returned; the CASE statement can navigate around bad data.
  3. SQL does not do a pre-scan.

I conjured up some test cases. Have a look and see if there's anything useful for you:

DECLARE @ TABLE (locationdate DATE, locationtime VARCHAR(4), locdatetime DATETIME)

INSERT INTO @ (locationdate, locationtime) VALUES 
	(NULL, NULL), 
	(NULL, ''), 
	(NULL, 'abcd'), 
	(NULL, '0060'), 
	(NULL, '1259'), 
	(current_timestamp, NULL), 
	(current_timestamp + 1, ''), 
	(current_timestamp + 2, 'abcd'), 
	(current_timestamp + 3, '0060'), 
	(current_timestamp + 4, '1259')

SELECT * FROM @

-- Update the data, computing datetime from parts

UPDATE t
SET locdatetime = dateadd(minute, locmins, dateadd(hour, lochours, cast(locationdate AS DATETIME)))
OUTPUT inserted.*		-- let's see what happened
FROM @ t

-- Get hhmm as an integer (or null)
CROSS APPLY (
 SELECT 
	CASE WHEN locationtime NOT LIKE '%[^0-9]%' -- numeric, that is
		THEN CASE WHEN cast(locationtime AS INT) BETWEEN 0000 AND 2359
				   AND cast(locationtime AS INT) % 100 < 60 THEN cast(locationtime AS INT) 
		END 
	END
 ) _(loctime)

-- Get hours and minutes as integers (or nulls)
CROSS APPLY (
 SELECT isnull(loctime / 100, 0), isnull(loctime % 100, 0)
 ) __(lochours, locmins)

WHERE locationtime NOT LIKE '%[^0-9]%'		-- numeric only
 AND locationtime BETWEEN '0000' AND '2359' -- valid range
 AND right(locationtime, 2) < '60'			-- valid minutes

SELECT * FROM @

Your answer I believe to be true, gbritton; however, I can't escape a few incontrovertible facts:

  1. No bad data can come from the date side. Using the query to specify only days within a particular year means it has to be valid.
  2. Explicitly enumerating only the valid possible time codes means that there can be no bad times included.

We are only dealing with three fields: LOCATIONDATE, LOCATIONTIME and LOCATIONDATETIME. The first two fields are ranged to be only valid data and the last field is the one being updated. With the ranging of the first two fields, where is the "bad data" sneaking in?

Mind you, I am not doubting the veracity of your statement; I would believe it to be true as well. I am, however, not seeing results that support the statement, which is what is eating at me.

I have a test running now with the parentheses added around the date bounds; next I am going to try the BETWEEN operator. In the meantime, I am going to play with the SQL you just sent, and we'll see what results we get on a small set of data.

Result 1

locationdate              locationtime  locdatetime
NULL                      NULL          NULL
NULL                                    NULL
NULL                      abcd          NULL
NULL                      0060          NULL
NULL                      1259          NULL
2015-07-13 13:37:43.967   NULL          NULL
2015-07-14 13:37:43.967                 NULL
2015-07-15 13:37:43.967   abcd          NULL
2015-07-16 13:37:43.967   0060          NULL
2015-07-17 13:37:43.967   1259          NULL

Result 2

locationdate              locationtime  locdatetime
NULL                      1259          NULL
2015-07-17 13:37:43.967   1259          2015-07-18 02:36:43.967

Result 3

locationdate              locationtime  locdatetime
NULL                      NULL          NULL
NULL                                    NULL
NULL                      abcd          NULL
NULL                      0060          NULL
NULL                      1259          NULL
2015-07-13 13:37:43.967   NULL          NULL
2015-07-14 13:37:43.967                 NULL
2015-07-15 13:37:43.967   abcd          NULL
2015-07-16 13:37:43.967   0060          NULL
2015-07-17 13:37:43.967   1259          2015-07-18 02:36:43.967

Right, so Result 1 is before the update, Result 2 is what got updated, and Result 3 is after the update. The test also injects some bad data to exercise the WHERE clause and the CASE statement.

Why the need for the case statement? Why not just limit the invalid data via the where clause?

Because of what we've been talking about all along. SQL is conservative and does not assume that the WHERE clause implies the conversions will work. We can see that, of course, but the compiler cannot always see it, or at least does not assume it. Hence the WHERE clause filters the rows to be updated, and the CASE expressions take care of any potentially bad data. It's good, actually, since someone may come along after you win the lottery and change the WHERE clause, but the CASE expression will still guard against conversion errors.

So let me take the stick to the dead horse: What bad data?

We eliminated all bad data in the where clause; there should not be a single record that has a NULL date or a time that is not a valid 24-hour-and-minute combination. This is the question that has been keeping me up at night. lol

We eliminate bad data in the where clause, but SQL doesn't know it when it compiles the SELECT clause. So it assumes there may be bad data and throws an error.
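If you're on SQL Server 2012 or later, TRY_CONVERT is a terser guard than the nested CASE: it returns NULL instead of raising a conversion error. A sketch of the same update against the @ test table from my example:

UPDATE t
SET locdatetime = DATEADD(minute, TRY_CONVERT(INT, locationtime) % 100,
                  DATEADD(hour,   TRY_CONVERT(INT, locationtime) / 100,
                          CAST(locationdate AS DATETIME)))
FROM @ t
WHERE TRY_CONVERT(INT, locationtime) / 100 BETWEEN 0 AND 23
  AND TRY_CONVERT(INT, locationtime) % 100 BETWEEN 0 AND 59
  AND locationtime <> ''	-- '' silently converts to 0, so exclude it explicitly

Because TRY_CONVERT can never throw, the evaluation order of those predicates no longer matters.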

If I were you, I'd get some sleep and stop worrying about it. Take my example as a guide and modify your query to use it. If it works, just chalk it up as one of life's little mysteries.

Am I the only one that assumes there can't be bad data in either the DATE or TIME fields because SQL will not accept any bad data as long as the column is set up correctly?

You may be the only one. Note: The time column is not datatype 'time'. That would be easy. It's actually varchar(4) that should hold a time value as 'hhmm'. So without a good check constraint, anything can get in there.
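Something like the following would lock the column down. A sketch only: the table name dbo.locations is assumed, and any existing bad rows would have to be cleaned up (or the constraint created WITH NOCHECK) first:

ALTER TABLE dbo.locations
ADD CONSTRAINT ck_locationtime_hhmm
CHECK (locationtime LIKE '[0-2][0-9][0-5][0-9]'	-- exactly 4 digits, minutes 00-59
   AND LEFT(locationtime, 2) <= '23')			-- hours 00-23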

You know what they say about those who assume! Ha. Good to know, though. I guess I haven't come across that datatype yet. Thanks for the straightening out :smile:

I am struggling here to get my head around this. If it is the compiler, what is it compiling? Is it evaluating all rows of the table without a where clause?

This is a pretty big deal; I have seen this issue before with large tables and multiple views compiled to present data. I have had issues with this type of data conversion in the process and always just said I would deal with it later.

I now have a simple example of the issue and am struggling to get a handle on the reason SQL is behaving in a non-logical fashion. Without understanding how the compiler or the engine is processing records I can't even begin to assume SQL is all that structured.

The other bigger issue is this. If SQL is having difficulties with a simple query like this and I have to help it by generating case statements, what is the point? I should just write the entire statement sans a where clause and just handle the logic for what I am looking for in the CASE contained in the select clause.

The other issue is that this is a simple case, one that a simple CASE statement is able to resolve. I have other queries where, if I have to apply a CASE statement to them, it will be far more complicated, dare I even say impossible, to generate a CASE statement for all of the conditions I would need to trap.

I really need a behind the scenes explanation of how bad data is even being factored into the query when the bad data is being eliminated. If the compiler is looking at all rows and determining there is a problem, I need some guidance as to how to prevent this. That would mean that processing time is being wasted looking at rows that aren't necessary to the question being posed by the SQL statement.

Compiling the query to an execution plan.

No.

Then SQL would perform the update on all 45 million rows instead of the subset that matches your filter. Bad for performance.

It's not. It's looking at the datatypes and anticipating that there may be a problem.

FWIW, if the datatypes were correct in the first place (using varchar(4) for a time value is not correct), you wouldn't be losing any sleep.
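A sketch of that longer-term fix, again assuming the dbo.locations table name: add a real time column and backfill it, letting TRY_CONVERT discard anything unsalvageable:

ALTER TABLE dbo.locations ADD locationtime_t TIME(0) NULL

UPDATE dbo.locations
SET locationtime_t = TRY_CONVERT(TIME(0), STUFF(locationtime, 3, 0, ':'))	-- 'hhmm' -> 'hh:mm'
WHERE locationtime LIKE '[0-2][0-9][0-5][0-9]'

Once it's backfilled and verified, the varchar column can be dropped and the new column renamed into its place.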