Searching for a sequence of records

Greetings, I just joined your list.

Due to a problem at work I am having to learn sql on the fly today. I have attached a table that exemplifies my data set problem. I hope it is viewable. All of the data I need is in this table so no foreign keys are required.

I am looking for the case where a line like number 7 is followed by line 8. The first line should have event is null and expected = start and the next line should have event = stop and expected is null. The case number should be the same and the 2nd line result_id should equal the first line result_id + 1. The query should return the full record of the first line case_num.

A further test: When a case_num is found matching the conditions it is desirable to find a record with that case_num and the event = run.

I was able to select all lines like 7 and 8 but then had to copy to Excel to do the rest of the checking. This is the query I used:
select case_num from PROD.cal_data where (expected = 'start' and event is null) or (event = stop and expected is null);

I think I need to use a where-in structure but I am getting bogged down trying to build a complex (to me) subquery.

Your guidance would sure be appreciated.



If you just want the case number, you should be able to use the code below. I didn't have any usable data to test it with, but I think it's correct or at least extremely close.

select case_num
from PROD.cal_data 
group by case_num
having max(case when event is null and expected = 'start' then result_id else 0 end) > 0 and
    (max(case when event = 'stop' and expected is null then result_id else 0 end) =
     (max(case when event is null and expected = 'start' then result_id else 0 end) + 1))

Thanks for posting Scott. Unfortunately I edited my problem while you were coming up with a solution.

Your query seems to run but I haven't been able to get a result yet as MySQL times out after 30 seconds. There are over 7 million rows in the database.


That's actually an efficient way of doing it. But the table is likely not clustered to best fulfil the query either. Based on the extremely limited info available now, it would be best to cluster the table on ( case_num, line ) [assuming that makes a unique combination, which I think MySQL requires).

The more individual case_num values there are, the more overhead the query will be.