Group rows based on prefix change in string

ITTrucker · June 8, 2015, 6:30pm

I inherited a list of stored procedures that are run in order that do some data error checking. The name of the procedures is stored in a table with a run order and the name is prefixed by a code denoting the severity of the error. (I know this should be in it's own column but it's not).

DECLARE @SPTypes as table (Client integer, Name varchar(100), RunOrder integer)
INSERT INTO @SPTypes
VALUES (100,'U - Update bad data',0),
(100,'X - Omit very bad data',1),
(100,'X - Omit for some other reasons',2),
(100,'W - Warn for some questionable data',3),
(100,'W - Warn for some other reasons',4),
(100,'OE - Overrideable Error some bad data',5),
(100,'OE - Overrideable Error for some other reasons',6),
(100,'E - Error some bad data',7),
(100,'E - Error for some other reasons',8),
(100,'U - Update records',9),
(200,'U - Update bad data',0),
(200,'X - Omit very bad data',1),
(200,'W - Warn for some questionable data',2),
(200,'W - Warn for some other reasons',3),
(200,'OE - Overrideable Error some bad data',4),
(200,'OE - Overrideable Error for some other reasons',5),
(200,'E - Error some bad data',6),
(200,'X - Omit for some other reasons',7),
(200,'E - Error for some other reasons',8),
(200,'U - Update records',9)

Step 1) I'd like to be able to group the prefixes of the procedures in their run order, I can get a list of ALL the prefixes in order, but I need to group them anytime there is a change in the prefix. (bad wording, see expected results below)
The query below gives all the prefixes in order, but I want to return a distinct grouping, by client, in the run order of the prefixes so every change in prefix writes a new row in the data.
SELECT LTRIM(RTRIM(SUBSTRING(Name,1,CHARINDEX('-',Name)-1))) from @SPTypes
ORDER BY RunOrder

*Expected Results for Client 100
U
X
W
OE
E
U

Client 200 would look like
U
X
W
OE
E
X
E
U

Step 2) I think I can do this once I get Step 1 done, but I want to compare the results against a list to make sure that there aren't any X's below any E's. The order for client 100 above is correct, but there are some instances where the order is bad and I want to try and track them down. A valid pattern would is U,X,W,OE,E,U, so client 200 is bad.

Vinnie881 · June 8, 2015, 9:01pm

ITTrucker:

DECLARE @SPTypes as table (Client integer, Name varchar(100), RunOrder integer)INSERT INTO @SPTypesVALUES (100,'U - Update bad data',0),(100,'X - Omit very bad data',1),(100,'X - Omit for some other reasons',2),(100,'W - Warn for some questionable data',3),(100,'W - Warn for some other reasons',4),(100,'OE - Overrideable Error some bad data',5),(100,'OE - Overrideable Error for some other reasons',6),(100,'E - Error some bad data',7),(100,'E - Error for some other reasons',8),(100,'U - Update records',9),(200,'U - Update bad data',0),(200,'X - Omit very bad data',1),(200,'W - Warn for some questionable data',2),(200,'W - Warn for some other reasons',3),(200,'OE - Overrideable Error some bad data',4),(200,'OE - Overrideable Error for some other reasons',5),(200,'E - Error some bad data',6),(200,'X - Omit for some other reasons',7),(200,'E - Error for some other reasons',8),(200,'U - Update records',9)

This should get you what you want.

DECLARE @SPTypes as table (pk int identity(1,1) primary key,Client integer, Name varchar(100), RunOrder integer)
INSERT INTO @SPTypes(client,name,runorder)
VALUES (100,'U - Update bad data',0),
(100,'X - Omit very bad data',1),
(100,'X - Omit for some other reasons',2),
(100,'W - Warn for some questionable data',3),
(100,'W - Warn for some other reasons',4),
(100,'OE - Overrideable Error some bad data',5),
(100,'OE - Overrideable Error for some other reasons',6),
(100,'E - Error some bad data',7),
(100,'E - Error for some other reasons',8),
(100,'U - Update records',9),
(200,'U - Update bad data',0),
(200,'X - Omit very bad data',1),
(200,'W - Warn for some questionable data',2),
(200,'W - Warn for some other reasons',3),
(200,'OE - Overrideable Error some bad data',4),
(200,'OE - Overrideable Error for some other reasons',5),
(200,'E - Error some bad data',6),
(200,'X - Omit for some other reasons',7),
(200,'E - Error for some other reasons',8),
(200,'U - Update records',9)


declare @anchor int = 0
declare @lastid as varchar(100)
declare @lastclient as integer 

--select client,left(name,CHARINDEX('-',name)-1) from

update a
set 
@lastid = name = case when left(name,CHARINDEX('-',name)-1)  =@lastid  and @lastclient = client then null else left(name,CHARINDEX('-',name)-1) end
,@lastclient = client
,@anchor = pk 
 from
 @SPTypes a

 select * from @SPTypes where not name is null

stephen_hendricks · June 8, 2015, 9:02pm

;with Prefixes as ( select Client, LTRIM(RTRIM(SUBSTRING(Name,1,CHARINDEX('-',Name)-1))) Prefix, RunOrder from @SPTypes ) select a.Client, a.Prefix from Prefixes a left outer join Prefixes b on a.Client = b.Client and a.RunOrder + 1 = b.RunOrder where a.Prefix <> b.Prefix or b.Prefix is null order by a.Client, a.RunOrderThis does assume that the RunOrder is sequential. If not, you can always impose a row_number() on the data.

ScottPletcher · June 8, 2015, 9:04pm

Assuming the table is properly keyed, on (Client, RunOrder), this should work and should perform well enough:

SELECT Client, LEFT(Name, 2) AS Prefix
FROM @SPTypes st1
WHERE
    LEFT(st1.Name, 2) <> ISNULL((
        SELECT TOP (1) LEFT(st2.Name, 2)
        FROM @SPTypes st2
        WHERE
            st2.Client = st1.Client AND
            st2.RunOrder > st1.RunOrder
        ORDER BY st2.RunOrder
        ), '~')

Vinnie881 · June 8, 2015, 9:10pm

stephens query will be easy to follow, the one I posted should provide the best performance, and scotts query has some unnecessary overhead if you were going to use in production, so check the execution before you implement, but what I tell everyone, use the one that you are most comfortable with.

ITTrucker · June 9, 2015, 2:54pm

@Vinnie881 Your code works great if there are only two of the same codes in a row, (which is what I put in the sample) but if there are 5 OE's in a row, the first is updated to OE, the next is NULL but because OE <> NULL, the next line is OE and ends up looking like:
OE
NULL
OE
NULL
OE

In production, some of these clients have 50+ procs so there's a lot of multiples. I'm trying to work through the update to figure if I can put an ISNULL on @lastid but I think I'd need another variable to handle that and I'm still working through it.

stephen_hendricks · June 9, 2015, 2:57pm

Vinnie, your approach is clever if a bit abstruse. My only concern is that it is a destructive update. Any thoughts on how you'd preserve the original Name column.

ITTrucker · June 9, 2015, 4:36pm

I gave up on the alternating NULL's from the update query because both @stephen_hendricks and @ScottPletcher solutions worked with the larger/messier production dataset. The CTE was slightly slower once I plugged in all the production tables but since I'm only dealing with less than 50k rows, the 2 sec difference between them wasn't a big deal. I switched a couple variable types and names around and added a couple other columns once I put live data in there but the concept for both worked fine.

I inserted both the results of the queries above into another temp table along with a new row number and then declared another temp table with the Pattern to match: (had to put #'s in there instead of the @ sign, the forums thought I was trying to mention another user)

DECLARE #PatternMatch as table (pk int identity(1,1) primary key, Prefix varchar(5), RunOrder integer)
INSERT INTO #PatternMatch(Prefix,runorder)
VALUES ('U',1),('X',2),('W',3),('OE',4),('E',5),('U',6)

and then did a

LEFT OUTER JOIN #PatternMatch P ON F.RowNum = P.RunOrder and F.ErrorType = P.Prefix
where P.pk is NULL

and I got a list of all the checks that weren't in the order I wanted. (lots of records returned )

Thanks for all your help all.

Vinnie881 · June 9, 2015, 5:28pm

You can just add another variable for the check

DECLARE @SPTypes as table (pk int identity(1,1) primary key,Client integer, Name varchar(100), RunOrder integer)

INSERT INTO @SPTypes(client,name,runorder)
VALUES (100,'U - Update bad data',0),
(100,'X - Omit very bad data',1),
(100,'X - Omit for some other reasons',2),
(100,'W - Warn for some questionable data',3),
(100,'W - Warn for some other reasons',4),
(100,'W - Warn for some other reasons',5),
(100,'W - Warn for some other reasons',6),
(100,'W - Warn for some other reasons',7),
(100,'OE - Overrideable Error some bad data',5),
(100,'OE - Overrideable Error for some other reasons',6),
(100,'E - Error some bad data',7),
(100,'E - Error for some other reasons',8),
(100,'U - Update records',9),
(200,'U - Update bad data',0),
(200,'X - Omit very bad data',1),
(200,'W - Warn for some questionable data',2),
(200,'W - Warn for some other reasons',3),
(200,'OE - Overrideable Error some bad data',4),
(200,'OE - Overrideable Error for some other reasons',5),
(200,'E - Error some bad data',6),
(200,'X - Omit for some other reasons',7),
(200,'E - Error for some other reasons',8),
(200,'U - Update records',9)

declare @anchor int = 0
declare @lastid as varchar(100)
declare @lastid2 as varchar(100)
declare @lastclient as integer

--select client,left(name,CHARINDEX('-',name)-1) from

update a
set
@lastid = name = case when left(name,CHARINDEX('-',name)-1) =@lastid2 and @lastclient = client then null else left(name,CHARINDEX('-',name)-1) end
,@lastid2 = left(name,CHARINDEX('-',name)-1)
,@lastclient = client
,@anchor = pk
from
@SPTypes a

select * from @SPTypes where not name is null

Vinnie881 · June 9, 2015, 5:35pm

To explain how the query works is a SQL update statement works as follows

FIELDS THEN Variables. So what is happening is the field is updated using the value of the variable from the previous row.

This is safe to do on tables that are not joined and have a clustered index on it, but it is not a good practice if you are joining unless you have an incredibly hi certainty the clustered index of the main table is going to be used in the update.