Calculating existing delays , Off delays and new delays on manufacturing data

Hi All,

Hope you are doing well!.. I have a manufacturing plant data on a daily basis and specific vehicle (denoted by sno) belonging to a specific category (mmodel) that can go on a delay due to some issues...Now my Input table has the snapshot of this data on a daily basis and denotes the vehicle (conmbination of sno and mmodel) that might be a new delay in comparison to the previous day (new delay implies that the vehicle combination dint exist on the immediate previous day to the day in comparison) ; existing delay - the same vehicle combination existed on the previous day or Off delay -the vehicle combination for the previous day doesn't exist today (or the day in consideration).. The numbers for the current day are calculated based on the vehicle entries on the current day and the immediate previous day ....The first day is the day of reference where all entries are just delays and nor off delay or new delay...The subsequent days entries are compared with the immediate previous day to get the delays ,new delays and off delay... The output table 1 is a summary view by snapshot date and the output 2 table is a view by master model and snapshot date...Can you please help me here.

##Input
create table ##input

(sno varchar(1000),
mmodel varchar(1000),
snapshot_date date)

insert into ##input values
('r123 ','S1234','3/1/2022'),
('r3421','S1234','3/1/2022'),
('y123','D2123','3/1/2022'),
('g21q','D2123','3/1/2022'),
('W321','G345','3/1/2022'),
('E231','G345','3/1/2022'),
('r123 ','S1234','3/2/2022'),
('D891','S1234','3/2/2022'),
('y123','D2123','3/2/2022'),
('W321','G345','3/2/2022'),
('K1231','J1231','3/2/2022'),
('r123 ','S1234','3/3/2022'),
('D891','S1234','3/3/2022'),
('H213','V231','3/3/2022'),
('121k','N213','3/3/2022')

output1

create table ##output1
(snapshot_date date,
delay int,
offdelay int,
newdelay int)

insert into ##output1 values
('3/1/2022','6','',''),
('3/2/2022','3','3','2'),
('3/3/2022','2','3','2')

output2

create table ##output2
(mmodel varchar(1000),
snapshot_date date,
delay int,
offdelay int,
newdelay int)

insert into ##output2 values
('S1234','3/1/2022','2','',''),
('D2123','3/1/2022','2','',''),
('G345','3/1/2022','2','',''),
('S1234','3/2/2022','1','1','1'),
('D2123','3/2/2022','1','1',''),
('G345','3/2/2022','1','1',''),
('J1231','3/2/2022','','','1'),
('S1234','3/3/2022','2','',''),
('D2123','3/3/2022','','1',''),
('G345','3/3/2022','','1',''),
('J1231','3/3/2022','','1',''),
('V231','3/3/2022','','','1'),
('N213','3/3/2022','','','1'),

The earliest date in my data (ascending order) would be considered as the date of reference...So in this case 03/01/2022 would be considered as the date of reference where all the unique entries (an entry is denoted by the combination of sno and mmodel) would be considered just delays...

So when I consider 03/02/2022 the common entries between 03/02/2022 and the previous day which is 03/01/2022 would be the delays which are in data (r123,S1234) ,(y123,D2123) and (W321,G345)

the entries existing in 03/01/2022 but not in 03/02/2022 would be considered as off-delays which are in my case (r3421,S1234) ,(g21q,D2123) and (E231,G345).

The entries that are completely new in 03/02/2022 but not in 03/01/2022 would be (D891,S1234) and (K1231,J1231) which are considered as new delays...

Thanks,
Arun

ddl doesn't work

Hi Mike,

Can you please let me know why do you say that..

Thanks,
Arun

Have you tested it locally dml and ddl?

Hi Yosiaz and Mike01 , Sorry somehow extra braces showed up..I have now corrected the same..

still doesn't work. Output 2 fails:

Msg 241, Level 16, State 1, Line 8
Conversion failed when converting date and/or time from character string.

Apologies again!..I have corrected the same!..

I have also added the calculation explanation file...(The entries from the previous day needs to be compared and referenced to the current day in comparison)...

What version of SQL Server are you using?

Hi Jeff, I am using SQL server 2017

@arunc ,

Heh... I should title this post...
The "Black Arts" of "NO_DAX BI" Reporting Using Only T-SQL :yum:

Ok... first of all, the ##input working table will benefit a bit from having a Clustered Index ("CI" from here on) on it. Unfortunately, you have the sno and mmodel columns as VARCHAR(1000) and so that causes length warnings for the Clustered Index. I could certainly be wrong about you sno and mmodel columns but I currently can't imagine them needing to be longer than a VARCHAR(100). If they are longer than that, try to limit them to no more than VARCHAR(400) to avoid the CI length error.

Here's the modified test data setup with the modified columns and the CI/PK included. I still uses the data you very nicely provided. Thank you for that.

   DROP TABLE IF EXISTS ##input;
GO
 CREATE TABLE ##input
        (
         sno            varchar(100) NOT NULL
        ,mmodel         varchar(100) NOT NULL
        ,snapshot_date  date         NOT NULL
        ,PRIMARY KEY CLUSTERED (sno,mmodel,snapshot_date)
        )
;
    SET DATEFORMAT MDY; --Just to make the code easier to demo across the pond.
 INSERT INTO ##input WITH (TABLOCK)
 VALUES  ('r123 ','S1234','3/1/2022')
        ,('r3421','S1234','3/1/2022')
        ,('y123' ,'D2123','3/1/2022')
        ,('g21q' ,'D2123','3/1/2022')
        ,('W321' ,'G345' ,'3/1/2022')
        ,('E231' ,'G345' ,'3/1/2022')
        ,('r123 ','S1234','3/2/2022')
        ,('D891' ,'S1234','3/2/2022')
        ,('y123' ,'D2123','3/2/2022')
        ,('W321' ,'G345' ,'3/2/2022')
        ,('K1231','J1231','3/2/2022')
        ,('r123 ','S1234','3/3/2022')
        ,('D891' ,'S1234','3/3/2022')
        ,('H213' ,'V231' ,'3/3/2022')
        ,('121k' ,'N213' ,'3/3/2022')
GO

The following code provides "the answers" by row (21 rows, just like your output example) and dumps those into a temp table so that you can aggregate them any way you want.

    DROP TABLE IF EXISTS ##Classified;
GO
   WITH
 cteLeft AS
(--==== The TOP and ORDER BY form a "Blocking Operator" to prevent a Cartesian Product later in the code,
     -- which seriously reduces the number of reads in the code despite what the Execution Plan "% of Batch" says.
 SELECT TOP 2147483647
        *
        ,OffSetDate = DATEADD(dd,1,snapshot_date) --We also need this date later to prevent a Cartesian Product
   FROM ##input
  ORDER BY sno,mmodel,snapshot_date
)
,cteClassify AS
(--==== The FULL JOIN allows the extra rows for the OFF entries to materialize instead of doing that separately
 SELECT 
        -- l_sno           = l.sno    --These items can be uncommented to see the actual comparisons done.
        --,l_mmodel        = l.mmodel --and may be left in or removed for the prod version.
        --,l_snapshot_date = l.snapshot_date,
         sno            = ISNULL(r.sno          ,l.sno       )
        ,mmodel         = ISNULL(r.mmodel       ,l.mmodel    )
        ,snapshot_date  = ISNULL(r.snapshot_date,l.OffSetDate)
        ,NewDelay       = IIF(l.sno IS NULL AND r.sno IS NOT NULL,1,0)
        ,Delay          = IIF(l.OffSetDate = r.snapshot_date,1,0)
        ,OffDelay       = IIF(l.sno IS NOT NULL AND r.sno IS NULL,1,0)
        ,MaxDate        = MAX(r.snapshot_date) OVER () --We need this to exclude "the next day" at the logical end.
   FROM      cteLeft l
   FULL JOIN ##input r   ON l.sno        = r.sno
                        AND l.mmodel     = r.mmodel
                        AND l.OffSetDate = r.snapshot_date
)
 SELECT sno,mmodel,snapshot_date,NewDelay,Delay,OffDelay
   INTO ##Classified
   FROM cteClassify
  WHERE snapshot_date <= MaxDate    --Exclude the manufactured 4th of the month, in this case.
  ORDER BY sno,mmodel,snapshot_date --This order makes it easy to see the code is correct.
;

Speaking of aggregations, the following code provides the detail rows, the output for output1, the output for output2, an output3 (which you'll eventally need), and a Grand Total, which you'll also need. If you were to dump this into a reporting table (real table, not a temp table), then you could select just about any kind of pre-aggrgated data you want for "BI reporting and analysis". by total type, date, model, whatever.

 SELECT  TotalType     = CASE GROUPING_ID(snapshot_date,mmodel) --This is faster than CHOOSE
                         WHEN 0 THEN 'By Date/Model'
                         WHEN 1 THEN 'By Date' 
                         WHEN 2 THEN 'By Model'
                         WHEN 3 THEN 'Grand Total'
                         END
        ,snapshot_date = CASE GROUPING_ID(snapshot_date,mmodel) --This is faster than CHOOSE
                         WHEN 0 THEN CONVERT(VARCHAR(100),snapshot_date,23) --Detail row YYYY-MM-DD in ENGLISH
                         WHEN 1 THEN CONVERT(VARCHAR(100),snapshot_date,23) 
                         WHEN 2 THEN REPLICATE('-',10)
                         WHEN 3 THEN REPLICATE('-',10)
                         END
        ,mmodel        = CASE GROUPING_ID(snapshot_date,mmodel) --This is faster than CHOOSE
                         WHEN 0 THEN mmodel --Detail row
                         WHEN 1 THEN REPLICATE('-',5)
                         WHEN 2 THEN mmodel
                         WHEN 3 THEN REPLICATE('-',5)
                         END
        ,NewDelay      = SUM(NewDelay)
        ,Delay         = SUM(Delay)
        ,TotalDelay    = SUM(NewDelay)+ SUM(Delay)
        ,OffDelay      = SUM(OffDelay)
        ,UnResolved    = SUM(NewDelay)+ SUM(Delay) - SUM(OffDelay)
   FROM ##Classified
  GROUP BY CUBE (snapshot_date, mmodel) 
  ORDER BY GROUPING(snapshot_date),snapshot_date,GROUPING(mmodel),mmodel
GO

Here's what that code ultimately produces... The "UnResolved" column produces a very telling 4th output especially for the "By Modell" total rows.. :wink:

Some references that you might want to consider reading since you're the one that will need to support the code:

This first link introduces CUBE and ROLLUP (Rollup is really just a subset of CUBE)