Very very difficult query

Hello

declare @t table ( emp_code varchar(6), trans_date datetime, status varchar(2), time_portion varchar(8))

ideally when an employees comes in office he should punched in (status=01)
and when he goes out of office he should punch out (status=02)
but in practice there are a lot of irregularities.

  1. The employee punches in and does not punch out.
  2. Pnch out and didnt punched in.
  3. Repeated punch ins.
  4. Repeated punch outs.

We need show in and out in the same line within date group.

Sample data

insert @t (emp_code, trans_date, status, time_portion)
values('004070','2023-10-02','01','19:51:59'),
('004070', '2023-10-02', '02', '19:52:01'),
('004070', '2023-10-03', '02', '09:09:57'),
('004070', '2023-10-03', '02', '19:49:26'),
('004070', '2023-10-03', '01', '19:49:47'),
('004070', '2023-10-04', '02', '09:05:55'),
('004070', '2023-10-04', '01', '20:22:52'),
('004070', '2023-10-04', '02', '20:22:54'),
('004070', '2023-10-04', '06', '20:22:58'),
('004070', '2023-10-04', '02', '20:23:00'),
('004070', '2023-10-05', '02', '09:18:59'),
('004070', '2023-10-05', '01', '09:19:02'),
('004070', '2023-10-05', '02', '14:44:44')

expected result
trans_date timein timeout
2023-10-02, 19:51:59 19:52:01

2023-10-03, NULL 19:09:57
2023-10-03, NULL 19:49:26
2023-10-03, 19:49:47 NULL

2023-10-04, NULL 09:05:55
2023-10-04, 20:22:52 20:22:54
2023-10-04, NULL 20:23:00

2023-10-05, NULL 09:18:59
2023-10-05, 09:19:02 14:44:44

There is one more problem. We have sql server 2008 and cannot use lead or lag command.

You should really not be using SQL2008 as there have been at least two GDR security updates since it was depreciated. ie It is a security breach waiting to happen.

If the database is in SQL2008 compatibility mode then there will probably be no problems running it in SQL2019 with a compatibility mode of SQL2012 or SQL2008. In SQL2008, if the compatibility mode is less than SQL2008 then upgrading will probably be more challenging.

The outline of this query is actually quite straight forward:

  1. Determine if a row for an In/Out Pair is missing
  2. Add the missing rows.
  3. Group the pairs.
WITH Missing
AS
(
	SELECT emp_code, trans_date, [status], time_portion
		,CASE
			WHEN [status] = '01'
				AND LEAD([status]) OVER (PARTITION BY emp_code ORDER BY trans_date, time_portion) <> [status]
			THEN 'N'
			WHEN [status] = '02'
				AND LAG([status]) OVER (PARTITION BY emp_code ORDER BY trans_date, time_portion) <> [status]
			THEN 'N'
			ELSE 'Y'
		END AS IsMissing
	FROM #t
)
,AllRows
AS
(
	/* Remember the logical order of processing here is FROM, WHERE, SELECT */
	SELECT M.emp_code
		,CASE WHEN M.IsMissing = 'N' OR M.[status] = D.[status] THEN M.trans_date END AS trans_date
		,CASE
			WHEN M.IsMissing = 'N'
			THEN M.[status]
			ELSE D.[status]
		END AS [status]
		,CASE WHEN M.IsMissing = 'N' OR M.[status] = D.[status] THEN M.time_portion END AS time_portion
		,(ROW_NUMBER() OVER (PARTITION BY M.emp_code ORDER BY M.trans_date, M.time_portion, D.[status]) - 1)/ 2 AS pair
	FROM Missing M
		/* The maximum number of missing rows is the number of rows. */
		/* Duplicate the rows. */
		CROSS APPLY (VALUES ('01'),('02')) D ([status])
	/* Only keep needed rows */
	WHERE (M.IsMissing = 'N' AND D.[status] = '01')
		OR M.IsMissing = 'Y'
)
SELECT emp_code
	,MIN(CASE WHEN [status] = '01' THEN trans_date END) AS InDate
	,MIN(CASE WHEN [status] = '01' THEN time_portion END) AS InTime
	,MIN(CASE WHEN [status] = '02' THEN trans_date END) AS OutDate
	,MIN(CASE WHEN [status] = '02' THEN time_portion END) AS OutTime
FROM AllRows
GROUP BY emp_code, pair;

If you really have to do this without LEAD and LAG then replace the Missing CTE with:

WITH rns
AS
(
	SELECT emp_code, trans_date, [status], time_portion
		,ROW_NUMBER() OVER (PARTITION BY emp_code ORDER BY trans_date, time_portion) AS rn
	FROM #t
)
,Missing
AS
(
	SELECT C.emp_code, C.trans_date, C.[status], C.time_portion
		,CASE
			WHEN C.[status] = '01'
				AND N.[status] <> C.[status]
			THEN 'N'
			WHEN C.[status] = '02'
				AND P.[status] <> C.[status]
			THEN 'N'
			ELSE 'Y'
		END AS IsMissing
	FROM rns C
		LEFT JOIN rns N
			ON C.emp_code = N.emp_code
				AND C.rn = N.rn - 1
		LEFT JOIN rns P
			ON C.emp_code = P.emp_code
				AND C.rn = P.rn + 1
)
1 Like

Since we don't know how you're original data is indexed, I just made the code work for now, not sure about how well/efficiently it will run, since the data will have to be sorted 4 times. I have a version that uses an indexed temp table, if you want to try that instead.


;WITH cte_ordered AS (
    SELECT emp_code, trans_date, status, time_portion, ROW_NUMBER() OVER(
        PARTITION BY emp_code, trans_date ORDER BY time_portion) AS row_num
    FROM @t
),
cte_punch_ins AS (
    SELECT t1.emp_code, t1.trans_date, t1.time_portion AS timein, 
        t2.time_portion AS timeout
    FROM cte_ordered t1
    OUTER APPLY (
        SELECT t2.*
        FROM cte_ordered t2
        WHERE t2.emp_code = t1.emp_code AND t2.trans_date = t1.trans_date AND 
            t2.status = '02' AND t2.row_num = t1.row_num + 1
    ) AS t2
    WHERE t1.status = '01'
),
cte_punch_outs AS (
    SELECT t1.emp_code, t1.trans_date, CAST(NULL AS varchar(8)) AS timein, 
        t1.time_portion AS timeout
    FROM cte_ordered t1
    WHERE t1.status = '02' AND NOT EXISTS (
        SELECT * FROM cte_ordered t2 WHERE t2.emp_code = t1.emp_code AND 
            t2.trans_date = t1.trans_date AND t2.status = '01' AND 
            t2.row_num = t1.row_num - 1 )
)
SELECT *
FROM cte_punch_ins
UNION ALL
SELECT *
FROM cte_punch_outs
ORDER BY trans_date, timein, timeout
1 Like

@Ifor
I appreciate your effort.

@ ScottPletcher

Yes!!

Giving correct result for the sample data as well as for all real data in my actual table.

Just had to change in order by caluse

order by trans_date , coalesce(timein,timeout).

Many many thanks for your help.

Great to hear, thanks for the feedback!