-- BEGIN TRANSACTION
;WITH T_CTE
AS
(
SELECT [T_RowNumber] = ROW_NUMBER()
OVER (
PARTITION BY DupCol1, DupCol2, ...
ORDER BY DupCol1, DupCol2, ..., TieBreakCol
)
FROM #TEMP
WHERE ... any restriction on which rows to compare ...
)
DELETE D
FROM T_CTE AS D
WHERE T_RowNumber >= 2
--
-- COMMIT
-- ROLLBACK
In the "DupCol1, DupCol2" list include all columns which must match to be considered a duplicate
"TieBreakCol" should be a column used as a tie-break to decide which row to KEEP. For example a ModifyDate or an ID number. Note that the FIRST row, i.e. sorted LOW, will be retained, so use DESC if you need the HIGHEST / LATEST value retained. If the tie-break column is NOT unique then the row chosen will be at random, so you may also want to include a unique key, e.g. the primary key(s), so that the process is repeatable
If the number of rows deleted is very large (more than 10,000 say) then you may need to do this in batches, particularly if the delete operation will be performed whilst the database is in use.
If the rows have a unique identifier (Primary Key would be a good candidate, if you have one, most especially if it is also the Clustered Index) then use the above code to just collect the key(s) for the records to be deleted into a #TEMP table, and then use that to physically remove the rows, probably something like the code below.
This code sample has configurable options for limits on number of deletes per iteration, max iterations, max time of any single iteration, and max overall runtime (e.g. abort after maintenance window expires). It must NOT be run inside a transaction. It includes a SLEEP (default 3 seconds) each iteration, which will stop it hogging resources and allow other process to run.
(If the single-iteration time causes an abort you might adjust the code so that the Batch Size is adjusted, downwards, after a slow iteration and back upwards, again, after a fast iteration, it will then auto-adjust to varying load on the server. In practice I have found that some delete iterations are slower than others (secondary index cleanup etc., checking for foreign key rows and so on)
;WITH T_CTE
AS
(
SELECT [T_RowNumber] = ROW_NUMBER()
OVER (
PARTITION BY DupCol1, DupCol2, ...
ORDER BY DupCol1, DupCol2, ..., TieBreakCol
)
FROM #TEMP
WHERE ... any restriction on which rows to compare ...
)
SELECT [T_ID] = IDENTITY(int, 1, 1),
KeyCol1, KeyCol2, ...
INTO #TEMP
FROM T_CTE AS D
WHERE T_RowNumber >= 2
ORDER BY KeyCol1, KeyCol2, ... -- Works best if sorted by Clustered Index keys
--
-- NOTE: This delete section must NOT run inside a transaction!!
-- This section can be aborted (any deletes completed will not be rolled back)
-- This section has configurable options for Max Batches and Max Iteration time limits
-- and Max elapsed runtime (e.g. so that it completes within a maintenance window)
--
--
-- NOTE: #TEMP table must be pre-populated with the Key Columns of rows to be deleted
-- Replace KeyCol1, KeyCol2 etc. below with the actual Key Column names
-- Replace MyTable below with the name of the target table
DECLARE @intErrNo int,
@intRowCount int,
@intLoopCount int,
@intLoopCountLimit int,
@intTotalRowCount int,
@intElapsed int,
@intIterationElapsedLimit int,
@intTotalElapsedLimit int,
@intBatchSize int,
@intOffset int,
@intExitReason int,
-- @intExitReason:
-- 0=Success,
-- 1=Error (see @intErrNo),
-- 2=Max time exceeded,
-- 3=Max iteration time exceeded,
-- 4=Loops limit exceeded
@dtStart datetime,
@dtIterationStart datetime
SELECT @intBatchSize = 10000, -- Rows to delete (per iteration)
@intIterationElapsedLimit = 15000, -- Abort if a single delete exceeds this (ms) - NULL=Infinity
@intTotalElapsedLimit = 60, -- Abort if still running after this amount of time (minutes) - NULL=Infinity
@intLoopCountLimit = 200 -- Maximum number of loops - NULL=Infinity
SELECT @intErrNo = 0, -- Reset
@intRowCount = 1, -- Force first iteration
@intLoopCount = 1,
@intTotalRowCount = 0,
@intElapsed = 0,
@intOffset = 1,
@dtStart = GetDate() -- Start time for the whole batch
WHILE @intRowCount >= 1
BEGIN
SELECT @dtIterationStart = GetDate() -- Start time for THIS iteration
DELETE D
FROM #TEMP AS T
JOIN dbo.MyTable AS D
ON D.KeyCol1 = T.KeyCol1
AND D.KeyCol2 = T.KeyCol2
...
WHERE T.T_ID >= @intOffset
AND T.T_ID < @intOffset + @intBatchSize
SELECT @intErrNo = @@ERROR, @intRowCount = @@ROWCOUNT
SELECT @intLoopCount = @intLoopCount + 1,
@intTotalRowCount = @intTotalRowCount + @intRowCount,
@intOffset = @intOffset + @intBatchSize,
@intElapsed = DATEDIFF(Millisecond, @dtIterationStart, GetDate())
IF @intErrNo <> 0
BEGIN
SELECT @intRowCount=0, -- Abort if ERROR
@intExitReason = 1
END
IF @intTotalElapsedLimit IS NOT NULL
AND @intTotalElapsedLimit < DateDiff(Minute, @dtStart, GetDate())
BEGIN
SELECT @intRowCount=0, -- Abort - total elapsed time limit exceeded
@intExitReason = 2
END
IF @intIterationElapsedLimit IS NOT NULL
AND @intElapsed > @intIterationElapsedLimit
BEGIN
SELECT @intRowCount=0, -- Abort - single-iteration time limit exceeded
@intExitReason = 3
END
IF @intLoopCountLimit IS NOT NULL
AND @intLoopCount > @intLoopCountLimit
BEGIN
SELECT @intRowCount=0, -- Abort - Loop-count limit exceeded
@intExitReason = 4
END
/** ENABLE FOR DEBUGGING *
SELECT [@intLoopCount]=@intLoopCount
, GetDate()
, [Elapsed] = @intElapsed
, [@intErrNo]=@intErrNo
, [@intRowCount]=@intRowCount
, [@intTotalRowCount]=@intTotalRowCount
, [@intOffset]=@intOffset
RAISERROR (N'Waitfor', 10, 1) WITH NOWAIT -- Force the debug message to display
* ENABLE FOR DEBUGGING **/
WAITFOR DELAY '00:00:03' -- SLEEP to allow other processes to run
END -- @intRowCount >= 1
IF @intErrNo <> 0
BEGIN
... handle the error ...
END
DROP TABLE #TEMP
Hi, you can easily delete duplicate rows from a large table by using temp table. Here the idea is to copy exclusive rows into a temp table. You can easily find matchless rows by utilizing distinct clause. The time when these unique rows are copied, remove everything from the original table and then copy unique rows again, and by doing that all the identical rows have been deleted. To do it just go for the following instructions: removing duplicate using copy, delete and copy select distinct name into #unique from #programming delete from #programming; insert into #programming select * from #unique -- check after select * from #programming name Java C++ JavaScript Python. Find out more about removing duplicate tables from the rows from SQL server table by following https://appuals.com/how-to-remove-duplicate-rows-from-a-sql-server-table/ .