Dynamic sql for taking distinct data into temp table then reload into original table

Grifter · May 9, 2017, 2:28pm

What I am trying to do

select distinct records from a main table
Put those records into a temp table
Clear the records from the main table
then select the distinct data from the temp back into the main table

I am already checking the main tables and populating a temp table (temp 1) with a list of tables that have duplicate data.

Then by looping through each row in that temp 1 table, using a cursor, to see what tables are duplicate via count value of duplicates, I wanted to use the actual table name, which is loaded into a variable then into temp 1 table, to then select the distinct records from this table into another temp table 'table 2' lets say.

Then I want to clear the main table and reload the distinct data from temp 2 back into the main table.

This is the statement before cursor that loads the temp 1 table with the table name that has duplicates

INSERT INTO #TEMP_TBL_DUP_CNT
select cast(@rec_cnt as varchar(10)) reccnt, cast(@tbl_name as varchar(200)) 
tblnm

So this is my cursor below and i have numbered my ideas below 1-4 as related to bullet list above. The syntax is the bit I am struggling with and would like help as to how to create it.

DECLARE REC_CURSOR CURSOR FOR
-- Set select stamtent for cursor

select reccnt, tblnm
from #TEMP_TBL_DUP_CNT

OPEN REC_CURSOR
FETCH NEXT FROM REC_CURSOR INTO @rec_cnt, @tbl_name

WHILE @@FETCH_STATUS = 0

BEGIN 

-- Check if record count is 0
-- if is larger than 0 then select distinct records from tbl name into
-- temp table

if @rec_cnt > 0

-- Need to load data into new temp table using dynamic SQL and @tbl_name 
-- Ideally wanted to create a temp Table with dynamic name like 
-- '#TEMP_' + @tbl_name giving #temp_Table_Name_1, #temp_Table_Name_2 etc.
-- 

 1) Starting with something like :

 -- SET @tbl_temp = '#TEMP_' + @tbl_name

 2) Then select distinct * into temp like this:

 -- SELECT 'distinct * INTO ' + @tbl_temp + ' FROM ' + @tbl_name

 3) Then clear main table

-- TRUNCATE TABLE + '@tbl_name'

 4) Then select from temp back into main table

-- SELECT '* INTO ' + @tbl_name + ' FROM ' + @tbl_temp

FETCH NEXT FROM REC_CURSOR INTO @rec_cnt, @tbl_name
END

CLOSE REC_CURSOR
DEALLOCATE REC_CURSOR

It doesn't have to be a cursor but this is what I decided on to build. If anyone has any other suggestion then please feel free to suggest.

Thanks for any input.

G

Kristen · May 9, 2017, 3:04pm

You could just delete the duplicates instead (keeping the "preferred" record, whichever that is [assuming that they are not identical duplicates, if they are then just keep "first one"])

I use ROW_NUMBER() OVER() for that job, with ORDER BY so that preferred record is first, and then delete all rows WHERE Row Number >= 2

Grifter · May 10, 2017, 11:13am

I'm trying to do this a simpler way now using temp tables. I am having an awful time with these temp tables. It keeps saying you can't drop this table as either it doesn't exist or you do not have permisisons. Other times I get this:

Database name 'Temp Name' ignored, referencing object in tempdb.

This is the basic statement for each table

 SELECT column1
,column2
,column3]
INTO myDB.[dbo].[##TEMP_TABLE]
FROM myDB.[dbo].[Tablename1]

TRUNCATE TABLE myDB.[dbo].[Tablename1]

INSERT INTO myDB.[dbo].[Tablename1]
SELECT column1
,column2
,column3]
FROM myDB.[dbo].[##TEMP_TABLE]

Can I create temp table in the DB I am sourcing data from or is it better in temp dob?

If it is how do I create it, do I just use #TEMP or ##TEMP and it will know where to create temp table

Also do I have to dro pthe temp table after every statement in the stored proc, and if so what is the syntax? I've used just drop table #tempname, or if statement at start and just drop between each statement.

Seem to be getting errors with every way I try.

When I want to create a temp table based on the table I am querying do I use select into or insert into select * from?

Please can someone help with syntax/set up and dropping of these temp tables?

Kristen · May 10, 2017, 3:18pm

What you descrbied earlier is a sledge hammer to crack a nut.

You want to take all the DUP rows out into #TEMP, delete them from Source, and then put one back.

Better would be to delete-all-bar-the-one-to-keep, as I described.

If you delete the row that you want to keep (along with others), and then put it back there may be all sorts of side effects. Maybe you won't encounter them, but if you do they will be a PITA.

Off the top of my head:

Triggers
IDENTITY numbers (different when you insert back again)
Foreign Keys
Create Date/Time
Audit/Archiving
and so on.

No, its not database specific (but in the case of #TEMP, rather than ##TEMP, its unique to your current session, so "INTO #TEMP" is probably all you need to do).

but my advice is DO NOT DO IT THAT WAY ...

jeffw8713 · May 12, 2017, 6:06pm

I agree with Kristen - you are really making this much more complex than is needed. And if you are trying to build a process that can be run in the future...I would recommend not taking that approach.

The one thing I haven't heard is that you have a plan in place to prevent the duplicates from occurring in the first place. The first order of business is to identify why you are getting duplicate values and have a plan in place to apply those changes once you have removed the duplicate rows...

This just sounds like you are building a process to 'clean' up the database after the fact...which is always a bad idea.