SQLTeam.com | Weblogs | Forums

SQL job to delete records older than 4 years


#1

Hello I did a Google search on this and found such a mixed bag of answers that I have no comfort around it.

This isn't something I have to get done it is more of a learning exercise. I have a strong NOC background and was recently recruited into a company for an all encompassing IT role.
My SQL skills are limited to a few queries I do for NOC troubleshooting so my overall SQL development skills are very weak. One of the developers posed a question to me because he wants me to "improve my SQL skills" it's not a mandatory work task. It's just something that if I get done, would really improve my ability with SQL.

The problem we are trying to solve is we want to delete records older than 4 years from a particular schema automatically without knowing the table names ahead of time. Basically if the table is in a particular schema a SQL server job will run a script that deletes records older than 4 year. Job will run once a day or once a week or whatever. The SQL script is what is important now I can easily figure out the job schedule.

Can you guys help me figure out how this would work? I was thinking the only way to do this would be dynamic sql (which I know nothing about, just that it might work) or maybe something like
select * from INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE LIKE '%date%'
DELETE FROM INFORMATION_SCHEMA.TABLES WHERE DATE < DATEADD(yy,-4,GETUTCDATE())

But I am complete SQL dev NOOB so not sure what I am getting.


#2

This sounds dangerous...
Let me summarize the requirements as I understand them:
Given a Schema name, find all tables within that schema, find all records on those tables that are older than four years and delete them.
If that is correct then we can proceed:

  1. Yes, you'll need dynamic SQL since you don't know, a priori, the names of the tables. Dynamic SQL is nothing more than using a string (varchar or nvarchar) variable in which we build the SQL statement to execute.
  2. You can get the set of table names by querying INFORMATION_SCHEMA.TABLES and filtering on the given Schema name
  3. You can get the set of date related columns on those tables by querying INFORMATION_SCHEMA.COLUMNS and filtering by Schema, Table and Data_type.

HOWEVER

You don't know what these date columns represent! They could be the record creation date which would allow you to decide if the record was four years old or not. But they could also be some other type of date, for instance DOB, in which case you'd be deleting everyone who was older than four. You need additional criteria before going any further.

  1. Once you have resolved #3 it should be possible to dynamically create the proper DELETE statements and execute them.

#3

Yes I agree it sounds dangerous. I am told the safe to delete stuff will be sorted into it's own Schema that we can sort on but I am a noob so I am not sure if that answer closes all the major concerns. I don't think they were expecting me to know the answer to this, I think they were just interested in what my answer would be.

I raised some of the same concerns you raised already. This is very low on my priorities at the office and is more like an extra credit project as my primary duties are still Network, VM and End user related. If anyone has some code samples they can walk me through and really dumb it down for me that would be appreciated. Otherwise I will provided updates as I can. This script if implemented will be used for a server we aren't putting into production for quite some time yet and the database is going to be gathering data from a lot of sources to assist traders with making decsions. Deleting records is needed to prevent data overflow. It will be months before it is actually needed and the guy who told me to do it, it's his job to actually do it and I think he already has the answer. He is just one of those kind of guys who likes to "push" his colleagues through asking tough questions so I think he wants me to understand how this would work even though it is ultimately his job to create it.


#4

I agree that you'll definitely need some human control over this activity. Deleting rows is very serious stuff.

I suggest creating a new table that contains the tables to delete from and the control column(s) and conditions for delete (or other cleanup action). Something like below. It will then be relatively easy to use those tables to dynamically generate and execute SQL code to do the cleanups.

For example:

CREATE TABLE <schema>.cleanup_tables
    table_id int IDENTITY(1, 1) NOT NULL,
    table_name nvarchar(60) NOT NULL,
    table_type char(1) NOT NULL DEFAULT 'T' --'A'=archive table;'T'=normal (data) table.
    is_delete_rows bit. --for now will always be 1, as delete is the only cleanup action currently in place.
    is_archive_rows bit. --default to 0; if 1, rows are copied to archive/history table before being deleted.
    archive_table_id --default to 0; must be > 0 if "is_archive_rows = 1".
    min_age_in_months_to_trigger_cleanup smallint --48=4 years. will provide default value for same column in cleanup_columns unless overridden.
    CONSTRAINT cleanup_tables__PK PRIMARY KEY ( table_name ) WITH ( FILLFACTOR = 100 ) ON [PRIMARY]


CREATE TABLE <schema>.cleanup_columns
    table_id int NOT NULL --FK to cleanup_tables
    column_name nvarchar(100) NOT NULL
    is_delete_rows bit. --for now will always be 1, as delete is the only cleanup action currently in place.
    min_age_in_months_to_trigger_cleanup smallint --48=4 years
    connector_to_other_columns char(3) --''|'AND'|'OR' only.