Using SQL Svr 2008 R2, I need to clean up duplicated records, as well as set a valid_to date.
My table has the following 5 columns:
- id (identifying record id, however some numbers are skipped)
- doc_id (this is the document id = invoice number)
- qty (this is the quantity of sold products included in this doc_id)
- valid_from (this is the creation or modification date of the doc_id; the qty can change later on)
- valid_to (is initially null.. but should indicate the end of life date of this doc_id's version)
Every day, I'm inserting new records to the table, containing all new doc_ids, as well as any change made on existing doc_ids. It can also contain duplicates of already existing doc_ids without any change.
Example data (the real table contains about 3 mio rows):
.id..|.doc.|qty|.valid_from.|.valid_to. 1001 | 520 | 5 | 2015-12-01 | null 1005 | 520 | 6 | 2015-12-03 | null 1012 | 521 | 2 | 2015-12-03 | null 1018 | 520 | 6 | 2015-12-07 | null 1022 | 521 | 8 | 2015-12-08 | null
I'm looking for a good performing UPDATE query, that
a) removes unnecessary duplicates, meaning no change in qty
b) adds the valid_to which is always one day before the valid_from of the next version for a certain doc_id. Or it remains null if the current line is the most latest version of the doc_id
Do you have any suggestion?
If it makes sense, I can also add an index to some columns. And could also change the null value in valid_from to a certain date (for example 2300-01-01), as null is not indexable.
After the query executed, above table shoud look like this:
.id..|.doc.|qty|.valid_from.|.valid_to. 1001 | 520 | 5 | 2015-12-01 | 2015-12-02 1005 | 520 | 6 | 2015-12-03 | null 1012 | 521 | 2 | 2015-12-03 | 2015-12-07 1022 | 521 | 8 | 2015-12-08 | null
record id 1018 was removed because it was a duplicate.
I appreciate any help!