There is a requirement to capture the IDs in a table for which any of the attributes associated with that ID were changed. For example, there is a table called Employee with three columns - EmployeeId, FirstName, LastName - and the application changes some of the first names and last names throughout the day. At the end of the day we need to capture all the IDs for which the name fields were changed. It's a very busy environment, so we have to implement a solution that has the least overhead. We do not want to capture inserts, just updates and deletes. A lot of inserts happen to the table, so locking the table is not ideal as the application might time out. We have looked at implementing Change Tracking, CDC and triggers, and they all have an overhead. Could you please suggest a method that would have the least overhead? Thanks
What version of SQL Server?
The version is SQL Server 2008 R2
We have Triggers on almost every table in all the systems we build, every Update re-updates the row itself with a change-date and also every Update/Delete also inserts a row (the "before data") into an Audit table. One audit table per actual table.
We have shopping cart sites where the general public "provide data" - it's AN AWFUL LOT OF DATA
Even just moving around the pages on the site they provide data - every single Stored Procedure that they cause to run [by visiting a page] inserts a row in a Logging table, each page view probably triggers 20 - 50 procedures ... they all LOG their parameters etc. ... and then each procedure updates the log row when the procedure exits. It is an AWFUL LOT!! of logging and audit data. The overnight process that deletes stale data takes hours to run ...
... we have never noticed any delay, so I'm surprised that you are expecting to have a problem with that sort of approach.
You don't say if you need to be able to see all changes made to a single row, during the day, or if you just want to know "if it has changed" (and NOT know what the previous data was)
If you only need "has it changed today?" I would have a ChangeDateTime column in the row, which is only set if one of the "important" attribute columns changes. You could do that in a trigger (which would catch the change however it was made), or you could do it in the procedure that updates that row (that would be "more efficient" because the row will not be processed twice, but anything that updates the table directly would miss that "catcher").
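A minimal sketch of the in-row "catcher" done as a trigger, using the Employee example from the original post (the ChangeDateTime column and trigger name are my assumptions, not a prescription):

```sql
-- Sketch only: assumes dbo.Employee has had a ChangeDateTime column added,
-- plus the EmployeeId / FirstName / LastName columns from the example.
CREATE TRIGGER trg_Employee_NameChanged
ON dbo.Employee
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Stamp only the rows where an "important" column actually changed.
    -- If the name columns are nullable, wrap them in ISNULL() so that
    -- NULL-to-value changes are not missed by the <> comparison.
    UPDATE e
    SET    ChangeDateTime = GETDATE()
    FROM   dbo.Employee AS e
    JOIN   inserted AS i ON i.EmployeeId = e.EmployeeId
    JOIN   deleted  AS d ON d.EmployeeId = i.EmployeeId
    WHERE  i.FirstName <> d.FirstName
        OR i.LastName  <> d.LastName;
END;
```

The end-of-day report is then just a SELECT of EmployeeId WHERE ChangeDateTime is on or after the start of the day.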
That won't work for Deletes, so to use that method on Deletes you would have to NOT delete the rows, but FLAG them as deleted instead. Maybe have a process that physically deletes flagged rows after some time (i.e. to allow the "What changed today" report to be run). All existing processes would have to be changed to exclude rows-flagged-as-deleted. Unless you have that already it's quite a lot of work to make that change (**)
But personally I would use a Trigger to store the BEFORE data (in just the important Attribute columns, if you like) when the row Updates/Deletes (and optionally, for the Update, ONLY if the data in those important Attribute columns changes)
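As a rough sketch of that before-image trigger (the audit table layout and all names here are my assumptions):

```sql
-- Narrow audit table: just the key, the "important" attribute columns,
-- and when the before-image was captured.
CREATE TABLE dbo.EmployeeAudit
(
    EmployeeId int          NOT NULL,
    FirstName  nvarchar(50) NULL,
    LastName   nvarchar(50) NULL,
    AuditDate  datetime     NOT NULL DEFAULT GETDATE()
);
GO
CREATE TRIGGER trg_Employee_Audit
ON dbo.Employee
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- The "deleted" pseudo-table holds the BEFORE image for both
    -- UPDATE and DELETE, so one trigger covers both cases.
    INSERT dbo.EmployeeAudit (EmployeeId, FirstName, LastName)
    SELECT EmployeeId, FirstName, LastName
    FROM   deleted;
END;
```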
(**) One way to do that might be to rename the table (e.g. as MyTable_V2) and create a VIEW with the original MyTable name, and have the view SELECT All Columns but with a WHERE clause to exclude records flagged as Deleted.
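The rename-plus-view idea might look something like this (table, view, and constraint names are illustrative):

```sql
-- Rename the real table out of the way and add the soft-delete flag.
EXEC sp_rename 'dbo.MyTable', 'MyTable_V2';

ALTER TABLE dbo.MyTable_V2
    ADD IsDeleted bit NOT NULL
        CONSTRAINT DF_MyTable_IsDeleted DEFAULT (0);
GO
-- The view keeps the original name and column list, so existing code
-- carries on selecting from "MyTable" and never sees flagged rows.
CREATE VIEW dbo.MyTable
AS
SELECT EmployeeId, FirstName, LastName   -- original columns only
FROM   dbo.MyTable_V2
WHERE  IsDeleted = 0;
```

A straight single-table view like this is also updatable, so existing INSERTs/UPDATEs against the original name should keep working; DELETE statements would need to become flag-updates (or be handled by an INSTEAD OF DELETE trigger on the view).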
My expectation is that the clever trick here will be the way in which old, stale data is deleted - so that that task does not lock the system, kill the TLogs, and so on. Our Stale Data Delete procedure took a lot of [elapsed] time to write and perfect. It deletes in batches, backs off if it detects that the system is busy, figures out if it is filling the TLog and if so triggers additional TLog backups, and so on. We run ours during the night (it's a 24/7 system, but nighttime loading is much lighter), but I would have no worry running it during the day because it would automatically run at a slower pace, using smaller batches, when it detected the activity.
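Stripped of the back-off and TLog logic described above, the core of such a batched purge is along these lines (batch size, retention period, and table names are placeholders):

```sql
DECLARE @BatchSize int = 5000,
        @Rows      int = 1;

WHILE @Rows > 0
BEGIN
    -- Small batches keep each transaction (and its locks) short.
    DELETE TOP (@BatchSize)
    FROM   dbo.EmployeeAudit
    WHERE  AuditDate < DATEADD(MONTH, -6, GETDATE());

    SET @Rows = @@ROWCOUNT;

    -- Breathing space between batches; a production version would
    -- lengthen this delay (or stop) when the system looks busy.
    WAITFOR DELAY '00:00:01';
END;
```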
Rebuilding indexes and updating statistics may also need some consideration [i.e. to ensure that everything is as efficient as possible]
Maybe @otorre_riversidedpss.org has an alternative solution though, based on newer versions of SQL server - I'm looking forward to hearing that
Many thanks for your reply. Much appreciated. We want to know if certain data has changed. We do not want to capture the changes, rather just check whether the data has changed. We did propose to have a timestamp column added to the table, but apparently the front-end .NET is so tightly coded that any change in the schema would break the app. We will probably implement triggers. Enabling Service Broker might help in this scenario, as the trigger implementation will then be asynchronous and will cause less overhead, but I have not worked with Service Broker before, so I'm not sure. Any ideas on Service Broker?
Two ways spring to mind that you could use to tackle that:
the VIEW trick above - i.e. Rename the table to MyTableV2, add the new column(s) to that table and create a VIEW with the original table name and original columns only. Would just need to check that Performance is OK, but I would expect a straight VIEW with just a subset of the columns in the actual table (i.e. no additional tables JOINed and no columns "manipulated" or "reformatted") would have identical performance to "the real table"
A second table, with a TRIGGER [on the original table] that stores the PKey and UpdateDate (only) in that second table (on UPDATE / DELETE).
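The second option needs no schema change at all on the original table, which suits the "the front end breaks if the schema changes" constraint. A sketch, with all names assumed:

```sql
CREATE TABLE dbo.EmployeeChanged
(
    EmployeeId int      NOT NULL,
    ChangeType char(1)  NOT NULL,   -- 'U'pdate or 'D'elete
    ChangedAt  datetime NOT NULL DEFAULT GETDATE()
);
GO
CREATE TRIGGER trg_Employee_LogChange
ON dbo.Employee
AFTER UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT dbo.EmployeeChanged (EmployeeId, ChangeType)
    SELECT d.EmployeeId,
           -- "inserted" is empty for a DELETE, populated for an UPDATE
           CASE WHEN EXISTS (SELECT * FROM inserted) THEN 'U' ELSE 'D' END
    FROM   deleted AS d;
END;
```

The end-of-day list of changed IDs is then a SELECT DISTINCT EmployeeId from that table (filtered on ChangedAt), and the table can be purged after each report.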
Personally I wouldn't abstract it further, using Service Broker or similar, unless the associated task was "complex" / "time consuming". If you wanted to send an Email, from within the Trigger, then yeah - I would never do that IN the trigger, but I would just use the trigger to "log" that an Email was required (e.g. to Service Broker, or just chuck the PKey in an EmailPending (queue) table).
My first line of attack would be KISS - i.e. not to add complexity unless that was the only solution.
SSIS for 2008 R2 ships with CDC support. Be careful with new technologies - that is, build it in dev, get it working with a simulated production load, and see what happens. CDC can potentially fill up your log space if misconfigured. I have no CDC implementations in production. Trigger audit-trail solutions are still what we have in production for the most part. 2016 offers "Temporal Tables":
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.History));
I have just one database I'm using it with, but it's pretty cool, supporting FOR SYSTEM_TIME clauses like AS OF. It makes it pretty easy to audit what you want.
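For completeness, a minimal 2016+ temporal table looks like this (names are illustrative; this does not apply to the OP's 2008 R2 environment):

```sql
CREATE TABLE dbo.Employee
(
    EmployeeId int          NOT NULL PRIMARY KEY,
    FirstName  nvarchar(50) NOT NULL,
    LastName   nvarchar(50) NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));

-- "What did this row look like at 9am?" - SQL Server consults the
-- history table automatically; no trigger code to write.
SELECT *
FROM   dbo.Employee
FOR SYSTEM_TIME AS OF '2017-06-01T09:00:00'
WHERE  EmployeeId = 1;
```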
That's all we have! Still! It works well, but for us reference to Audit Data is very rare, so that may not suit other use-cases. For us it is basically "How the hell did that happen?" (Operator Error, Software Bug, Fraud) and occasionally "What's the rate/frequency of change on X?"
In case of interest: we purge our Audit tables along the lines of "Delete all rows older than XXX months, but be sure to retain one row for each PKey" - the idea is that users can review the audit table for any change in the last XXX months, OR see that there is ONE row older than that, and thus know that the row has not changed SINCE then.
I've put no effort into even considering the newer methods, should I be doing so do you think?
Thank you all for your valuable suggestions. We have decided to go with the SQL timestamp (rowversion) column, which doesn't need triggers etc. However, I am still not sure how we capture deletes.
If this is a "Once a day" report (or "Changes since LAST_TIME") then you could copy the PKeys to another table (when running the report) and use those as the base-line "next time". That said, that would only get you the PKey of the deleted row.
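Combining the rowversion (timestamp) column with that key-snapshot baseline could look roughly like this (the snapshot table, column, and variable names are all assumptions):

```sql
-- @LastBaselineRowVersion would be persisted from the previous run.
DECLARE @LastBaselineRowVersion binary(8);

-- Changed rows: existed at the last baseline AND their rowversion has
-- moved on since then. The join to the snapshot excludes brand-new
-- inserts, which the OP does not want to capture.
SELECT e.EmployeeId
FROM   dbo.Employee AS e
JOIN   dbo.EmployeeKeySnapshot AS s ON s.EmployeeId = e.EmployeeId
WHERE  e.RowVer > @LastBaselineRowVersion;

-- Deleted rows: in the last snapshot but no longer in the table.
SELECT s.EmployeeId FROM dbo.EmployeeKeySnapshot AS s
EXCEPT
SELECT e.EmployeeId FROM dbo.Employee AS e;

-- Re-baseline for the next run.
TRUNCATE TABLE dbo.EmployeeKeySnapshot;
INSERT dbo.EmployeeKeySnapshot (EmployeeId)
SELECT EmployeeId FROM dbo.Employee;
```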
If deletes are not common, you could consider a Trigger just to "log" the DateTime of the delete and any columns you want to report on (to a DeletedRows table)
Or compare against a restored copy of the database (as of "yesterday"). Doubt that's a useful suggestion?!! unless you do a restore as part of testing that your backups are Known Good.