So, I wasn't sure of the correct category to post this in, but I hope this is it!
Imagine the following:
A SQL Server instance with quite a few databases. It DOES get hammered by many different SQL Agent jobs and by external applications spread across the company and across teams.
Given the high usage, is there a way to take a before-and-after snapshot of the databases when a particular job runs? (The ones I have control over, at least!)
Since I am exploring this black box, I want a way of knowing the state of the data before and after the job runs, so I can see the delta.
Other than taking a backup before the particular job runs, I'm not sure of another way. I am worried, though, about this statement: "exploring this black box."
Why is it a black box? Or is that part of the forensic work you are doing, since you are trying to find what is causing a certain thing you are investigating?
Basically, what is your end game? Why are you doing this?
Likely easiest would be to use database snapshots, if you know the table(s) you need to compare.
(1) Snapshot the db;
(2) let the job run;
(3) snapshot the db again;
(4) compare data between the two(+) snapshots.
(!!) Be sure to drop the snapshots as soon as possible because they take up space (if/as data is modified) and thus have some overhead.
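A minimal sketch of those four steps, assuming hypothetical names throughout: a source database AppDb whose logical data file name is AppDb_Data, and a table of interest dbo.Orders:

```sql
-- (1) Snapshot the db before the job runs.
-- The NAME must match the source db's logical data file name (assumed here to be AppDb_Data).
CREATE DATABASE AppDb_Before ON
    (NAME = AppDb_Data, FILENAME = 'D:\Snapshots\AppDb_Before.ss')
AS SNAPSHOT OF AppDb;

-- (2) ...let the job run...

-- (3) Snapshot the db again.
CREATE DATABASE AppDb_After ON
    (NAME = AppDb_Data, FILENAME = 'D:\Snapshots\AppDb_After.ss')
AS SNAPSHOT OF AppDb;

-- (4) Compare. Rows updated or deleted by the job:
SELECT * FROM AppDb_Before.dbo.Orders
EXCEPT
SELECT * FROM AppDb_After.dbo.Orders;

-- Rows updated or inserted by the job:
SELECT * FROM AppDb_After.dbo.Orders
EXCEPT
SELECT * FROM AppDb_Before.dbo.Orders;

-- (!!) Drop the snapshots as soon as you're done with them.
DROP DATABASE AppDb_Before;
DROP DATABASE AppDb_After;
```

The EXCEPT pair is a cheap way to get the delta when the table has no convenient "last modified" column, which is common in undocumented 10+ year old schemas.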
Because of project discovery. Since there is no real documentation, and I don't know what tables/databases the jobs touch or the expected outcomes after a job, I am trying to untangle and understand the black box. That's why. Think 10+ year old system.
A snapshot uses zero space until/unless a page changes (which is why creating a snapshot is almost instantaneous). Only when a page changes does the original page get copied into the snapshot's sparse file, using space for that page.
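If you want to watch that copy-on-write growth happen, you can check how much disk the sparse file has actually consumed. A sketch, assuming a hypothetical snapshot named AppDb_Before:

```sql
-- size_on_disk_bytes reports the sparse file's real on-disk size,
-- which grows as changed pages are copied into the snapshot.
SELECT f.name, s.size_on_disk_bytes
FROM sys.dm_io_virtual_file_stats(DB_ID('AppDb_Before'), NULL) AS s
JOIN sys.master_files AS f
    ON f.database_id = s.database_id
   AND f.file_id = s.file_id;
```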
Thanks! I guess I need to look into this more, but ultimately I want to take a snapshot and then review it whenever I "get to it" (could be hours or days later), and maybe replicate/restore it in a lower environment.
Thanks, Scott. Yeah, this is the problem: there is pretty much only a prod environment and no lower environments, and due to other processing dependencies it's difficult to replicate the jobs in lower environments (the jobs are external to SQL but use SQL).
That's why I thought maybe in PROD, if we could take a delta (before and after) and then take that delta "offline" and restore it to a lower environment to see the data differences, that would help big time and nail down which job does what (as multiple jobs could overlap or run at the same time).
A diff backup would allow you to do that too. Make a diff backup; let the job run; make a separate diff backup.
Restore both diff backups separately to different db names and then do the comparisons you want.
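A sketch of the diff-backup route, with hypothetical names and paths throughout (AppDb, AppDb_Data/AppDb_Log logical file names, D:\ paths). Note a differential restore always needs the most recent full backup as its base, and taking a new full backup resets the differential base, so on prod you'd normally reuse the existing scheduled full backup rather than create one:

```sql
-- Diff backup before the job runs.
BACKUP DATABASE AppDb TO DISK = 'D:\Backups\AppDb_Before.dif' WITH DIFFERENTIAL;

-- ...let the job run...

-- Separate diff backup after the job finishes.
BACKUP DATABASE AppDb TO DISK = 'D:\Backups\AppDb_After.dif' WITH DIFFERENTIAL;

-- Restore each diff under its own db name: base full first WITH NORECOVERY,
-- then the diff WITH RECOVERY. MOVE relocates the files so the copies don't collide.
RESTORE DATABASE AppDb_Before FROM DISK = 'D:\Backups\AppDb_Full.bak'
    WITH NORECOVERY,
    MOVE 'AppDb_Data' TO 'D:\Restores\AppDb_Before.mdf',
    MOVE 'AppDb_Log'  TO 'D:\Restores\AppDb_Before.ldf';
RESTORE DATABASE AppDb_Before FROM DISK = 'D:\Backups\AppDb_Before.dif' WITH RECOVERY;

RESTORE DATABASE AppDb_After FROM DISK = 'D:\Backups\AppDb_Full.bak'
    WITH NORECOVERY,
    MOVE 'AppDb_Data' TO 'D:\Restores\AppDb_After.mdf',
    MOVE 'AppDb_Log'  TO 'D:\Restores\AppDb_After.ldf';
RESTORE DATABASE AppDb_After FROM DISK = 'D:\Backups\AppDb_After.dif' WITH RECOVERY;
```

Once both copies are online you can run the same EXCEPT-style comparisons between them, and they can live on a lower-environment instance.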
You might also try CDC (Change Data Capture), but the internal system procs you're forced to use by default are a royal pita. I ended up writing my own versions that make pulling the changes vastly easier and far, far less buggy.
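For reference, enabling CDC and pulling changes looks roughly like this (hypothetical names: AppDb, dbo.Orders; CDC also requires SQL Agent to be running, since agent jobs harvest the log):

```sql
USE AppDb;
EXEC sys.sp_cdc_enable_db;          -- enable CDC for the database

EXEC sys.sp_cdc_enable_table        -- enable CDC for one table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;          -- NULL = no gating role required to read changes

-- Later, pull everything captured so far for that table.
-- __$operation: 1 = delete, 2 = insert, 3 = value before update, 4 = value after update.
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_Orders(
    sys.fn_cdc_get_min_lsn('dbo_Orders'),
    sys.fn_cdc_get_max_lsn(),
    N'all');
```

The upside over snapshots/backups is that CDC records every intermediate change with before/after values, not just the net start-vs-end delta, which helps when multiple jobs overlap.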
Excellent question.
Pretty simple: to know which tables are being touched and what the data changes are. This will help with analysis and expectations, to build a roadmap and have a much clearer understanding of the process than we have right now.
Sure, I would do that, but I thought there would be a better/more efficient way, more "to the point" of finding the data changes, since there are MANY databases and MANY tables.
I'm working on that. But even so, the setup process is going to be big, and replicating prod behavior is even harder due to external, out-of-company dependencies.