Log_backup & growing log file

Hi

In one of our DBs there is an audit job that runs every day and fills the log file really quickly. I'm not a DBA and we don't have one, so I'm looking at how to set things up properly to make sure the log doesn't fill up and crash processes.

I've done quite a lot of reading up on all the ins and outs of log files, and I'm really not any wiser on what exactly I need to do.

I had a look using this query:

SELECT name
	, state_desc
	, recovery_model_desc
	, log_reuse_wait_desc
FROM sys.databases

And the DB in question had "LOG_BACKUP" under the "log_reuse_wait_desc" column. I'd read in quite a few places that taking a log backup would "flush" or "clear" the log. The last log backup was in May, so I took another one, then re-ran the query above, which now said "NOTHING" rather than "LOG_BACKUP".
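
For reference, the backup I took was along these lines (the database name and path here are placeholders, not our real ones):

BACKUP LOG [AuditCopy] TO DISK = N'D:\Backups\AuditCopy_log.trn';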

The log had also gone from 97% used to 1.6%. I then re-ran the audit trail job, and the log usage went back up to 63% and "LOG_BACKUP" reappeared under "log_reuse_wait_desc".

From this point I'm not sure what the next step is. How do I stop the log growing so quickly? I'd read about setting up a regular log backup, but then also read that the backup files will fill your disks up quickly.

Not really sure where to go from here?

Thanks for any suggestions or advice.

G

A few questions here:

What is the audit job doing that is causing the log to fill up?
Is this a production box?
What kind of volume are you running through it?
Are there regular backups being done? If so, how often?

What is the audit job doing that is causing the log to fill up?
It takes audit information from an audit view and inserts it into a specific audit table; this is the slowest part.
Is this a production box?
No.
What kind of volume are you running through it?
The main section of the job processes 1.5 million rows.
Are there regular backups being done? If so, how often?
Backups of the DB, yes; log backups, no.

If this isn't a production system - what are the recovery requirements? If you don't need to recover this system to a point in time - then you could change the recovery model of the database to simple. This would negate the need to back up the transaction log.
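
The switch itself is a single statement - something like this, with your own database name in place of the placeholder:

ALTER DATABASE [AuditCopy] SET RECOVERY SIMPLE;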

If you do need to be able to recover to a point in time - then you absolutely must schedule regular transaction log backups (regardless of disk space usage for those backup files). These should be scheduled to enable recovery to the necessary point in time - in other words, how much data can you afford to lose?
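
If you do need log backups, the scheduled Agent job step would run something along these lines - the database name and path are just examples, and the timestamp keeps each backup file distinct:

DECLARE @file nvarchar(260) =
	N'D:\Backups\AuditCopy_log_'
	+ CONVERT(nvarchar(8), GETDATE(), 112)	-- yyyymmdd
	+ N'_'
	+ REPLACE(CONVERT(nvarchar(8), GETDATE(), 108), ':', '')	-- hhmmss
	+ N'.trn';

BACKUP LOG [AuditCopy] TO DISK = @file;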

If you do change to simple recovery - the amount of transactions, and the size of the transaction log file still needs to be reviewed. If the process takes 63% of the current transaction log - then that file will need to remain that size.

With that said - you could modify the process to 'batch' the inserts - which will reduce the amount of the transaction log that will be used for each 'batch'. In between batches you can then perform a checkpoint (simple recovery) or a transaction log backup (full/bulk-logged recovery). Batching the inserts may also improve the performance of the overall process.
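
A rough sketch of that batching pattern - the table, view, and column names here are made up, so adapt them to your actual audit view and table:

DECLARE @batch int = 50000;

WHILE 1 = 1
BEGIN
	-- Copy the next batch of rows that haven't been inserted yet
	INSERT INTO dbo.AuditHistory (AuditId, EventTime, Detail)
	SELECT TOP (@batch) v.AuditId, v.EventTime, v.Detail
	FROM dbo.AuditView v
	WHERE NOT EXISTS (SELECT 1 FROM dbo.AuditHistory h
	                  WHERE h.AuditId = v.AuditId);

	IF @@ROWCOUNT < @batch BREAK;	-- last (partial) batch done

	CHECKPOINT;	-- simple recovery: lets log space be reused between batches
			-- (under full recovery, take a log backup here instead)
END;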

This DB is basically a snapshot of live, taken every day, so there is no real requirement to back it up. I've confirmed this by checking for backup files, and there aren't any.

There is also the issue that the restore from live changes the DB back to the full recovery model as well. I set it to simple last night and it's back to full today.

If I set it to simple, will I need to first take a transaction log backup to clear the LOG_BACKUP flag in sys.databases?

So now I am confused - why would you be running processes that build tables in a database that is going to be refreshed? Any data built from that process will be lost...it doesn't make sense.

If this is truly a snapshot of production, it should be restored and set to read-only. Any processes that need to capture data from that should be maintained in a separate reporting database - which can then be set to simple recovery since there is no requirement to be able to recover to a point in time. That reporting database can be rebuilt from the source database(s).

If you really must create data in this database every day - then add a step to the restore process to switch the database to simple recovery after it has been restored. And no - you don't need to take a transaction log backup before switching to simple recovery.
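
In the restore job, that would look something like this (database name and backup path are placeholders):

RESTORE DATABASE [AuditCopy]
	FROM DISK = N'D:\Backups\Live_Full.bak'
	WITH REPLACE;

ALTER DATABASE [AuditCopy] SET RECOVERY SIMPLE;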

What is the ultimate purpose/use of this restored database? You mentioned something about audit - what kind of audit, and what else is this restore used for?

@jeffw8713 First of all this whole process was built way before my time here, so I can't explain, nor do I know the reasoning behind the way it is set up.

From looking at it, there is a lot of daily audit information kept that needs to be processed and used by various reports. Those reports point to the backed-up DB (I think it is done on the shadow copy) so that the agent job does not slow down production when it runs.

Again, I can't vouch for the overall design of what is in place and we would need to spend a lot of time re-designing it to make the processing more efficient.

@yosiasz - It seems to be mainly for processing of reporting data, so it's kept off of live, as many of the jobs take quite a lot of resources when they're running, as well as the audit processing, and reporting for that. As I mentioned to Jeff I can't provide reasons why it was designed this way as it was set up like this before I worked here.

Gotcha. So jobs are also run on this restored database? What do those do? How big is the backup in GB, and how many tables are there? What portion of it is audited - 15%, 35%?
What portion of it is used for reporting?
Do you need all of the objects for the above - tables, procedures, views, indices, functions, etc.?

I hear you that it predates you, but to me it seems a bit of overkill, IMO, without fully understanding the audit part and what it specifically does. I am leaning more towards an Always On or CDC solution, or maybe even a data warehouse for the reporting part. But that's neither here nor there.

But the current approach is time- and resource-consuming, and it won't scale in the near future. Change it if you have the power to do so.

The problem isn't that the process is set up to backup/restore a copy of the database. The problem is that you are using that same database to store data after processing. Since you know that database will be refreshed tomorrow - what good does it do to create tables in that database?

A better approach is to create a new database on that instance - modify the processes so they create the data in this other database instead of creating the objects in the restored database.
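
As a sketch - the name here is a placeholder - that's just:

CREATE DATABASE [AuditReporting];
ALTER DATABASE [AuditReporting] SET RECOVERY SIMPLE;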

Once you do that - you can modify the backup/restore process so that restored database is set to read-only after it has been restored. That will prevent any changes to the source database and allow you to control the other database. This also allows you to build your process so the tables don't have to be dropped and recreated each time.
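
The read-only switch after the restore completes is another one-liner (again, the database name is a placeholder):

ALTER DATABASE [AuditCopy] SET READ_ONLY WITH ROLLBACK IMMEDIATE;

The ROLLBACK IMMEDIATE option kicks out any open connections so the ALTER can complete.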