To set expectations up front: I am not a DBA, I am learning as I go and working through issues as I hit them.
Scenario:
We have a 2 TB database with heavy activity during the first and second weeks of each month. The data also keeps accumulating: the first month's ETL loads about 4 million records, the next month adds another 4 million, and so on through the year, so processing grows every month because the jobs work through data from the beginning. On top of that, a history of data is preserved on the source and in the transactional/reporting tables. The biggest object currently stands at 250 GB (about 400 million records).

Because the processing is random, the required transaction log size varies across different periods. I could not find any formula for calculating what limit the transaction log should be set to, so based on common analysis and observation I sized it at double the biggest object: 500 GB initially (to be revisited at frequent intervals). This works fine for most of the year despite fluctuations in transaction intensity, but at the end of the year, or the start of the next one, when a huge amount of data gets processed, this limit is not enough.
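For context, this is roughly how I have been tracking log consumption during the runs; the database name below is a placeholder for ours:

```sql
-- Current log size and percentage used for the database in context
USE MyEtlDb;  -- placeholder name
SELECT total_log_size_in_bytes / 1073741824.0 AS log_size_gb,
       used_log_space_in_bytes / 1073741824.0 AS used_log_gb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- Same information for all databases at once
DBCC SQLPERF (LOGSPACE);
```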
The process then fails because the log runs out of space: the log fills up within two to three days at the start of the week once the load crosses that limit, and we are forced to add space for the log because the users need the database highly available and on strict timelines. (Note: differential backups are in place and a full backup happens on the weekend.)
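When this happens, the first thing I check now is what is blocking log reuse; as far as I understand, in the FULL recovery model only log backups (not full or differential ones) mark log space as reusable, so I want to confirm what the wait actually is:

```sql
-- What is currently preventing log truncation/reuse?
-- Typical values: LOG_BACKUP, ACTIVE_TRANSACTION, REPLICATION (CDC shows up here too)
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyEtlDb';  -- placeholder name
```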
Questions –
If we increase the log size, the space is effectively committed and getting it back is an issue, so we end up dedicating extra log space (for example, 1 TB to handle that case) that goes unused for most of the year. How can we be more conservative, and what steps can be taken to control this?
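One idea I am weighing, instead of pre-allocating for the worst case, is a moderate fixed baseline plus controlled autogrowth so the file only grows during the heavy months; the file and database names below are placeholders and the numbers are just examples:

```sql
-- Moderate baseline with fixed-size growth increments, instead of
-- pre-allocating 1 TB that sits idle most of the year.
ALTER DATABASE MyEtlDb
MODIFY FILE (NAME = MyEtlDb_log, SIZE = 500GB, FILEGROWTH = 8GB);
```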
Will the steps below work, or what else do we need? (The diagnostic queries I have started with are after the list.)
- Reworking long-running transactions: avoiding explicit transactions where they are not required (insert/delete/update)
- Removing unwanted indexes on the big objects
- CDC is activated on the database: checking for ways to reduce its utilization
- Archiving data out of the big objects
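These are the checks I have put together for the bullets above; the retention value and the table/predicate in the archival sketch are placeholders:

```sql
-- 1) Long-running transactions: who is holding the log open?
DBCC OPENTRAN;  -- oldest active transaction in the current database

SELECT s.session_id,
       t.transaction_id,
       t.database_transaction_begin_time,
       t.database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions AS t
JOIN sys.dm_tran_session_transactions AS s
  ON s.transaction_id = t.transaction_id
WHERE t.database_id = DB_ID()
ORDER BY t.database_transaction_log_bytes_used DESC;

-- 2) Index usage: drop candidates are indexes with many writes, few reads
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       u.user_seeks, u.user_scans, u.user_lookups, u.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS u
  ON u.object_id = i.object_id
 AND u.index_id = i.index_id
 AND u.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY u.user_updates DESC;

-- 3) CDC: keep the change tables trimmed via the cleanup job
EXEC sys.sp_cdc_change_job
     @job_type = N'cleanup',
     @retention = 4320;  -- minutes (3 days here, just an example)

-- 4) Archival: delete in small batches so no single transaction bloats the log
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) FROM dbo.BigTable            -- placeholder table
    WHERE load_date < DATEADD(YEAR, -1, GETDATE()); -- placeholder predicate
    IF @@ROWCOUNT = 0 BREAK;
END;
```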
Question on shrinking the log: there are different opinions on applying a log shrink. What is the impact of building the shrink into the ETL itself, rather than doing it only when required (when issues happen) to free up the log?
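To make that concrete, this is the kind of step I would be adding at the end of the ETL (the logical file name is a placeholder, and I am aware regular shrinking is often discouraged, which is exactly what I am asking about):

```sql
-- Shrink the log file back down to a target size after the heavy run.
-- DBCC SHRINKFILE takes the target size in MB (512000 MB ~= 500 GB).
USE MyEtlDb;  -- placeholder name
DBCC SHRINKFILE (MyEtlDb_log, 512000);
```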
Are there other approaches from your experience that I should look into? Thanks.