To set expectations up front: I am not a DBA, I am learning as I go and working through issues as I hit them.
Scenario:
We have a 2 TB database with heavy activity during the first and second weeks of each month. The data also keeps accumulating: the first month's ETL loads about 4 million records, the next month adds another 4 million, and so on through the year, so processing grows every month because the jobs work through data from the beginning. On top of that, a history of data is preserved on the source and in the transactional/reporting tables. The biggest object currently stands at 250 GB (about 400 million records).

Because the processing is random, the required transaction log size varies across different periods. I could not find any formula for calculating what limit the transaction log should be set to, so based on common analysis and observation I sized it at double the biggest object: 500 GB initially (to be revisited at frequent intervals). This works fine for most of the year despite fluctuations in transaction intensity, but at the end of the year, or the start of the next one, when a huge amount of data gets processed, this limit is not enough.
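For context, this is roughly how I have been tracking log consumption during the runs; the database name below is a placeholder for ours:

```sql
-- Current log size and percentage used for the database in context
USE MyEtlDb;  -- placeholder name
SELECT total_log_size_in_bytes / 1073741824.0 AS log_size_gb,
       used_log_space_in_bytes / 1073741824.0 AS used_log_gb,
       used_log_space_in_percent
FROM sys.dm_db_log_space_usage;

-- Same information for all databases at once
DBCC SQLPERF (LOGSPACE);
```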
The process then fails because the log runs out of space: the log fills up within two to three days at the start of the week once the load crosses that limit, and we are forced to add space for the log because the users need the database highly available and on strict timelines. (Note: differential backups are in place and a full backup happens on the weekend.)
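When this happens, the first thing I check now is what is blocking log reuse; as far as I understand, in the FULL recovery model only log backups (not full or differential ones) mark log space as reusable, so I want to confirm what the wait actually is:

```sql
-- What is currently preventing log truncation/reuse?
-- Typical values: LOG_BACKUP, ACTIVE_TRANSACTION, REPLICATION (CDC shows up here too)
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'MyEtlDb';  -- placeholder name
```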
Questions –
If we increase the log size, the space is effectively committed and getting it back is an issue, so we end up dedicating extra log space (for example, 1 TB to handle that case) that goes unused for most of the year. How can we be more conservative, and what steps can be taken to control this?
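One idea I am weighing, instead of pre-allocating for the worst case, is a moderate fixed baseline plus controlled autogrowth so the file only grows during the heavy months; the file and database names below are placeholders and the numbers are just examples:

```sql
-- Moderate baseline with fixed-size growth increments, instead of
-- pre-allocating 1 TB that sits idle most of the year.
ALTER DATABASE MyEtlDb
MODIFY FILE (NAME = MyEtlDb_log, SIZE = 500GB, FILEGROWTH = 8GB);
```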
Will the steps below work, or what else do we need? (The diagnostic queries I have started with are after the list.)
- Reworking long-running transactions: avoiding explicit transactions where they are not required (insert/delete/update)
- Removing unwanted indexes on the big objects
- CDC is activated on the database: checking for ways to reduce its utilization
- Archiving data out of the big objects
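These are the checks I have put together for the bullets above; the retention value and the table/predicate in the archival sketch are placeholders:

```sql
-- 1) Long-running transactions: who is holding the log open?
DBCC OPENTRAN;  -- oldest active transaction in the current database

SELECT s.session_id,
       t.transaction_id,
       t.database_transaction_begin_time,
       t.database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions AS t
JOIN sys.dm_tran_session_transactions AS s
  ON s.transaction_id = t.transaction_id
WHERE t.database_id = DB_ID()
ORDER BY t.database_transaction_log_bytes_used DESC;

-- 2) Index usage: drop candidates are indexes with many writes, few reads
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name AS index_name,
       u.user_seeks, u.user_scans, u.user_lookups, u.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS u
  ON u.object_id = i.object_id
 AND u.index_id = i.index_id
 AND u.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY u.user_updates DESC;

-- 3) CDC: keep the change tables trimmed via the cleanup job
EXEC sys.sp_cdc_change_job
     @job_type = N'cleanup',
     @retention = 4320;  -- minutes (3 days here, just an example)

-- 4) Archival: delete in small batches so no single transaction bloats the log
WHILE 1 = 1
BEGIN
    DELETE TOP (50000) FROM dbo.BigTable            -- placeholder table
    WHERE load_date < DATEADD(YEAR, -1, GETDATE()); -- placeholder predicate
    IF @@ROWCOUNT = 0 BREAK;
END;
```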
Question on shrinking the log: there are different opinions on applying a log shrink. What is the impact of building the shrink into the ETL itself, rather than doing it only when required (when issues happen) to free up the log?
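To make that concrete, this is the kind of step I would be adding at the end of the ETL (the logical file name is a placeholder, and I am aware regular shrinking is often discouraged, which is exactly what I am asking about):

```sql
-- Shrink the log file back down to a target size after the heavy run.
-- DBCC SHRINKFILE takes the target size in MB (512000 MB ~= 500 GB).
USE MyEtlDb;  -- placeholder name
DBCC SHRINKFILE (MyEtlDb_log, 512000);
```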
Are there other approaches from your experience that I should look into? Thanks.