MSDB sysJobStepsLogs / sp_delete_jobStepLog housekeeping

We've got some logging in MSDB sysJobStepsLogs

I suspect at some point we found that SQL Agent Jobs logging was set to some very small (default) number, and we had a job running at short interval - every few minutes - and we were losing job logging history too soon to be any use to us when retrospectively investigating a job.

Spent the day tidying up a run-away MSDB database. There's the old culprits of Logging of Backups and clearing that out using sp_delete_backupHistory (the issue is that there are NO indexes which means that clearing the Backup Log tables is catastrophically slow; that is fixed in SQL2014 - that's 7 years after people started asking MS for a solution - DBAs all around the world wasting time solving a problem that MS should have fixed promptly)

Then I found that sp_purge_jobHistory wasn't scheduled either. At least that one runs reasonably quickly. But for both of them I had to write a loop that only cleared one month's data at a time (for us that's about 25,000 rows per loop iteration, so not exactly trivial)

NOTE: Don't attempt to clear one month's log backup history from MSDB if you have a large number of databases and/or frequent log backups - it will take days. Make sure you have found suitable indexes to create first. I don't recommend you try, for example, clearing one day's worth of the backlog at a time - it will lock up the tables and interfere with backups; its a PoS ... just create the recommended indexes (which turn out to be the same ones, bar one, that MS have added in SQL 2014)

And then I spotted that sysJobStepsLogs wasn't reducing in size. I had assumed that sp_purge_jobHistory would take care of that too ...

So how to purge that? We only have three rows in that table ... but the space used by the table is 4GB !! I haven't bothered to look into it closely but I think SQL Agent is using the nvarchar(MAX) LOG column as a circular buffer - which strikes me as a strange approach compared to a row-per-log-entry table, which would be easy to purge.

So I'm not sure that there is any benefit in purging this "gently" because each loop will, I am presuming?, reduce the circular buffer size but, in effect, write back the same row. So it will have to handle that row again, each iteration a bit smaller, but given how massive the BLOB is that's going to be a lot of logging each time.

sp_delete_jobStepLog also requires a Job ID parameter - rather than the Log ID or the GUID for the Job Step (both of which are available in the sysJobStepsLogs table). Whilst its not an unreasonable requirement it does require writing a more complicated loop to figure out which jobs actually have anything in the sysJobStepsLogs table, which sysJobSteps they refer to and, from there, which sysJobs job_id that refers to.

A Google didn't turn up much about sysJobStepsLogs / sp_delete_jobStepLog - perhaps people aren't daft enough to increase the logging retention time/amount like me?

Or maybe I've got the wrong end of the stick and there are easier ways to manage this housekeeping?

I can't help thinking its high time that MS tidied up all this crud. Job Dates being stored as separate yyyymmdd and hhmmss INTERGERS rather than DateTime datatype is way past its bedtime ... along with the Micky-Mouse coding that is to be found in so many of the System SProcs

Grrrrr ...

OK, I'll stick my neck out, here's the MSDB Stale Log / History Purging Scripts that I have so far.

Feedback welcomed :slight_smile:

-- Purge backup history - 10,000 at a time
-- BEGIN TRANSACTION
SET NOCOUNT ON
DECLARE	@iterationSize int = 10000	-- Rows to delete per iteration
								-- Retain 4-months data:
	, @StopAtDate datetime = DATEADD(Day, DATEDIFF(Day, 0, DATEADD(Month, -4, GetDate())), 0)
	, @Debug int = 0	-- 0=Silent, 1=Show Debug Info
--
DECLARE	@dtStart datetime
	, @oldest_date datetime = (SELECT MIN(backup_start_date) FROM msdb.dbo.backupset)
--
--DEBUG IF @Debug = 1 SELECT @oldest_date = DATEADD(Month, -1, @oldest_date)	-- Force an iteration

IF @Debug = 1 SELECT [@StopAtDate]=CONVERT(varchar(17), @StopAtDate, 113), [@oldest_date]=CONVERT(varchar(17), @oldest_date, 113), [Initial Record Count]=(SELECT COUNT(*) FROM msdb.dbo.backupset)
IF @Debug = 1 PRINT '-- Loop start ...'
WHILE @oldest_date < @StopAtDate
BEGIN

	SELECT	TOP 1 @oldest_date = backup_start_date
	FROM	
	(
		SELECT TOP (@iterationSize) backup_start_date
		FROM	msdb.dbo.backupset
		ORDER BY backup_start_date ASC
	) AS X
	ORDER BY backup_start_date DESC
	SELECT	 @oldest_date = CASE WHEN @oldest_date > @StopAtDate THEN @StopAtDate ELSE @oldest_date END

	SELECT	@dtStart=GetDate()
	IF @Debug = 1 SELECT	[Start]=CONVERT(varchar(17), @dtStart, 113), [@oldest_date]=CONVERT(varchar(17), @oldest_date, 113)
--
	BEGIN TRANSACTION BACKUP_PURGE_01
	EXEC msdb.dbo.sp_delete_backuphistory
		@oldest_date = @oldest_date
	COMMIT BACKUP_PURGE_01
	CHECKPOINT
--
	IF @Debug = 1 SELECT	[Elapsed(s)]=DATEDIFF(Second, @dtStart, GetDate()), [End]=GetDate(), [Record Count]=(SELECT COUNT(*) FROM msdb.dbo.backupset)
END
IF @Debug = 1 PRINT '-- Loop done'
SET NOCOUNT OFF
--
-- COMMIT
-- ROLLBACK

I'm unsure about BEGIN TRANS / COMMIT and CHECKPOINT but when running this, even though the iteration batch size was modest-ish, and the MSDB database is in SIMPLE recovery model the Log grew pretty big (around 4GB)

I was trying to force each iteration to clear the log space used, but I'm not sure what actually achieves that so my Transaction and/or Checkpoint may be superfluous?

And here's the Job History Log and Job Steps Log purge:

-- Purge Job History - 10,000 rows at a time
-- BEGIN TRANSACTION
SET NOCOUNT ON
DECLARE	@iterationSize int = 10000	-- Rows to delete per iteration
								-- Retain 4-weeks data
	, @StopAtDate datetime = DATEADD(Day, DATEDIFF(Day, 0, DATEADD(Week, -4, GetDate())), 0)
	, @Debug int = 0	-- 0=Silent, 1=Show Debug Info
--
DECLARE	@dtStart datetime
	, @oldest_date datetime = CONVERT(varchar(20), (SELECT MIN(run_date) FROM msdb.dbo.sysjobhistory))
	, @log_id int
	, @job_id uniqueidentifier
	, @intRowCount int

--DEBUG Force to run:
--DEBUG SELECT @oldest_date=DATEADD(Month, -6, @StopAtDate)

IF @Debug = 1 SELECT [@StopAtDate]=CONVERT(varchar(17), @StopAtDate, 113), [@oldest_date]=CONVERT(varchar(17), @oldest_date, 113), [Initial Record Count]=(SELECT COUNT(*) FROM msdb.dbo.sysjobhistory)
IF @Debug = 1 PRINT '-- Loop start ...'
WHILE @oldest_date < @StopAtDate
BEGIN

	SELECT	TOP 1 @oldest_date = CONVERT(Datetime, CONVERT(varchar(20), run_date))
	FROM	
	(
		SELECT TOP (@iterationSize) run_date
		FROM	msdb.dbo.sysjobhistory
		ORDER BY run_date ASC
	) AS X
	ORDER BY run_date DESC

	SELECT	@oldest_date = CASE WHEN @oldest_date > @StopAtDate THEN @StopAtDate ELSE @oldest_date END

	SELECT	@dtStart=GetDate()
	IF @Debug = 1 SELECT [Start]=CONVERT(varchar(17), @dtStart, 113), [@oldest_date]=CONVERT(varchar(17), @oldest_date, 113)
--
	BEGIN TRANSACTION JOB_PURGE_01
	EXEC msdb.dbo.sp_purge_jobhistory
		@oldest_date = @oldest_date
	COMMIT JOB_PURGE_01
	CHECKPOINT
	SELECT	@log_id = 0
		, @intRowCount = 1	-- Force first iteration

	WHILE @intRowCount >= 1
	BEGIN

		SELECT TOP 1
			@log_id = L.log_id
			, @job_id = J.job_id
		FROM	dbo.sysjobstepslogs AS L
			JOIN dbo.sysjobsteps AS S
				 ON S.step_uid = L.step_uid
			JOIN dbo.sysjobs AS J
				 ON J.job_id = S.job_id
		WHERE	 L.log_id > @log_id
		ORDER BY log_id
		SELECT	@intRowCount = @@ROWCOUNT

		IF @intRowCount = 0 GOTO SysJobStepsLogs_EXIT

		IF @Debug = 1 SELECT 'sp_delete_jobsteplog', [@job_id]=@job_id, [@oldest_date]=@oldest_date

		BEGIN TRANSACTION JOB_PURGE_02
		EXEC msdb.dbo.sp_delete_jobsteplog 
			@job_id = @job_id
			, @older_than = @oldest_date
		COMMIT JOB_PURGE_02
		CHECKPOINT

SysJobStepsLogs_EXIT:
	END
	IF @Debug = 1 SELECT	[Elapsed(s)]=DATEDIFF(Second, @dtStart, GetDate()), [End]=GetDate(), [Record Count]=(SELECT COUNT(*) FROM msdb.dbo.sysjobhistory)
END
IF @Debug = 1 PRINT '-- Loop done'
SET NOCOUNT OFF
-- COMMIT
-- ROLLBACK
1 Like