How to Disable all SQL Agent Jobs?

We have had an issue, from time to time, where Clients' 3rd Party IT support has rebooted server and it interfered with Scheduled Jobs.

Most recent one was:

Client: "We have scheduled powercut on Wednesday, we are going to powerdown at 6PM, get (rented) generator running at 9AM, power up, then power down when the power comes back on, turn off generator and switch back to mains power, then power up again. Any problem with your APP?"

We said that was fine because the only jobs that might be running at 6PM and "when power comes back on" would be Transaction Backups, and their power-fail-recovery would be fine. When the server was powered up at 9AM (on generator) all sorts of overnight jobs would run (having been delayed from "middle of night"), but that would be fine too.

But what actually happened was that they had a problem with generator, and rebooted / power-cycled :frowning: the server 3 times within a few minutes at 9AM. Overnight jobs had already kicked in and some only half-ran, which caused a variety of problems.

If we are expecting multiple reboots - e.g. when installing upgrades / patches that will require several reboots -
we disable SQL Server Service Auto-Start. In other circumstances, where users could be using SQL during the maintenance, we disable SQL Jobs. But that's a bit of a pain, plus Client has to involve us ...

What I'm thinking is to have every SQL Agent Job (which we "own") check some sort of "Global Flag" and not run if that is disabled. Maybe the flag could be the Date/Time of a "DO not run until AFTER" value. Client could set that flag/value themselves.

Does anyone do this? Do you use an alternative method? Any Gotchas?

One risk I perceive is that the SQL Agent jobs never get started again! so I'd want to have some sort of alert that covers that. We have a Banner Warning in our APP - for announcements like "Scheduled maintenance at 4PM, please complete your work before then" ... so that could include "Scheduled tasks are overdue, please inform Admin" as that would run without any scheduled task / notification system needing to run at all.

1 Like

Not under those circumstances, per se... I do have a series of jobs that, on occasion, need to be disables and re enabled in mass... CDC jobs, when failing over between AG replicas for example, need to be stopped on the node that became secondary and started on the one that became primary... and it all needs to happen seamlessly ...

I have a stored procedure that looks in MSDB for the various CDC jobs and then dynamically builds a script to execute msdb.dbo.sp_update_job and msdb.dbo.sp_start_job or msdb.dbo.sp_stop_job... Super easy and maintenance is non-existent.

1 Like

Thanks.

I reckon my best bet would be to have Step 1 call an SProc that figures out if it should run, or not, then I can build something in our APP that allows Enable / Disable of specific "categories" of Jobs.

I had a google and only found "Disable everything" or "Disable where NAME in (ThisList)". I was particularly concerned about the Disable Everything scripts as they didn't check whether something was disabled already, so they will set such previously-disabled jobs to Enabled, after the re-enable process; and using ThisList of Names strikes me as unlikely to be reliable name-matches for ever and a day!

The GUID numbers for the Jobs would be fine (persistent), and whilst we are currently a bit slack about creating adhoc jobs we could change that to Script them to make sure that all the ones that are used in multiple places are created with the same GUID.

So I suppose that would leave:

Can I deduce the GUID of currently-running-job? (to save having a hardwired parameter to MySProc in Step #1)

What's the best way to stop the Currently Running Job? (i.e. when MySproc running in Step #1 finds that it should not run)

1 Like

Ah-Hah! Bingo ... I think ...

"SQL Server Agent allows you to use tokens in Transact-SQL job step scripts"

e.g.

SELECT * FROM msdb.dbo.sysjobs  
WHERE @JobID = CONVERT(uniqueidentifier, $(ESCAPE_NONE(JOBID))) ;

OK, here's what I have come up with:

Add a STEP 1 which calls a CheckJobIsActive SProc that decides if the job can run, or not. That SProc is passed the JobID GUID.

Then Abort the job if appropriate.

So that led me to "How do I abort the job?"

I can raise an error:

DECLARE @job_GUID uniqueidentifier
	, @lob_IsActive int

SELECT @job_GUID = CONVERT(uniqueidentifier, $(ESCAPE_SQUOTE(JOBID)))

EXEC dbo.CheckJobIsActive
	@job_GUID = @job_GUID
	, @job_IsActive = @job_IsActive OUTPUT

IF @job_IsActive = 0
BEGIN
	RAISERROR ('Job aborted because currently marked Inactive', 16, 1)
END

but the problem I have with that is that it puts the error in the SQL Agent Errors Log, and there is then the risk that ignoring that (probably a "flood of such" messages) means that something more important might, also, be accidentally ignored.

The other approach I tried was to have the CheckJobIsActive SProc STOP the job. The code snippet from within my CheckJobIsActive Sproc is:

	IF @job_IsActive = 0
	BEGIN
		-- Stop the current job
		EXEC @intErrNo = msdb.dbo.sp_stop_job
					@job_id = @job_GUID
--N/A				, @originating_server = 'master_server'  
--N/A				, @server_name = 'target_server'  
	END

I think this works well for two reasons:

  1. I don't have to have any conditional "Stop the job" logic in Step 1 - so Step 1 only has logic to call the SProc - so the code (which will be in LOTS of SQL Agent Jobs) is nice and simple.
  2. The SQL Agent Error Logs shows a "Stop" symbol against that execution history item, rather than the (X) error symbol.

But ...

... I am wondering if the STOP action, if the Job is considered INActive, should be even more "silent"?

The only way (unless you know differently?) that I can figure to do that is:

  1. Step 1 checks if Job is marked as Inactive. If Active go to Step 2, if INActive go to Step 3
  2. Step 2 executes the "meat", and then terminates on success.
  3. Step 3 displays a (non-error) message, and then ALSO terminates on success.

but I am REALLY loath to set up that amount of complexity, in every job, just to get a neat display.

So unless there is a "Terminate the job with no error logging" I reckon I'll go with "Stop the job, from my CheckJobIsActive SProc'.

P.S. I could set Step 1's "On failure action" to "Quit the job reporting success", but I think?? I would be worried that if the CheckJobIsActive SProc failed, for any reason, that would go unnoticed.

1 Like

Found this on Google - which would be one way around the problem (although I think extra STEPs on the job and "conditional logic" would be no more complex, and keep it all in one place ...)

If I need to keep any job from running, I simply disable SQL Server Agent service. Why go through the hassle and complexity of messing with enabled status just to do that?

If I need to ignore only certain jobs, I put the job names in a pre-defined table, and a standard process disables them, if needed. I have a separate table that keeps track of which jobs were disabled, when and why they were disabled.

Yes, you make a good point. I foresee 3 scenarios that I'd like to cater for:

  1. If we are going to have a possible period of instability (after upgrading / doing "something") I might want to disable all jobs EXCEPT backups - so that they keep running (as they can safely be interrupted). This would be as an alternative to "Disable SQL Agent"

  2. If we have been doing something which has prevented overnight jobs running I might want JUST THEM to be disabled until "following night", rather than cut in "as soon as the server is powered back up"

  3. Last thing that I can think of is to disable some/all jobs, then make sure they are NOT still running, and only then allow IT to reboot the server. My understanding is that stopping SQL Agent Service will immediately kill (ungracefully) anything that is running Command Prompt type stuff, which is a lot more ugly than I would like.

Our stuff is, mostly, out on Client servers and their IT people talk to us far less than they should, and have no understanding of SQL at anything other than a superficial level, so whilst I may well be over thinking the problem - something I'm definitely inclined to do! - providing some simple "Tick this" options in our APP that will provide 80:20 solutions is way better than what we have right now - which is "We powered the server down and all sorts of goofy stuff is now happening" along with an implied "presume this is covered by our fixed price maintenance contract" ...

I've done the "Disable Jobs in this Name List" before (although I prefer Job ID GUIDs as names tend to be a bit fluid when we don't own the server!), but I also want to consider whether any new job, that has been added (i.e. which might is not, yet, in the List), needs to be included and I think this approach will help.

My new "Step 1 - Check if not currently Active" is, as a useful side effect, adding any new jobs to my table - so I get the opportunity to consider newly added jobs and categorise them as appropriate - e.g.

  • All jobs
  • Non-Backup Jobs
  • Overnight jobs
  • APP specific jobs

which can then be used as the basis for disabling specific collections of Jobs. This has nothing to do with the ENABLE / DISABLE property on the Job itself, so no chance that anything disabled in that way gets accidentally re-enabled.

P.S> We also have some scheduled jobs that are not run from SQL Agent (windows scheduler instead), so this would give me the opportunity to add a Step 1 to them that caused them just exit, without actually doing anything.

Kristen - Let me know if you'd like a copy of the procedure code that we're using. It won't be plug & play for you but it should be easy enough to separate the Agent Job pieces & parts from the CDC & Always On stuff, and it may help to get you pointed in the right direction.

Thanks very much, if nothing else I'm certainly curious to see what other folk do. I'll PM you an email address (unless you want to post it here)