Using tempdb to curry stored procedure arguments

gbritton · May 31, 2017, 5:49pm

I'm looking for feed back and/or someone to poke holes in something I'm experimenting with.

Our ETL system runs many large jobs each day. For each job, we log essential information. To do that, we have a log writer stored procedure. It may be called many times from other sprocs doing various parts of the overall job. However, it will only be called hundreds of times a day, not an hour, minute or second. So, it's performance is not on the critical path.

For me, something that is on the critical path is readability. A stored procedure that calls the log writer many times will be large because of the number of arguments to the log writer. However, for any one call to the main procedure, most of the arguments to the log writer do not change (like job id or date being processed etc)

I'm also learning F#, which is changing the way I look at problems like this. Basically I want to be able to curry some of the arguments to the log write calls to make them simpler. I came up with two approaches:

encapsulate the call to the log write in a variable (e.g. @logwriter nvarchar(max)) with various values set already. Use sp_executesql to call it. That's easy enough but I want to get to something that's a little more functional looking
Create a temp stored procedure, with the fixed (for the run) variables set. I tried this:

DECLARE @proc NVARCHAR(MAX)= '

CREATE PROC #log 
       @Keywords varchar(256),
       @ErrorID int,
       @CustomInput varchar(MAX) = ''''

AS BEGIN

       EXEC  [log].[ErrorLogging]
              @JobID = ''@@JobID'',
              @BatchControlID = @@BatchControlID, 
              @ProcessDate = ''@@ProcessDate'', 
              @Keywords = @Keywords,  
              @ErrorID = @ErrorID,    
              @CustomInput = @CustomInput,  
              @ErrorCap = 0 

END
';
DECLARE @d DATE= GETDATE();
DECLARE @j UNIQUEIDENTIFIER= NEWID();
SET @proc = REPLACE(REPLACE(REPLACE(@proc, '@@JobID', @j), '@@BatchControlID', 2), '@@ProcessDate', @d);
PRINT @proc;
IF OBJECT_ID('tempdb..#log', 'P') IS NOT NULL
    DROP PROC #log;
EXEC sp_executesql @proc;

Then, call the curried proc like this:

EXEC #log @keywords = 'foo', @ErrorId = 1018, @custominput = 'bar'

Either way, I get the results I expect in the log.

Comments, thoughts? Bugs or corner cases I might miss?

Kristen · June 1, 2017, 9:05am

Not sure I'll be any help, but in case it stimulates some thought here's what we do.

Our Logging sproc is called by every sproc we have ever written (give or take a couple ...) it creates a log record for the Sproc being called, and its parameters, and then Updates that log-row on exit with the success/fail value. The log row is created with a special never-otherwise-occurring outcome-value to indicate "still running" or "terminated without returning a value", and that gets overwritten by the actual return-value when (if!!) the Sproc successfully calls UpdateLogRecord

I used to concatenate the parameters with a "~" separator, thinking that saving CPU was key for this job, but it meant whenever I wanted to review WHAT the parameters of a call were I had to open the source of the Sproc and figure out what the concatenation order was. And of course that was liable to change, over time, so on a Client's database, that hadn't been updated for a while, it was a guess as to what the parameter order was.

Brett (x002548) .. RIP ... told me not to worry about that, and to concatenate the parameters in the way that was most useful for reuse / documenting (as you have mentioned) ... so I did ... and he was right, its made no discernible difference to performance (and unlike your casual use we log millions and millions of rows a day). So I changed it so that each Sproc (that calls the Logger) concatenates all parameters together with @ParamName labels suitable for reuse. The only thing I don't bother to do, because I thought it was a) excessive and b) rarely needed, is that I don't use REPLACE to double-up single quotes - so any ' in the data will break any attempt to use the log-values as-is.

All our Sprocs have param-1 as the Session ID. We are all web-based here, so SessionID this is a pretty universal value for us. What it does mean is that I can see every Sproc used (including when Proc-A calls ChildProc-B) for a given session. We also log all the pages that were rendered and also the "phone-home" from within the page once the Client has rendered it, which tells us the point-in-time at which it was available to the client.

So when viewing the log for a specific session I can see the time of the request for the page, the elapsed time of each, and all, the Sprocs that were executed, and then the phone-home time of the fully rendered page on the client (or if there is an error the log shows that the process only completed part way through). We use this info for performance checking (both to alert if anything starts running slower than normal, and also to figure out what needs optimising - without having to run Profiler all day long ...)

ALTER PROCEDURE dbo.CreateLogRecord
	@SessionID	int		-- Session ID (in SessionLog table)
	, @SprocName	varchar(255) 	-- SP Name (or Job Name etc.)
	, @Params		varchar(8000)	-- Concatenated string of parameter values
	, @Special1	varchar(8000) = NULL 	-- Other data
	, @Special2	varchar(8000) = NULL
	, @Special3	varchar(8000) = NULL
	, @Special4	varchar(8000) = NULL
	, @Special5	varchar(MAX) = NULL
	, @blnTransNoLog	bit = 1		-- 0=Always log, 1=Only log if NOT in a transaction
	, @blnBusyNoLog	bit = 1			-- 0=Always log, 1=Only log if NOT Busy
	-- Set to 1 during Busy Periods
	, @blnIsBusy	bit = 0			-- Do NOT pass this parameter, it is for DBA Control only.
	, @LogRecordID	int = NULL OUTPUT	-- ID in PageProcedureLog table. Pass to UpdateLogRecord

@Params is truncated at 8000 CHAR just to be reasonably economical, if I really need to log a varchar(MAX) I use @Special5

There is some stuff here in case the logging got too busy - our extremely frequently called Sprocs set @blnBusyNoLog=1, and they optionally are not logged during busy periods. We change the default @blnIsBusy=1 and RECreate this Sproc (rather than reading a config value from a table, which we thought would be too slow) if we have an emergency need not to log the very frequently called SProcs, but we've only ever needed this on a handful of occasions and that was because of some other slow-code, nothing to do with the actually logging itself - as it turned out, but I didn't known that originally. So: I really don't think that's necessary (and not relevant in your use-case anyway, so just mentioning it for completeness and anyone else reading this in the future)

ALTER PROCEDURE dbo.UpdateLogRecord
	@LogRecordID	int			-- ID in PageProcedureLog table. NULL will create a row
	, @ReturnVal	int
	, @ErrMsg	varchar(8000)
	, @RowCount	int = NULL
	, @ErrNo	int = NULL
	, @Special1	varchar(8000) = NULL 	-- Other data - only stored if NOT NULL
	, @Special2	varchar(8000) = NULL
	, @Special3	varchar(8000) = NULL
	, @Special4	varchar(8000) = NULL
	, @Special5	varchar(MAX) = NULL
	, @Params	varchar(8000) = NULL	-- Only used if CreateLogRecord was not used

Updating the log row basically just stores the ReturnValue and any useful values in the "Special" columns. I thought that calling a "CreateLogRecord" and then an "UpdateLogRecord" SProc pair might be extravagant, so the UpdateLogRecord has the ability to create a record (@LogRecordID=NULL and provide a value for @Params) but I was reluctant to use it like that because I lose the ability to detect Sprocs that fail to complete (e.g. DEADLOCK) and, in practice, I've never had a performance problem [With Create + Update] so I've never actually made use of it. But I CAN use UpdateLogRecord to just "log some interesting value" (can't do that with CreateLogRecord because it sets the ErrNo column to the "Assumed failed" value unless there is also an UpdateLogRecord).

We sometimes log "interesting values" within Sprocs, particularly "This should never happen" or "You might like to know this is STILL happening"! I can store a pseudo-error-no so that it shows up in Error Reports (i.e. a non-zero outcome value)

Within the body of all Sprocs the LogCreate and LogUpdate looks like this:

SELECT	@strParameters = LEFT(
				'@SomeID=' + COALESCE(CONVERT(varchar(20), @SomeID), 'NULL')
				+ ',@MyString=' + COALESCE(''''+@MyString+'''', 'NULL')
				+ ',@MyInt=' + COALESCE(CONVERT(varchar(20), @MyInt), 'NULL')
				+ ',@MyDateTime=' + COALESCE(''''+CONVERT(varchar(24), @MyDateTime, 113)+'''', 'NULL')
				+ ',@MyDate=' + COALESCE(''''+CONVERT(varchar(24), @MyDate, 113)+'''', 'NULL')
				+ ',@MyGUID=' + COALESCE(''''+CONVERT(varchar(40), @MyGUID)+'''', 'NULL')
			, 8000)

So : concatenate all parameters, then create the log record:

EXEC @LogRecordID = dbo.CreateLogRecord @SessionID, 'MySprocName', @strParameters
			, @Special1='Some Special Value'	-- optional, almost never used at CREATE

then just before the RETURN update the log record with the outcome

EXEC dbo.UpdateLogRecord @LogRecordID, @intReturnVal, @strErrMsg, @intRowCount, @intErrNo
--	, @Special1=@strSpecial1, @Special2=@strSpecial2, ...
IF @intDebug >= 1 SELECT [MySprocName]='RETURN', [@intRetVal]=@intRetVal, [Elapsed (sec)] = DATEDIFF(Second, @StartTime, GetDate()), [@intErrNo]=@intErrNo, [@@TRANCOUNT]=@@TRANCOUNT, [@strErrMsg]=@strErrMsg, *
			FROM dbo.PageProcedureLog_Short_View WHERE LogRecordID = @LogRecordID

The @intDebug is a parameter to all Sprocs which allows easy diagnostics of an Sproc. The [PageProcedureLog_Short_View] is basically all columns in the PageProcedureLog table but all truncated so that, collectively, they don't cause any overflow of output for whatever SQL tool is being used (not sure if this would be a problem in SSMS, its not a problem in Grid Mode, but I use Text Mode and Query Analyser [still! ... sorry about that!] and that has limitations.

So in my log record I wind up with:

SprocName = 'MySproc'
SessionID = 1234
Params = @Param1=1234,@Param2='ABCD', ...

and I can easily use those to re-run the Sproc for testing:

BEGIN TRANSACTION
EXEC MySproc 1234, @Param1=1234,@Param2='ABCD', ...
    , @intDebug=01
ROLLBACK

I know nothing about F#, am not sure why you need dynamic code??, and quite possibility all this wibble is totally irrelevant - if so: sorry about that

gbritton · June 1, 2017, 12:57pm

Hey Kristen! Good food for thought. My example with a logger is really just one of many uses for the technique, if it passes muster. My whole motivation is to keep things as DRY and readable as possible.

FWIW I need to create the proc dynamically to curry the parameters that won't change. Curry (named after Haskell Curry, IIRC -- yes, the language comes from his name too!) just means capture the arguments that won't change and only leave open those that will.

So, if you have a proc with 20 parameters (yes, we have some ) but for a given run, 15 are fixed and only five are truly variable, currying the 15 would mean more succinct, DRYer code. It's a concept from FP in general that I'm trying to apply within the limited syntax of T-SQL.

Kristen · June 1, 2017, 1:28pm

I'm at a bit of a loss - I'm not understanding (probably just being thick ...) why you care that some of the parameters are non-changing? Isn't it taking more effort to figure out what didn't change, than just process them all?

Is this just to reduce the number of, say, concatenations that occur if you log all parameters - and thus you only want to concatenate the [smaller number of] changing ones?

My logging suggests that there is very little time saved, and that it is not noticeable, by a) concatenating all the parameters and b) concatenating in "even more stuff" to make it more readable. So if this is a "reduce the effort" issue then I don't think its substantive.Reducing the "bulk" might be worthwhile though.

My old way, intended to be as skinny-as-possible, was:

SELECT @strParameters = COALESCE(@StringParam1, 'NULL')
       + '~' + COALESCE(CONVERT(varchar(20), @IntParam2, 'NULL')
       ...

which was minimalist because I though the CPU cycles mattered. Of course it was very cryptic when I wanted to use the logged data (e.g. to run an Sproc to see why it broke with the logged parameters)

My new way is:

SELECT @strParameters = '@StringParam1=' + COALESCE(''''+@StringParam1+'''', 'NULL')
       + ',@IntParam2=' + COALESCE(CONVERT(varchar(20), @IntParam2, 'NULL')
       ...

which is far more verbose, but I found that the CPU cycles (for the extra string-manipulation) tuned out to be trivial.

But I may have the wrong end of the stick?!! in thinking this is some sort of "compression"? I can see that it would reduce the log-data volume, and provide useful insights into "This lot all had the same values for the first 10 parameters" - yeah, there is benefit in that - but the effort to deduce it seems like quite a lot to me.

That said, we do convert the Sproc Name parameter, on-the-fly to an Integer ID (add to a lookup-table if not exists, otherwise use the existing ID), because I thought it was extravagant to hold as a String on every single log-row, but with the benefit of hindsight I wish I hadn't done. There are a fair few CPU and Disk cycles involved in doing the lookup and occasionally adding a new row to the lookup table, and that is done on EVERY CreateLogRecord. But the bigger issue is purging the data. I don't really want to remove the record from the Lookup Table as soon as there are zero matching rows in the Log Table (i.e. after the last instance in the Log Table has been purged because the log data was stale) in case that same Sproc is only called, say, once a month and would get a different ID next month. So I have to have a complicated system of "Most Recently Seen Date" in the Lookup Table, and only delete rows from lookup when a) there is no foreign key matching row in Logging Table and b) its "been a while" since it was last seen.

Meant to mention in my earlier diatribe, in case of interested, that if I was creating logging from scratch, today, one of my first concerns would be how to delete "lots of rows" for purging. We export the rows with BCP (we HAVE had to restore the log data to investigate Fraud, and restoring all the DB Backups to do that would take forever!! so the BCP-version of the log data was designed from the outset to allow that to happen independently of the database Backups themselves) and then delete them. Typically we hold one weeks worth of logging in the DB, but on busy sites that can fall to 48 hours or so. The Delete job is pretty massive, and it would benefit from partitioning where I could just throw-away the oldest, now stale, partition rather than having to physically delete row-by-row (and do it in batches, AND accelerate the log backup during the purge to prevent log-file growth, AND have dynamically reduce/increase of the delete batch size so that if there are other processes running at that time, and the delete becomes slow, that it backs off a bit ... and then speeds up again.

gbritton · June 1, 2017, 2:05pm

In a word, nope! It takes little effort to figure out what doesn't change.

I care because the proc is called many times from other procs -- sometimes 10, 20, or 30 times in different places. Needless to say, if the called proc changes its parameters, that may be a lot of error-prone maintenance. OTOH, if the called proc changes only the "fixed for this run" parameters, and I manage to curry those parameters, then there's only one thing to change: the place where the proc is curried. it's the very essence of DRY coding.

I'm also thinking of a general approach that requires no changes to the called procs. I did this recently in a script to launch SSIS packages. Saved me endless typing, copying and pasting. Most of the packages share a set of 5-10 parameters which I curried, so in the script I just write:

exec #pkg 'mypackagename'

The end result is a script that is hundreds of lines shorter than the original, uncurried version, which of course has tons of repetition. the temp proc does all the variable setup and executes the package.

Kristen · June 1, 2017, 2:11pm

NOW I get it

Not sure I've been in that exact scenario, but I have had SProcs read their parameters from some "Config Object" to keep them consistent from place-to-place and/or to reduce typing (and typOs ...)

gbritton · June 1, 2017, 2:13pm

Yup, that's another approach, but would require changing the sprocs. Sometimes you can't (SSISDB catalog procs), sometimes you shouldn't (mission-critical production) unless you have to