SSIS error handling to cover ALL ERRORS package-wide

ipisors · August 12, 2015, 10:17pm

I have the following goal: to trigger a Send Mail task, any time the package fails for any reason. Up to now, I thought that would be achieved by doing an On Error in the package-level executable.

But I found out that this event does not fire if the package fails at validation. So I had scheduled a job (sql job agent) and didn't know it was failing all week...NOT GOOD.

What is the single (or multiple, if necessary) Event I can hook into that will cover any and all package failures?

ipisors · August 12, 2015, 10:40pm

Seems like there is no way, at least not from within the package. This is what I hear from others- very unpleasant news to me. Because our Database Mail is not functioning right now, so that's not an option.
Guess I'll come up with an adhoc method, like inserting a 'success' tag into a table at the end of the package, and then having another package (step in the sql agent job) to check the value in the table, if not present send mail task there. Fragile...I don't like it.

Kristen · August 13, 2015, 12:29am

Do you think? It would be my preferred choice - a Logging table with columns like:

ID
Date
Job Name
Step Name
Status
Error No
Notes

Create a row at the start and set status to, say, 1. Maybe set Error No to 999 indicating "Running"

At the end update that ID to status = 0, Error No = 0 if no error, or status = 2, Error No = Some non-zero value to indicate which error occurred.

Then have a report run each day, and maybe send it to you as an email, with any process that:

Has a Status=2 (Error)
Has a status=1 (Running) and should have finished
Has a MAX(create date) which is more than 24 hours ago - i.e. it has not run (you might need a CONFIG table to allow you to set a tolerance on that e.g. if you have processes that only run once a week.

I don't trust SQL Agent Jobs to send me an email on error ... all sorts of things could go wrong. We have a daily SQL Agent Job that deliberately fails so that I can check that SQL Agent IS running correctly and WILL send an email on failure ... in case someone twiddles with the mail settings or similar.

I'd much prefer to get one email a day listing all the success / failures (or to run an on-demand report, instead) then I only have to remember to check that I did indeed get that one email and open it to see if anything is reported as having failed, or is "overdue")

ipisors · August 13, 2015, 1:32am

Here is what I am thinking when I say Fragile.
The package, or report, that reads whether things 'succeeded' in the table - it could fail, too. In fact it could fail for the same reason the core package in question , failed. And on and on it goes...

I've actually seen SQL Job Agent errors that reveal to me that the Send Mail task in an error handler is the thing that failed!

So I do like my/your "table-based" idea, from a theoretical/organizational perspective, but from a perspective of failures..... And the fact that the way we are checking the failure/succeed situation is by utilizing the same technology that failed in the first place, sure the odds of all of it failing are slim, but they certainly can happen and it's what makes me think there must be a better way.

Kristen · August 13, 2015, 6:58am

I think it is falisafe, but its an interesting point and worthy of discussion.

So the hypothesis is that I write a report to show both all errors in the log, plus any job that has not completed in a reasonable time, but there is a bug in my report so it, falsely, reports that the job has completed successfully.

I think that is unlikely because the testing of the report is under my control. I will put deliberate error data in the Log table, during Testing, to exercise my SQL to check that it finds such things. But even if I don't it will come to light, at some point, that there is a bug and then I will fix it thus, over time, making the report more and more robust.

By contrast email on a SQL Agent job can fail for all sorts of reasons, and its pretty hard to imagine that all the possible failures in the job which COULD lead to error (and the need for SQL Agent to send an email) are EVER tested! Indeed, most SQL Agent jobs that I see were "thrown" together - I hate jobs that are built like that, but I see it more-often-than-not.. I can imagine SQL Agent never detecting an error in all sorts of ways; for example a SQL Agent job that says:

EXEC MyWrapperSProc

where MyWrapperSProc calls several child sprocs. Failure of a child Sproc will be undetectable to SQL Agent, unless MyWrapperSProc diligently checks for errors in the child processes AND passes that error back to SQL Agent. (A child Sproc usually fails "silently" to anything other than the calling Sproc)

Whereas if each child sproc logs its start & end then any failure to log the End will result in a "should have finished by now" condition in the errors report.(We have templates for all child sprocs used in this way, so I have confidence that they do have logging built in )

Th other point of failure is the email telling you that the Error Log Report found an error. IME normal usage by SQL Agent jobs is to set up a "Send me an email if it breaks". I think this is totally unacceptable as a solution because if the email fails for any reason the default assumption is "It worked" !!

Whereas if you always get an email, and it says success / failure, then all you need to do is to remember that you SHOULD get an email. Now ... if every SQL Agent job sends you a Success/Failure email then you have zero chance of remembering whether you have received 100 emails today, or only 99 (and then trying to figure out which of the 100 is missing!).

Whereas if you only get one email, and it checks that all 100 jobs ran successfully you only have to remember to check for that one email.

Or just run the report first thing each morning, instead of remembering to check for the email

I've done similar things for mission critical tasks which output a log file (instead of inserting rows in a log table). Things like the report that the Maintenance Plan produces. We wrote code that checked, line-by-line, that the log file contained the expected text lines. It allowed for a date / filesize to be different, but caught any line in the file that was unexpected / not matching the "template". That would not have caught unexpected filesize (although that could have been programmed in of course) but it did do the job of looking for a "needle in a haystack" that humans are usually very bad at - the Needle only happens on the day that the human is having a Bad Day, in my experience!

Life-critical systems are sometimes built in duplicate/triplicate with different teams, different programming languages, and thus different compilers etc. - maybe different hardware platforms even? I've never had to program a life-critical system ... I'm not sure I would want to, whilst I think I am an excellent programmer, and that I have a vanishingly small error-rate, I'm not sure I would want the worry that if I screw up it could kill someone. Mates of mine who program in the aerospace industry tell me that they never worry about that because whereas I have systems in place that mean that I very rarely produce a bug, and if I do that that even more rarely gets into Production, they have systems in place that mean that their rare mistake never gets into production.

I still think that "never" doesn't exist in programming, but they seem happy that their code is perfect - I am pretty sure they are not relying on EMails from SQL Agent job failures

gbritton · August 13, 2015, 1:04pm

hubris or sarcasm?

Kristen · August 13, 2015, 2:17pm

Overstated, I'm sure, but true enough. If an error gets into production my second action (after fixing it) is to put steps in place that however it happened can never happen again. If avoidable errors occur in DEV I get very annoyed - time wasted finding & fixing them is bottom-line profit lost. I aim to have as much of my code mechanically-generated as I possibly can. Templates for anything less common that I don't have mechanical code generation solutions for, and a knack to creating a robust test-framework for anything truly adhoc to reduce chance of anything getting through DEV and into QA.

ipisors · August 14, 2015, 9:52pm

Well Kristen, when I have more time I'd like to reply to the core of your previous posts (hopefully chatty discussion is not discouraged on this forum as it is on stack overflow - I'm new here so I'm unsure - but goodness, the long discussions are usually 100x as valuable as the short answers, so I hope it's not!),
but I will say one thing quick: You are very lucky and fortunate to work in an environment where you have the luxury and ability (resources, approval, timeline, and an organization committed to planning things in advance, even if those things are contingency techniques for things that cannot be planned in advance), to be able to have all of DEV, TEST and QA.

This makes it much easier to arrive at a place where you have virtually no errors in production. Plenty of us would like to be in an environment that guarantees such a result.

As such, one could conclude more accurately that the existence of production errors isn't properly viewed as necessarily the developer's "fault", per se, and conversely neither are the lack of production errors properly viewed as necessary 100% to the developer's credit, either. Both of them are quite possibly 70% owing to the development structure put in place by a smart (or dumb) organization...........and 30% to the talent/diligence of the developer, or lack thereof.

The longer I work on computers the more convinced I am of that.

One more comment - while your commitment to mechanically produced code/artifacts is laudable and politically correct, so to speak, the philosophical thinker in me always has a poor reaction to that kind of thing, (or is it the Worry Wart?) because I worry that--as a general principle which actually covers ALL tech development actually--the more automated the design process becomes, the more distant you get from it, and thus, the more your skillset and knowledge in that mechanically produced code diminishes. Of course, you may not care - you may be wanting to hone your skills in "the latest" thing which isn't yet mechanically generated yet.....But next year it will be, and then most people's knowledge of that will be unnecessary (except, of course, for the person who designs the thing that mechanically generates it - and his/her knowledge is probably worth 10x everyone else's). For example, it may seem cool that someone can create a "SQL generator" tool (and indeed, it is cool, and says a lot about THEIR skills), but if I use that, then my own skill in SQL has not only been made irrelevant, but it will soon diminish if not disappear. At that point I have ceased to be a developer and become an end user - at least, as SQL goes. (insert whatever the programming skill happens to be - Sure you can choose to use SSIS's File/Folder tools, but wouldn't it be loads more fun to use a script task and custom code it in VB or C, you'd learn a lot more, and the resulting skillset would be MUCH more valuable, portable, and have a longer shelf life - so there's a trade off right there).

Anyway the whole phenomenon is interesting and can be seen, in a sense, with technology and the public, too. Apple has done a great job (for example) in making the masses more familiar with USING technology, and probably less familiar than ever with actually understanding it. This is my prime argument against Mac and for PC - really has nothing to do with the technology at all, has to do with its effect on mankind, which I find to be more serious in the first place.

Interesting discussion with lots of directions to go. Thanks

Kristen · August 18, 2015, 12:49pm

Not aware that having a chat here is a problem ... Thread-count here is less than other places, so some waffle doesn't overwhelm the thread.

I most definitely do not see it as a luxury, its a necessity and I would question the cost-benefit for any organisation that does NOT do that.

Fixing a bug at Design stage is cheapest ...
... fixing it once it is in Production is incredibly expensive, by comparison.

Once a bug is in production there is a chance that data has been damaged (or some other side-effect / cause&effect), so in addition to fixing the bug you have to fix the collateral damage. If the collateral damage was "losing clients" you can't fix that ...

To me this is the difference between "computer science" and "amateur programming". I've seen responsibility (i.e. at our Clients) for web-base / customer-facing projects move from IT / Computer departments to Marketing over the last 10 years or so. The effect has been disasterous. We've stopped building shopping-cart websites (our product was at the "big end" so our Clients had requirements for Click&Collect, 100-or-more high street stores, n0,000's or n00,000's of product lines) but once the projects moved to Marketing all the Client was interest in was pixel-perfect layout and they never engaged with us that their design meant that the checkout would not work in certain circumstances, or their need for "Friendly URLs" (i..e their definition was that URL must exclude the product code) needed a massively complex solution if they ever needed to rename a product name (Marketing's answer was along the lines of "That will never happen", in practice what has happened is that you get a 404 )

I expect it was apocryphal?? but I remember the story in, probably, 80's that a new version of Microsoft Office had a pay-bonus attached to it, for DEVs, based on how few bug-reports were made post-launch. Around here programming is referred to as a Death and Glory job - although I've never been aware of any Glory when we did something cool, but of course if anything doesn't work the Death component is attributed solely to us

Not sure I agree with that. Folk here write code, we just don't have to write any routine / humdrum code. Dunno what the proportion is - I'd like to think that 5% of the code is hand-written and the rest mechanical, but I expect its nothing like as high as that. The 5% is challenging, so no room for lightweight programmers doing that part. But that would not be the emphasis I would place on the benefit of mechanical code, for me the absolutely key thing is that our mechanically generated code WORKS PERFECTLY. There is zero chance that a programmer having a bad day missed a bit, there is zero chance that a Junior Programmer didn't understand / know / remember / hadn't been told yet that "You always have to do X when there is a Y". Our mechanically generated code is 99.999% reliable. And for the remaining situation where it turns out that it is NOT reliable then 100% of our mechanically generated code will break in exactly the same place. This is good because a) far more chance of that fault actually arising and thus leading to a fix and b) once fixed (and everything regenerated) then it is fixed for every single occurrence.

There is also a cost benefit. We mostly use mechanically generated code for CRUD. I expect that less than 5% of our tables have bespoke CRUD routines, so at the start of a project we build the tables and press a button and we have a functioning system. In practice a large proportion of the tables are code-lookup tables - so the CRUD for them is mind numbingly simple. GET Record, SAVE or DELETE it. At the start of a new project the ability to get all the tedious Code Lookup tables built and all their associated Data Entry Forms, means we can start work on the real stuff immediately.

(It goes further than that for us, so my example above is over simplified. For example, our naming convention "implies" any FKey relationship, so just creating tables and columns causes FKey DDL constraints to be generated, and the CRUD stuff to "assume" relationships and so on. Our column names are required to be consistent for suffixes - so "_ID" WILL be an IDENTITY, _GUID maps to NewID() and so on, "_Code" WILL be a user-provided alpha-numeric under OUR control, "_Ref" WILL be a user-provided alpha-numeric, usually external, which is NOT under our control, and so on. So by using those "in house rules" all sorts of downstream stuff happens as a "benefit"

As it happens we do actually improve our Mechanically Generated code quite often. A new version of SQL comes out and has, say, better Paging support - e.g. OFFSET/FETCH introduced with SQL 2012. So we build that into the Templates, regenerate the SProcs that use that template, and Bang! all Sprocs that do paging now take advantage of that improved code & performance. Sometimes the improvement is something where we slap our hands on our foreheads and say "Why on earth did we do it that way?" Just recently I have changed our logging. I've been wanting to do it for years ... we used to store the parameters as a "Value1~Value2~...~ValueN" list. If we ever wanted to actually test the Sproc we had to convert the "~xxx~yyy~zzz" list into Sproc call Parameter syntax. Tedious. The new way is to store the parameter values as "@Param1='Value1',@Param2='Value2',..." so I can just cut & paste into SSMS to try that parameter list in the SProc. Its a huge joy to have, I've been waiting a long time for it :), thanks to Brett R.I.P. for convincing me, years ago, that the CPU effort was so tiny that it was fine to do it, BUT it was a major change to a) the logging which is the busiest part of our APP and b) requires a change to every single SProc we have ever written (although I have built the replacement to be backwards compatible so we don't have to rush to refactor everything!!)

So we do, actually, improve our mechanically generated code from time to time. Generally if we find a core bug, or an issue with core code that can be improved, we fix it immediately. We do not wait for "some future point in time". We have some slack built into our project schedules to accommodate that; many coding shops don't and IMHO suffer as a result. How on earth would a Dentist work if he had no slack in his schedule to accept an appointment for someone with toothache if he filled up his calendar solely with 6-monthly-check-ups? We are the same ... except that if calendar-slack is not used for fixing something we crack on with the project and bring the shipping date forwards, whereas the Dentist has to read a book - or maybe does his bookkeeping and then has more weekend-time on the golf course?!!

My advice: don't postpone fixing core improvements and fixes.

ipisors · August 18, 2015, 3:52pm

While I still think everything I said is true to a point, your points are also very well taken - You don't have to convince me that doing things right is better, (obviously), and I suppose that Luxury vs. Necessity is a matter of the semantical approach one takes depending on whether one has it or doesn't have it. Things that simply cannot be obtained are sometimes considered luxuries whilst those who have them consider them necessities. Unfortunately the fact that you and I agree on the way things should be done, and have logically compelling arguments to support them, doesn't change the fact that I cannot change what I cannot change - therefore I do think it's a luxury to have all of that well thought out structure in place (You would too if it was suddenly taken away from you , which I hope it never is), and I've no doubt that those structures are the primary reason mistakes don't happen, not developer genius.

What is humdrum vs. complex is a matter of context and usage, of course - - It would feel odd to not use that humdrum code for 10 years, and end up stumbling on the most simplistic code when you have to use it, so I prefer to maintain some familiarity with all code (just about).

I remember when I started learning SQL Server after using MS Access, there were those who mocked Access, but I felt differently; I wanted to maintain my familiarity with it regardless. Years later I'm glad I did; I have yet to find a better front end development tool that is as widely accepted and easily built - those who knew nothing but SQL now have no way to provide FE apps to users. This is just a story about my context -yours will be different, of course, depending on what platforms your people have access to.

Ultimately, I believe there are roles that will always exist that will require (for whatever reason) developers to tinker with all levels of code - and I'd prefer to have my hands in all of it to some extent and at least once in a while, or (yes), you will forget things. In other words, from the standpoint of someone evaluating the value of a business, SURE - those doing the most complex work efficiently win the prize. But I don't evaluate my career from the business standpoint, I evaluate it from my own standpoint, and the most valuable thing to me and my knowledge is to remain familiar with a great variety of it - that's my personal outlook on what ensures my versatility and portability, 2 things that I hold dear and close in my personal career journey - everyone will be different.

And by the way, I do like your dentist analogy - but it reminds me that I HATE dentists who think they are too fancy to do their own cleanings. A cleaning is the only thing I go to the dentist for in the first place, and after choosing a location based on the dentist, it peeves me to go and be served by nobody but his assistant, a different one every 2 months. I've been to some dentists who did cleanings so rarely they could barely stumble though one and were awful at it....I don't like that.

mikebird · November 9, 2015, 9:52pm

Glad to find this. Had trouble loading csv - simple task-import before SSIS extras. I get a bunch of csv files I want to string together by date. Got what I want in Excel & Crystal perfectly. 1 click to acquire the file. Results to PDF. Just wanting to build history. Each csv is a quarterly snapshot.

All parameters tried... column headers in 1st row... sure this import is a usual legacy disappointment. Time taken on options. Errors never go away. Can get text in but want accurate type format. Essentially an SSIS template for regular use. Just disposing of 3 unwanted of 8 columns, as goes perfectly with Excel by trimming undesired fields and Crystal unselected or linked. Advanced pane to delete columns. Tried from beginning 1 of 8 or the other end, starting with 8, deleting does scramble more material into one column. Probably one of CR, LF... delimiter definition options chosen might fix the lot. Sure I've gone round from different aspects. Choosing types: decimal, float - what difference? When I get green success, I get a useless new table of varchar by default. I need date, string, the rest 0.00 that's all.

ipisors · November 9, 2015, 11:14pm

I had a hard time understanding your post, a lot of stuff in there, but ... my guess is that a solution is hiding somewhere in here:

Probably one of CR, LF... delimiter definition options chosen might fix the lot

If you remove some columns, and the content of the other columns suddenly changes, then I'm thinking you could make something more solid in the area of delimiters (column and line). ensure all lines have true line break line feeds, and ensure all delimiters are present (and don't exist naturally in the data).

I always make sure that, for my flat file connection managers, I start out with a SAMPLE FILE that meets the requirements PERFECTlLY, and will never, ever go away - i.e., the package can always see it, even though a variable / expression changes its connection string at runtime. It's nice to have the design-level file basis be something that stays in there, otherwise each time you go into design view, you have to re-do everything.