SQLTeam.com | Weblogs | Forums

Importing CSV file to particular table in sql server 2008r2

sql2008r2

#1

I want to import a csv file for every week into a particular table of database in sql server 20008R2 using stored procedures.
And once the file is imported from a particular location, it should move to a different folder and save there with date. It should be automatically be done, once the csv file being kept in particular location. so new csv file with the same name being kept in a particular location for every week.


#2

I would do it from Command Line, rather than Stored procedure (using batch file or Power Script or similar)

BCP to import the file and command line / PowerShell to move the files after success etc.

I expect that SSIS would achieve the same objective, if you preferred


#3

Thanks. Let me try this one


#4

I'd do it from the command line, as well, but not directly. I'd use xp_CmdShell to check for the file existence, BULK INSERT it, move it to it's final resting place, and could be setup as a scheduled job to look for the file on a regular basis.


#5

I wonder what the Pros and Cons of the two approaches are?

my BATch files tend to be quite big - more error checking and logging logic than actual "doing stuff" steps.

Rename the last, say, 10 folders to allow some time to elapse in case of a "what happened there then?" question several days after the problem.

Ability to blame the "other party" so we don't get lumbered with the blame. (I have one routine that does a DIR of the "incoming files" folder to a, LOG file every 5 minutes to prove what time the file actually arrived as its "supplier" is so unreliable).

And so on and on ... (in SPADES!!)

I find that easier to manage in a BATch file, which is technology that I am very familiar with, compared to PowerShell) and I think?? that I would find that a fair bit harder with xp_CmdShell - even thought I rate my skills within the SQL arena as better than my BATch / Command Line skills


#6

You've identified the Pros in that very same thread. DOS and Batch Files sound archaic to many but they're a very powerful tool that simply can't be ignore. xp_CmdShell can execute your existing batch files and so much more.

For example, it CAN be used to create batch files dynamically based on current conditions, execute them, AND easily capture the output that would normally appear on the screen during a command window session not to mention making the scheduling of such things auditable through SQL Agent Job History. It's also easy to capture the output of things like DIR commands for audit purposes without having to redirect the output to files first. Since all of this can be captured by the wonderfully auto-magic OUTPUT of xp_CmdShell, you can even backup the output as a permanent log WITH COMPRESSION so that it doesn't take up so much space and is, of course, pretty easy to "play back" with simple queries if you also capture the date/time of each returned line along with an IDENTITY column that acts as a tie-breaker for identical times.

What's really cool is you can get really creative to overcome some natural shortfalls of the CLI and SQL Server. For example, capturing error messages about things like truncated data during BULK INSERTs is almost useless because of the nearly cryptic nature of the messages for such events. Wouldn't it be nice if the error messages were more like those that are returned to the screen where the line #, column, and error text is clearly displayed? You can do that very easily... use xp_CmdShell to call SQLCmd to call your import stored procedure and all that wonderful messaging is available in the OUTPUT of xp_CmdShell, which is easily captured by the OUTPUT of xp_CmdShell and easily checked.


#7

That I need to try ... we do use the "-o" option of BCP to capture the truncated data etc. messages, and they are half-helpful, but often the problem arises because the delimited structure of the file is shot - of someone has added a new column - and not being able to see the whole thing "in context" makes the diagnosis task slower.

Is SSIS (which i don't use) any more robust in this regard? and does it perform better at helping diagnose data import issues? My only interaction with it was more than a decade ago, and back then the Client's IT folk spent the whole morning, pretty much every day, figuring out what had broken amongst the multitude of SSIS (or whatever it was called back then) jobs. I couldn't help thinking that if they have built a framework around their imports they'd be able to figure out the problem much more quickly - heck, even have the software "recover" from many of the issues (e.g. import into staging table with VARCHAR(8000) or VARCHAR(MAX) columns and then check each column for width-exceeded, or "not numeric" or whatever before THEN attempting to import the data into the production tables.

P.S. In our import routines we check the width of every column, on every row, because sooner or later someone will increase the column width DDL - and, of course, not bother to tell us! The Client gets a human-readable message in their "Data Validation Error Report" in that instance ...


#8

I'm also one that doesn't use SSIS for pretty much the same reasons you stated. In my previous job, I saw what I thought to be simple tasks for T-SQL/xp_CmdShell being blown way out of proportion in SSIS. I took a couple of their more complicated packages and demonstrated how easy it is to do and promote from environment to environment if SSIS is avoided. Then, I spend most of the rest of my employment time there replacing SSIS packages with T-SQL and training others to do it because of what I showed them.

As for checking the width of columns, I never import into staging tables that use a generic width such as VARCHAR(8000) for every field in the file. I always make sure that the staging table has both the correct width for character based columns and the correct datatype for non-character based columns. That allows BCP or BULK INSERT do all of that checking auto-magically and, using that trick I told you about using SQLCMD to run the import procs, I can easily catch all such changes to width, etc, auto-magically. Of course, I also set the number of allowed errors to 2 billion so that the system checks the entire file. I'll also sometime take the time to setup error row capturing to external files to quickly sequester the bad rows so that I can look at them without having to search for them in the actual file itself.


#9

My aim was to provide an end-user report, rather than a SysOp one. Clients outsource their IT so it costs them money to have duff-data sources found ... but, that said, of course they never take those savings into account when we have to make a Pitch against a potential competitor.


#10

Ah. Got it. You want to let them know EVERYTHING that went wrong with any given row. You can still do that and it may even be faster because you'll know the rows that failed and then you can report on those only instead of having to run through the full table.


#11

One other thing that occurs to me:

My VARCHAR(8000) style table is not just for data that arrives via BCP, we also have (say) data arriving from MySQL. For that I use a Linked Server, to the MySQL database, and I use OPENQUERY to pull the rows of interest, and stuff them into a Staging Table (same as if the data came from BCP). Again, in that instance, the data might be goofy - including the Width of a given column having increased, but there are also other data sources where we have, say, a VARCHAR column containing a Date or an Integer which we want to convert and store in MS SQL native Datatype, but the column might contain Goodness Knows What!! (e.g. in some flavours / configurations of MySQL illegal dates can be stored)


#12

That's certainly a horse of a different color.


#13

Indeed :slight_smile:

FWIW it enables me to have a one-stop-shop solution for all such types of data import


#14

Just a thought. Sometimes generic solutions come with the penalty of certain applications of the solution being a fair bit slower than they need to be. If you can tolerate that bit of overhead, then no problem.

A personal example is the DelimitedSplit8K function. I've been asked why I didn't make it so that it would handle multi-character delimiters and my answer is because 1) that makes the function slower and 2) 99% of what I use it for is for single character delimiters. In other words, 99% of what I use it for would be slower because of making it more generic to handle just 1% of the other situations. In order to handle the other situations, I just do a bit of pre-processing to convert the multi-character delimiters to single characters.


#15

Yup, might well be the case. Data Imports tend to be "bespoke", so it suits us to use templates so that we can, largely, mechanically generate the code - keeps the cost down and reduces errors (particularly in an area where Client doesn't have a big appetite for testing! and it doesn't get foot-fall from users hammering it.

Yup, makes sense. FWIW I have different functions where I want an INT and VARCHAR result, and also a separate function where the delimited list is VARCHAR(MAX). That was prompted by something I read on Erland Sommarskog site - that performance was significantly improved by having a dedicated function that returns INT, for those uses where that is a requirement. He also compares against the delimiter using forced BINARY collation (i.e. case sensitive) because he said that was faster. I can't imagine a situation where I need a case-insensitive letter as a delimiter :slight_smile: