Importing CSV files into SQL

Hi all

I am trying to load 11 CSV files, each with over 700k lines. When I use the wizard, the default OutputColumnWidth is 50 (visible when I click Advanced to see the columns). Unfortunately, at least 42 columns need to be at least 80. If I change them for the first load, I assume I would have to go through all 313 columns in each of the other 10 files again to find the ones that are longer than the 50-character default. Is there a way I can change the lengths without having to go through them one by one like I did for the first file?

Do you have a written "record layout" that identifies the name, type, and width of each column? I'm not talking about a "graphic"; I'm talking about an editable document that could be transmuted into a BCP format file to control the inputs and used to generate a CREATE TABLE statement, as well.
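For illustration only, a non-XML BCP format file for a made-up three-column layout might look like the following (the column names, widths, and collation are placeholders for whatever your layout document actually says):

```
12.0
3
1  SQLCHAR  0  80  ","      1  FirstName   SQL_Latin1_General_CP1_CI_AS
2  SQLCHAR  0  80  ","      2  LastName    SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  12  "\r\n"   3  CustomerId  SQL_Latin1_General_CP1_CI_AS
```

The same layout document could then drive a matching staging table:

```sql
-- Staging table matching the hypothetical layout above (names are placeholders)
CREATE TABLE dbo.ImportStage (
    FirstName  varchar(80) NULL,
    LastName   varchar(80) NULL,
    CustomerId varchar(12) NULL
);
```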

Hi Jeff...Unfortunately I do not

You can change all the lengths at once. Just make them all 4000 or whatever.

Hi Scott... How would I do that? I was trying to, but I don't see a way to do it. I also know the column lengths for the files, and if I could change the default in the import wizard to the highest value, that would be golden.

Also, Jeff, I did locate a Data File Layout that identifies the name, type, and width of each column, but I'm not sure how to create a BCP format file or do the other things you suggested.

In the "Advanced" tab, do a multi-select of all the column names in the left window (by holding the Shift key and pressing on the last column name), then change the length. That length will apply to all the columns.

Thanks Scott... that worked absolutely perfectly. Does that mean the table is now set up this way and I won't have to do this again with the other files I will load into the new table?

Great, glad it helped.

Correct. If you use the same table name, SQL will "know" the table already exists and give you options accordingly.

Naturally, if you need to, you can also ALTER columns, for example if you determine a column is always an int or a date. If you do so, I strongly urge you to use a script, because it may need to be done again later for that table or for another import table.
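A rough sketch of such a script (the table and column names here are just placeholders for your own import table):

```sql
-- Narrow the catch-all varchar columns once you know the real types/lengths.
-- dbo.ImportStage and these column names are placeholders, not your actual objects.
ALTER TABLE dbo.ImportStage ALTER COLUMN CustomerId int         NULL;
ALTER TABLE dbo.ImportStage ALTER COLUMN OrderDate  date        NULL;
ALTER TABLE dbo.ImportStage ALTER COLUMN LastName   varchar(80) NULL;
```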


Scott... something very strange is happening. The file starts to load, and once it gets to row 294,687 (of 700k) I get a truncation error on the last column. How does it accept all of those lines and then say that particular line/column has a truncation problem? I ran it several times, made alterations, etc., but it still bombs on that line/column.

Hmm. Try using 8000 rather than 4000.

It could also be an error in the file format, but let's hope it's not that, 'cause that can be a pain to fix.

Actually, I found the data file layout and it says the longest field is 80, so I set them all to 80... and the column in question only has one character.

Unfortunately, you now have to look at that row in the file and determine why it is causing a problem. You can try opening the file in Notepad++, but it may be too big; hopefully not.

Since this is a CSV file, there aren't any restrictions on what can be put into it. So even if the specification says the column cannot exceed 80 characters, that doesn't guarantee it actually won't...

It could also be caused by an invalid character, which you should be able to identify if you can open the file in Notepad++. If that is the case, then you either need to fix the file (easiest) or modify the package to use a different data type for the column, or even a different code page for the file.
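If Notepad++ chokes on the file size, one workaround is to pull just the raw lines around the failing row into a single wide column with BULK INSERT and eyeball them there. This is only a sketch; the file path, row numbers, and table name are assumptions:

```sql
-- Each raw line lands in one wide column because the "field terminator"
-- is a string that never appears in the data.
CREATE TABLE dbo.RawLines (RawLine varchar(8000) NULL);

BULK INSERT dbo.RawLines
FROM 'C:\Imports\file1.csv'         -- your actual file path
WITH (
      FIELDTERMINATOR = '|~|',      -- anything that never occurs in the file
      ROWTERMINATOR   = '\n',       -- adjust to the file's real line ending
      FIRSTROW        = 294680,     -- a few rows before the one that failed
      LASTROW         = 294690      -- a few rows after it
     );

SELECT * FROM dbo.RawLines;
```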

The bad part about defining such unnecessarily large columns is that SQL Server assumes columns are about half their declared size when it builds its memory estimates. That can have an effect on performance, so be careful there.

Also, bad characters in files happen all the time, especially in CSV files, because almost no one follows the actual CSV standard and, as Scott says, there's nothing to prevent such issues.

With that, I'll also tell you that I don't use SSIS for such things. I took the time to learn how to use BCP format files and how to use the features of BULK INSERT and BCP to sequester "bad rows" in a separate file for correction. I suggest doing the same; it's arcane but effective knowledge that can be surprisingly useful, especially if, like me, you won't even stand up an SSIS instance anymore. I can only imagine that SSIS also has a way to sequester bad rows, but I don't know it.

I generally use that layout to create a table with all the correct data types and lengths. There should be a way in SSIS to sequester any bad rows without killing the whole import; I just don't know what it is, because I generally use BCP or BULK INSERT for such things so that I don't even have to stand up SSIS.
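For what it's worth, here's a minimal sketch of what sequestering bad rows looks like with BULK INSERT. The file paths, table name, and format file are assumptions, not your actual names:

```sql
BULK INSERT dbo.ImportStage
FROM 'C:\Imports\file1.csv'
WITH (
      FORMATFILE = 'C:\Imports\file1.fmt',      -- BCP format file describing the columns
      FIRSTROW   = 2,                           -- skip the header row, if there is one
      ERRORFILE  = 'C:\Imports\file1_bad.txt',  -- failing rows are written here for later correction
      MAXERRORS  = 100                          -- keep loading unless errors really pile up
     );
```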

Thanks @ScottPletcher, @jeffw8713 and @JeffModen. I have finished my import. It took all night and this afternoon, but it's done thanks to the three of you. I had previously selected a solution to my question, so I am not sure how to give the two Jeffs credit for their assistance.

Nah... this was all you. I just made a couple of suggestions that you followed up on. If you have to do this on a regular basis with less "urgency", we can help smooth a couple of things out and maybe make it so it doesn't take so long in the future. Glad you got it done, and thank you for the feedback. I don't care about "points", so don't worry about that.