I haven't had to do this for a long time, and I assumed it would be a no-brainer these days ... apparently not.
I have a simple CSV file. I want to import it and create a table. This is for a throw-away job, and I thought it would take 5 minutes ...
In the CSV file all the column headings, on the first row, are quoted. All text field data is quoted, and some quoted values might contain ",". Everything else (i.e. numeric data) is not quoted.
If I import the file into Excel it comes in 100% perfect. i.e. it understands the full requirements for CSV, including comma-within-quoted-data, and the fact that the column headings are quoted - Excel removes those quotes.
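For comparison, the behaviour Excel shows (quotes stripped from headings, commas kept inside quoted values) can be reproduced outside SQL Server. This is a minimal sketch using Python's standard `csv` module; the `SAMPLE` text and the `read_rows` name are made up for illustration and only mimic the shape of the file described above.

```python
import csv
import io

# Hypothetical sample with the same shape as the file described above:
# quoted headings, quoted text (one value with an embedded comma), bare numbers.
SAMPLE = '"ID","Name","Notes"\n1,"Smith, John","ok"\n2,"Jones",""\n'

def read_rows(text):
    """Return (header, rows) parsed with standard CSV quoting rules."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"')
    header = next(reader)          # first record holds the column headings
    return header, list(reader)    # every value comes back as a string

header, rows = read_rows(SAMPLE)
print(header)   # headings without the surrounding quotes
print(rows[0])  # the embedded comma stays inside one field
```

For a real file, `open(path, newline="")` would take the place of `io.StringIO`.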
If I just try to do an IMPORT from SSMS (which I presume is using SSIS under the covers) it creates all the column names with Quotes (even though I put " in the Text Qualifier - which DOES remove it from the individual data values)
It gives me a table with every column set to VARCHAR(50) - whereas Excel previews the first N rows and decides on data type from that. Clearly 50 is too narrow for some values, and I'm going to have to figure out all the datatypes for all the non-varchar ones.
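The "preview the first N rows and guess the type" step can be sketched by hand, which at least gives a starting point for choosing column types and widths. This is only an illustration: `infer_columns`, the three type labels and the sample data are assumptions, not anything the wizard exposes, and Python's `int()`/`float()` are more lenient than SQL Server's conversions (they accept surrounding spaces, for example).

```python
import csv
import io

ORDER = ["int", "float", "text"]  # narrowest guess first

def _fits(kind, value):
    """True if value can be read as the given kind."""
    try:
        if kind == "int":
            int(value)
        elif kind == "float":
            float(value)
        return True  # anything fits "text"
    except ValueError:
        return False

def infer_columns(text, sample_size=1000):
    """Map column name -> (guessed kind, widest value seen) from the first rows."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    kinds = ["int"] * len(header)
    widths = [0] * len(header)
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        for col, value in enumerate(row[:len(header)]):
            widths[col] = max(widths[col], len(value))
            if value == "":
                continue  # blanks do not change the guess
            # Widen the guess only as far as needed; never narrow it again.
            for candidate in ORDER[ORDER.index(kinds[col]):]:
                if _fits(candidate, value):
                    kinds[col] = candidate
                    break
    # A column with no data at all is reported as text.
    return {name: (kind if width else "text", width)
            for name, kind, width in zip(header, kinds, widths)}

sample = '"ID","Name","Balance"\n1,"Smith, John",10.5\n2,"Jones",7\n'
guessed = infer_columns(sample)
print(guessed)
```

The reported widths are character counts from the sampled rows only, so any column size derived from them would need some headroom.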
When I run the import it complains that the NOTES column is either too long or contains an invalid character (and the subsequent cascade of error messages seems to indicate it's too long, rather than a weird character issue)
So I carefully went through and set all the columns to varchar(500), the NOTES to varchar(MAX) and ran the import again (of course the table now exists, even though the actual data import was ROLLED BACK because of the truncation error ... blinking useless ... so I have to manually DROP the table and THEN try again)
Same thing
So on the "Review Data Type Mapping" I set the NOTES column to IGNORE on both "On Error" and "On Truncation". Same thing.
So I ALSO set the overall/global table import to IGNORE both types of error. Still the same thing.
So I cannot get ANY data in at all - I can't find a way to get it to ignore the duff rows.
Looked at the error message and line number in the data file and there is a line break in the Notes. Ho hum! The quoting is still spot-on, so it wouldn't be hard to accommodate (and must surely be a common issue?)
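Records with a line break inside a quoted value can be listed up front rather than found one failed import at a time. A minimal sketch, assuming the file follows standard CSV quoting; `multiline_records` and the sample text are hypothetical, and it relies on the `line_num` attribute of Python's `csv.reader`, which counts physical lines read so far.

```python
import csv
import io

def multiline_records(text):
    """Return (record_number, first_line, last_line) for records spanning lines.

    Record 1 is the heading row; line numbers are physical lines in the file.
    """
    reader = csv.reader(io.StringIO(text, newline=""))
    found = []
    previous_line = 0
    for record_number, _row in enumerate(reader, start=1):
        # More than one physical line consumed means a quoted line break.
        if reader.line_num - previous_line > 1:
            found.append((record_number, previous_line + 1, reader.line_num))
        previous_line = reader.line_num
    return found

sample = '"ID","Notes"\n1,"line one\nline two"\n2,"ok"\n'
spans = multiline_records(sample)
print(spans)
```

Because a multi-line record shifts every later physical line number, the row number in an import error and the line number in a text editor can disagree from that point on.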
So I removed the line breaks in that NOTES row, same thing (same Row No). I couldn't see anything in the NOTES (in the CSV) that was a problem, but I removed the notes (i.e. now a blank field). It stopped on a subsequent line that had NOTES - so I deleted them too. The second one was definitely not a multi-line value, and I couldn't see any funny characters - and no embedded COMMA.
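Since neither failing value looked wrong by eye, a quick scan for the two causes the error text names (truncation, or a character with no match in the target code page) may narrow it down. This is a sketch under assumptions: the 50-character limit is taken from the VARCHAR(50) default mentioned above, `cp1252` is only a guess at the target code page, and `suspect_values` is a made-up helper name. The errors further down name the flat file source rather than the destination table, so the width that matters may be the one on the source side, but that is also an assumption.

```python
import csv
import io

def suspect_values(text, column, max_width=50, codepage="cp1252"):
    """Return (data_row_number, reasons) for values likely to fail the import.

    Data rows are numbered from 1, excluding the heading row.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    problems = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(column) or ""
        reasons = []
        if len(value) > max_width:
            reasons.append(f"length {len(value)} > {max_width}")
        try:
            value.encode(codepage)  # fails on characters outside the code page
        except UnicodeEncodeError as exc:
            reasons.append(f"character {value[exc.start]!r} not in {codepage}")
        if reasons:
            problems.append((row_number, reasons))
    return problems

# Hypothetical data: one short note, one over-long note, one odd character.
sample = ('"ID","Notes"\n1,"short"\n2,"' + "x" * 60 + '"\n3,"snow \u2603"\n')
problems = suspect_values(sample, "Notes")
for row_number, reasons in problems:
    print(row_number, reasons)
```

Note that `csv.DictReader` skips completely empty lines, so the row numbers can drift from the wizard's count if the file contains any.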
Finally it imported. I now have a table that has leading/trailing Quotes on every column name ... and all the numeric values are in varchar(50) datatype columns. I did not manually override them as I had no idea what the range of values was, so I now have to examine the data myself, choose a suitable datatype, modify the table ...
... blinking heck, it's going to take me all day just to import one lousy CSV file ...
... is this really as good as it gets in SQL Server 2012?
What's the best way to import CSV?
Am I better off with import-into-one-field-per-row and then split it myself?
In case, in my ignorance, I missed something obvious, here are the screenshots and error messages I got:
Executing (Error)
Messages
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data conversion for column ""Notes"" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
(SQL Server Import and Export Wizard)
Error 0xc020902a: Data Flow Task 1: The "Source - customer_csv.Outputs[Flat File Source Output].Columns["Notes"]" failed because truncation occurred, and the truncation row disposition on "Source - customer_csv.Outputs[Flat File Source Output].Columns["Notes"]" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
(SQL Server Import and Export Wizard)
Error 0xc0202092: Data Flow Task 1: An error occurred while processing file "C:\xxx\customer.csv" on data row 183.
(SQL Server Import and Export Wizard)
Error 0xc0047038: Data Flow Task 1: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Source - customer_csv returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
(SQL Server Import and Export Wizard)