SQLTeam.com | Weblogs | Forums

Import CSV file into SQL Server - its absolutely brain dead

csv
import
sql2012

#1

I haven't had to do this for a long time, I assumed it would be a No-Brainer these days ... apparently not.

I have a simple CSV file. I want to import it and create a table. This is for a throw-away job, and I thought it would take 5 minutes ...

In the CSV file all the column headings, on first row, are quoted. All Text Field data is quoted, and perhaps some quoted values might contain ",". All the rest (i.e. numeric data) are not quoted

If I import the file into Excel it comes in 100% perfect. i.e. it understands the full requirements for CSV, including comma-within-quoted-data, and the fact that the column headings are quoted - Excel removes those quotes.

If I just try to do an IMPORT from SSMS (which I presume is using SSIS under the covers) it creates all the column names with Quotes (even though I put " in the Text Qualifier - which DOES remove it from the individual data values)

It gives me a table with every column set to VARCHAR(50) - whereas Excel previews the first N rows and decides on data type from that. Clearly 50 is too narrow for some values, and I'm going to have to figure out all the datatypes for all the non-varchar ones.

When I run the import it complains that the NOTES column is either too long or contains an invalid character (and the subsequent cascade of error messages seem to indicate its too long, rather than a weird character issue)

So I carefully went through and set all the columns to varchar(500), the NOTES to varchar(MAX) and ran the import again (of course the table now exists, even though the actual data import was ROLLED BACK because of the truncation error ... blinking useless ... so I have to manually DROP the table and THEN try again)

Same thing

So on the "Review Data Type Mapping" I set the NOTES column to IGNORE on both "On Error" and "On Truncation". Same thing.

So I ALSO set the overall/global table import to IGNORE both types of error. Still the same thing.

So I cannot get ANY data in at all - I can't find a way to get it to ignore the duff rows.

Looked at the error message and line number in the data file and there is a line break in the Notes. Ho!Hum! The Quoting is still spot-on, so wouldn't be hard to accommodate (and must surly be a common issue?)

So I removed the line breaks in that NOTES row, same thing (same Row No). I couldn't see anything in the NOTES (in the CSV) that was a problem, but I removed the notes (i.e. now a blank field). It stopped on a subsequent line that had NOTES - so I deleted them too. The second one was definitely not a multi-line value, and I couldn't see any funny characters - and no embedded COMMA.

Finally it imported. I now have a table that has leading/trailing Quotes on every column name ... and all the numeric values are in varchar(50) datatype columns. I did not manually override them as I had no idea what the range of values was, so I now have to examine the data myself, choose a suitable datatype, modify the table ...

... blinking heck, its going to take me all day just to import one lousy CSV file ...

... is this really as good as it gets in SQL2012 ?

What's the best way to import CSV?

Am I better off with import-into-one-field-per-row and then split it myself?

In case, in my ignorance, I missed something obvious here are the screen shots and errors messages I got

Executing (Error)
Messages
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data conversion for column ""Notes"" returned status value 4 and status text "Text was truncated or one or more characters had no match in the target code page.".
(SQL Server Import and Export Wizard)

Error 0xc020902a: Data Flow Task 1: The "Source - customer_csv.Outputs[Flat File Source Output].Columns["Notes"]" failed because truncation occurred, and the truncation row disposition on "Source - customer_csv.Outputs[Flat File Source Output].Columns["Notes"]" specifies failure on truncation. A truncation error occurred on the specified object of the specified component.
(SQL Server Import and Export Wizard)

Error 0xc0202092: Data Flow Task 1: An error occurred while processing file "C:\xxx\customer.csv" on data row 183.
(SQL Server Import and Export Wizard)

Error 0xc0047038: Data Flow Task 1: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Source - customer_csv returned error code 0xC0202092. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.
(SQL Server Import and Export Wizard)


#2

Hahaha!! all the "empty" columns have come in as blank-string, rather than NULL ...


#3

Maybe try nvarchar?

An input value of ,"", should result in an empty string value, I would think, whereas ,,, should result in a NULL value.


#4

SSIS ingest can sometimes be like killing a flee with a grenade. I recommend powershell for these type of one offs


#5

I have used this puppy in the past. a bit dusty but might be helpful now or in the future if you clean it up.


$qry = 'select * from [Sheet1$]'
$sqlserver = 'yourdestserver'
$dbname = 'yourdbname'

#Create a table in destination database with 
#the with referenced columns and table name.
$tblname = 'yourtablename'
 
#######################
function Get-ExcelData
{
 
    param($connString, $qry='select * from [Sheet1$]')
 
    $conn = new-object System.Data.OleDb.OleDbConnection($connString)
    $conn.open()
    $cmd = new-object System.Data.OleDb.OleDbCommand($qry,$conn) 
    $da = new-object System.Data.OleDb.OleDbDataAdapter($cmd) 
    $dt = new-object System.Data.dataTable 
    [void]$da.fill($dt)
    $conn.close()
    $dt
 
} #Get-ExcelData
 
#######################
function Write-DataTableToDatabase
{ 
    param($dt,$destServer,$destDb,$destTbl)

    $connectionString = "Data Source=$destServer;Integrated Security=true;Initial Catalog=$destdb;"
    $bulkCopy = new-object ("Data.SqlClient.SqlBulkCopy") $connectionString
    $bulkCopy.DestinationTableName = "$destTbl"
    $bulkCopy.WriteToServer($dt)
 
}# Write-DataTableToDatabase

$RootFolder = ("C:\TheKristenShuffle\"); 
$dasfiles = $(get-childitem $RootFolder.FullName ); 

Write-Host $RootFolder

foreach( $dasfile in $dasfiles)
{
	Write-Host $dasfile
	
	$fileNameWithFullPath = $dasfile.FullName
	
	#Connection String for Excel 2007:
	$connString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$fileNameWithFullPath`";Extended Properties=`"Excel 12.0 Xml;HDR=YES`";"
	
	#######################
	$dt = Get-ExcelData $connString $qry
	Write-DataTableToDatabase $dt $sqlserver $dbname $tblname
}	

#6

If this is for a one-of task, then cheat like hell. Do like you did to start with... load the file into an Excel Spreadsheet. Once that's done, save it to a different file as a TAB Separated file and it should import fine from there. Either that or save the spreadsheet as a spreadsheet and open the file using OPENROWSET through the ACE Drivers.


#7

Yes, I should have tried that. The ones I looked at were just a few words in all caps though, hard to see that there was anything tricky - although obviously something upset it. (I tried again at VARCHAR(8000), instead of MAX, in case that was iffy - same thing. Second file, Suppliers instead of Customers, had exactly the same thing with its NOTES column, so now I've calmed down I'll investigate a bit more :slight_smile: )

I'm disappointed that having set the column to IGNORE on Truncate/Error - heck even the whole batch - that didn't happen. "Some data" would have done me, for starters.

Yes, that's reasonable. An "All text columns NULL-if-blank" option would be nice, otherwise I would have to add a Preprocess step given that the crappy export put "" around every text column. At least when we export CSV we go to the trouble to only quote fields that themselves include an embedded "," comma. At that time I hadn't thought about NULL, we did it to "save a few bytes", because that was important back when I was a boy! and it was cheaper to fly to USA and buy some 4K RAM boards, including the cost of the flight, than buy them in the UK !!

Pity CSV ever made it to the light of day all those years ago. PIPE is worse, I've never seen any quote-delimiter used with that format, and we've had people insisting on that (for exported data, not imports thank goodness) when I knew PIPE was also used within the data - if I look at the EMail column of client ecommerce databases there will be some in there, I've seen some like this:

||||NO_SPAM||||@getlost.com

which gets through our email address validation no problem! and will screw up the 3rd party's Import royally - good job too, didn't like the arrogance of that company.

That works - but only now that I have your code. I'll keep that handy for next time. Pity I don't have to do this more often, I would then have incentive to build something handy. Somewhat surprised that my Google didn't turn up "MSSQL CSV import is usless and here's a handy all-singing-and-dancing replacement"

Kicking myself now, when I put it into Excel, and it came in clean as a whistle, I had that thought. Something else came up and when I got back to the project I had forgotten that thought. "It's me age, you see" :slight_smile: Although, that said, I had half hoped that having built a CSV import I could then reuse that as a temporary solution for repeated imports of the data until a bit later in the project.

I'll go back and try that and see if it sheds any light on why the NOTES failed.

Not tried that, that might well be handy in future - I'm as likely to get an XLS as a CSV with the "Just chuck this in a database for me pls" request !

There must be loads of situations where "Chuck this data in and create a table" applies, and making that a Doddle to do would be really good for the marketing of SQL Server.

One piece of better news ... well ... sort of.

I've found out why the column names have been created quoted. I had a second CSV file to import and must have done something different, as those column names came in WITHOUT quotes. My guess is:

First time: Chose Flat File, pressed NEXT and saw the preview, along with everything quoted.

Pressed BACK and put the " quote character in the "Quote Delimiter" box. Pressed NEXT

That took care of the Column Data, but not the Headings.

On the second file I already knew about Quote Delimiter, so I put that in before pressing NEXT.

So my guess is that you have to get that right-first-time (or abort/restart rather than just BACK and change that setting). Something to watch out for if you encounter that snag.

Thanks chaps.