SQLTeam.com | Weblogs | Forums

How to insert csv file without double quotes

bulk-insert

#1

Here's my current code:
CREATE TABLE tmp_OKFF
(
OKCCY VARCHAR(5),
OKBRNM VARCHAR(6),
OKONPX VARCHAR(17),
OKONPM VARCHAR(17),
OKFRX VARCHAR(19),
OKFPX VARCHAR(19),
OKMRX VARCHAR(17),
OKMPX VARCHAR(17),
OKSEQ VARCHAR(3)
)

BULK INSERT tmp_OKFF
FROM 'C:\sample.CSV'
WITH (FORMATFILE='C:\sample.fmt')

Here's the data I want to insert in my tmp_OKFF
"ABC","0001",-990039739 ,0 ,0 ,0 ,0 ,0 ,0
"ABC"," ",-3422054702 ,0 ,481385 ,0 ,0 ,0 ,0
"XXX","0001",0 ,0 ,0 ,0 ,0 ,0 ,0
"ASD"," ",0 ,0 ,0 ,0 ,0 ,0 ,0
"JED","0001",21644944 ,0 ,0 ,0 ,0 ,0 ,0
"FAQ"," ",74815363 ,0 ,0 ,11120 ,0 ,0 ,0
"PHP","0002",905175206 ,0 ,0 ,0 ,0 ,0 ,0

Here's the exact format:
12.0
10
1  SQLCHAR 0 1  "\""    0 ""
2  SQLCHAR 0 3  "\",\"" 1 OKCCY  SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR 0 4  "\","   2 OKBRNM SQL_Latin1_General_CP1_CI_AS
4  SQLCHAR 0 17 ","     3 OKONPX SQL_Latin1_General_CP1_CI_AS
5  SQLCHAR 0 17 ","     4 OKONPM SQL_Latin1_General_CP1_CI_AS
6  SQLCHAR 0 19 ","     5 OKFRX  SQL_Latin1_General_CP1_CI_AS
7  SQLCHAR 0 19 ","     6 OKFPX  SQL_Latin1_General_CP1_CI_AS
8  SQLCHAR 0 17 ","     7 OKMRX  SQL_Latin1_General_CP1_CI_AS
9  SQLCHAR 0 17 ","     8 OKMPX  SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 3  "\r"    9 OKSEQ  SQL_Latin1_General_CP1_CI_AS


#2

good discussion here:

Bulk Insert Correctly Quoted CSV File in SQL Server

Google is your friend!


#3

Interesting article, thanks. Is "OpenRowSet with the CSV text file data provider" the only answer then? It seems pretty shocking that BCP et al cannot handle a "properly" built CSV file - quotes around strings that contain commas, and doubled-up quotes in fields that are surrounded by quotes. Absence of that is going to drive people to Oracle, eh?! CSV, done that way, is the only delimited-file specification I know of that allows the delimiter to be included/escaped within the data itself, and as such it ought to be considered important by MS, surely?

I wondered about importing the file to a single-column table and then parsing it from there, but efficient splitter functions for single-character delimiters are one thing; I imagine that building one that handles escaped delimiters would be much more difficult.

If performance is not critical that might be an answer though.
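For what it's worth, the parsing rules being asked for here are exactly the "full-fat" CSV convention - quoted fields may contain the delimiter, and a literal quote inside a quoted field is escaped by doubling it. A quick illustration (Python's csv module used purely as a reference parser, not as a suggestion for the T-SQL splitter itself):

```python
import csv
import io

# A quoted field may contain the comma delimiter, and a doubled quote
# inside a quoted field collapses to a single literal quote.
line = '"ABC","00,01","say ""hi"""'
row = next(csv.reader(io.StringIO(line)))
print(row)  # ['ABC', '00,01', 'say "hi"']
```

Any splitter that aims to handle "properly" built CSV would need to reproduce both of those behaviours, which is why it is harder than a plain single-character split.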

"Google brought me here and then I got a First Class service" - is that the sort of thing you were thinking? :smile:


#4

He He!

I suppose if it were me, I'd do it in SSIS with a Script Source component to do the parsing. Even SSIS does not understand a CSV file where double-quotes are added "as needed" IIRC. The fields are either quoted in every record or they're not.


#5

Just out of curiosity - does it then properly handle both 1) embedded commas and 2) doubled-up (escaped) quotes?

It might not be THAT hard to pre-process the CSV file to add quotes to any columns that only have them "as needed".
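That pre-processing idea can be sketched in a few lines - here illustrated with Python's csv module (an assumption on my part; any conformant CSV reader/writer pair would do). Read the file with fields quoted only "as needed" and rewrite it with every field quoted, so a stricter loader sees a uniform layout:

```python
import csv
import io

# Read a CSV whose fields are quoted only where necessary, then rewrite
# it with QUOTE_ALL so every field is quoted (and embedded quotes doubled).
src = io.StringIO('a,"b","c,d"\r\n1,2,3\r\n')
dst = io.StringIO()
writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
for row in csv.reader(src):
    writer.writerow(row)
print(dst.getvalue())  # '"a","b","c,d"\r\n"1","2","3"\r\n'
```

The output then matches the "quoted in every record" layout that SSIS does understand.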


#6

I'll have to try the doubled quotes. Not sure at the moment.


#7

OK ... tried it with a test.csv file like this:

"test1","test2""test3","test4"

Result:

Column 0    Column 1        Column 2
test1       test2""test3    test4

So, it doesn't handle escaped data
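For comparison, a parser that follows the doubled-quote convention does collapse the escape on that same test line (again using Python's csv module just as a reference implementation):

```python
import csv

# A conformant CSV parser unescapes the doubled quote, so the middle
# field comes back as test2"test3 rather than test2""test3.
row = next(csv.reader(['"test1","test2""test3","test4"']))
print(row)  # ['test1', 'test2"test3', 'test4']
```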


#8

I tried with a csv file that has the following:

a,"b","c,d","e"f",g,h
1,2,3,4,5,6

Using Import/Export Wizard on SQL 2008R2, this is what I got:

So my conjecture is that, as long as it is not followed by a comma, the text qualifier is treated as data, and there is no need to escape it. But it also gave me the following warning

So I accept defeat - I have no idea why it complained about the data, and then loaded it anyway.


#9

needs to be

"e""f"

I think ??
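That's the escaping rule in a nutshell: a writer that quotes its fields doubles any literal quote inside them. A quick check (illustrative Python, assuming quote-everything output):

```python
import csv
import io

# A quoting writer doubles the embedded literal quote, so the value
# e"f is emitted as "e""f" on the wire.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(['e"f'])
print(buf.getvalue())  # '"e""f"\r\n'
```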


#10

This would also be valid data for that (single) column

"e"",""f"

#11

Yes, Kristen, I see your point. I guess that warning about text qualifiers not being supported is indeed the case, although in some simpler cases it goes through without the promised failure.


#12

That's perhaps even worse ... if someone's tests ONLY include the "simpler cases" then they will be led into a false sense of security, thinking that CSV does support the full-fat version :frowning:

Can't help thinking it is high time that MS fixed this ...

Could well be that, back in the 70's, MS invented the full-fat CSV format. That would be ironic!!


#13

But, according to Wikipedia, it isn't the case

"the IBM Fortran (level G) compiler under OS/360 supported them in 1967"

even I hadn't (quite!) started coding by then, although Fortran was the first thing I learnt. On punch cards, no less :smile: On a computer the size of a large room which had 4 tape drives and 16K (words) of memory but a (line) printer so fast that one of the first programs I wrote was to calculate the speed that it got through fan-fold paper in miles-per-hour!! I also remember writing a "database" program giving a star rating, amongst other attributes of the establishment, to the barmaid in every pub in the town - one punch card per pub. As students we debugged this application by visiting every pub (over 100 of them :smile:) in the town to make sure that the star rating was correct ... Happy Days!


#14

Indeed! This is really dangerous.


#15

And once you got to the bar, what was the testing procedure?

On second thoughts, don't answer that. Mrs. K might be reading SQL Team forums :wink:


#16

Drink 6 pints and then award the barmaid - or barman, couldn't tell the difference by then! - five stars :smile:


#17

Next time I have to do this, I'll write a script source component and use something like this:

Parse CSV


#18

@kanuto,

Like I said over on the "other" forum, you have to NOT size any of the columns other than the first one, which should be sized to "1" and given NO DELIMITER. You also forgot to include a fictitious column name (I used NotUsed) for the first column to be ignored (and THAT's what the real problem was). I tested the following format file with the data you provided and it works just fine.

You also don't need all the collations. If you leave them out as just "", it'll still work fine because you don't normally need to change the collation. It's a rare case when you do.

I also changed from 12.0 to 9.0 for the BCP Format file type. It shouldn't matter to you, though.

9.0
10
1  SQLCHAR 0 1   ""      0 NotUsed "" 
2  SQLCHAR 0 500 "\",\"" 1 OKCCY   SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR 0 500 "\","   2 OKBRNM  SQL_Latin1_General_CP1_CI_AS
4  SQLCHAR 0 500 ","     3 OKONPX  SQL_Latin1_General_CP1_CI_AS
5  SQLCHAR 0 500 ","     4 OKONPM  SQL_Latin1_General_CP1_CI_AS
6  SQLCHAR 0 500 ","     5 OKFRX   SQL_Latin1_General_CP1_CI_AS
7  SQLCHAR 0 500 ","     6 OKFPX   SQL_Latin1_General_CP1_CI_AS
8  SQLCHAR 0 500 ","     7 OKMRX   SQL_Latin1_General_CP1_CI_AS
9  SQLCHAR 0 500 ","     8 OKMPX   SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 500 "\r"    9 OKSEQ   SQL_Latin1_General_CP1_CI_AS

#19

BTW. If you see leading double quotes on the first column in your data, then you'll need to change the delimiter on line 10 of the format file to this...

9.0
10
1  SQLCHAR 0 1   ""      0 NotUsed "" 
2  SQLCHAR 0 500 "\",\"" 1 OKCCY   SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR 0 500 "\","   2 OKBRNM  SQL_Latin1_General_CP1_CI_AS
4  SQLCHAR 0 500 ","     3 OKONPX  SQL_Latin1_General_CP1_CI_AS
5  SQLCHAR 0 500 ","     4 OKONPM  SQL_Latin1_General_CP1_CI_AS
6  SQLCHAR 0 500 ","     5 OKFRX   SQL_Latin1_General_CP1_CI_AS
7  SQLCHAR 0 500 ","     6 OKFPX   SQL_Latin1_General_CP1_CI_AS
8  SQLCHAR 0 500 ","     7 OKMRX   SQL_Latin1_General_CP1_CI_AS
9  SQLCHAR 0 500 ","     8 OKMPX   SQL_Latin1_General_CP1_CI_AS
10 SQLCHAR 0 500 "\r\n"  9 OKSEQ   SQL_Latin1_General_CP1_CI_AS

If you get some other error, then you might need to change that last delimiter to just "\n".

Remember that you should always make the column size in the format file bigger than the column size in the table if you're using delimited fields. That allows the table itself to fail the import as part of error checking, just by the nature of the size of the data coming in.