TVF from flat file randoming moves data

scottdg · May 24, 2016, 8:04pm

I have a flat file that is correct and is used n a production environment with no issues. I am trying to import the same file through a stored procedure that calls a table-valued function. Random rows of data are askew when I view the table. It is always the same rows and it always starts with the same column. The positions and lengths are correct and the data is good. But some of the rows have a column that starts one column, or possibly two columns sooner than they should.

Just wondering if anyone has seen this issue before.

James · May 24, 2016, 9:16pm

The flat file is used to import data into a SQL Server table. Is that correct? I assume you have some method of determining the columns by a delimiter or fixed position. Getting data in to a table can be confounding at times and is usually caused by unexpected column/row delimiters or position assumptions.

I have seen issues with unprintable characters in the past cause these things but there are lots of other causes that may be causing this behavior.

One suggestions is to open the file in a text editor (if the size allows this) and look for any "non standard" characters where you are seeing the columns misbehave.

Can you post some sample data from the file including the rows that work and the rows that don't? Perhaps the code that imports the data into the table (stored procedure and TVF)?

scottdg · May 25, 2016, 12:10pm

It is a fixed width file. All of the positions are correct because the same file is used in a production environment. The test environment uses a new stored procedure that calls a function that reads the data into a TVF. If I pull the data from that TVF into a text editor I can see the random column shift.

Unfortunately I cannot post an of the data or code for you to see. I know that limits the amount of help you can provide.

The only other thing is the function uses sp_OACreate, and sp_OAMethod. I am not really familiar with those stored procedures and what they are doing.

JeffModen · May 25, 2016, 1:23pm

From your original description, there's a really good chance that your data contains out of place carriage returns (CHAR(13)) or line feeds (CHAR(10)).

Since you know what the last column is on the rows that act funny, do a search similar to the following...

 SELECT *
   FROM dbo.YourTable
  WHERE ThatLastColumn LIKE '%'+CHAR(13) + '%'
     OR ThatLastColumn LIKE '%'+CHAR(10) + '%'
;

JeffModen · May 25, 2016, 1:26pm

BTW... using sp_OA* for what you're doing is going to be comparatively horribly slow compared to using something like BULK INSERT because sp_OA* is truly "interpretive" for this type of thing. Another name for "interpretive", in this case, is RBAR on steroids.

I recommend looking into that and BCP format files. A BCP format file usually isn't necessary but it critical for files that have fixed length fields.

James · May 25, 2016, 1:26pm

The sp_OACreate/Method are OLE Automation procedures called from the procedure. Many people have those turned off for various security reasons.

So it sounds like some of the rows work and some do not. Remove the rows that don't and see if the data goes in correctly. Add one back in that was causing the problem and see if it does it again. Try opening the file in Excel or another spreadsheet program and use the same column definition spacing and see if that does the same thing with the columns shifting.

I agree with JeffModens suggestion. You can load a table with all data into one column then search for any non-standard values to find where it's causing grief. Lots of ways to inspect the data to find the culprit.

Kristen · May 25, 2016, 2:15pm

You can obfuscate the data - e.g. Find & Replace [A-Z] with "X" and [0-9] with "9", and change all the column and table names etc. in the code.

I would described it differently:absent sample code and data to work with people will have to guess, and the consequences of that are that they will find that much of their time is wasted; time here is given for free, it would help if you put some effort into providing some examples

You could try looking for non-standard characters e.g. using

WHERE SomeImportedValue LIKE '%[^0-9A-Za-z ]%' COLLATE Latin1_General_BIN2

include all valid / acceptable characters in the RegEx expression, refining them until the output you get doesn't appear to have any visible characters not in the RegEx set - what is, then, left will be extended characters of some sort, possibly "invisible" in editor being used.

One option, at that point, is to use the ASCII function, on each character in [SomeImportedValue] to see exactly what it contains

scottdg · May 25, 2016, 5:09pm

Thank you for the response. This was written by a vendor for our financial system and was using bcp. For some reason ($$$) it was re-written when upgrading to a service pack and now uses the read function.

I have queried the table and can see exactly where the problem is occurring by pulling it into excel or a text editor. The problem is the file that is being read is perfect. We are using the same file in our production system. So there is something happening in that read file using the sp_OA* procedures that I do not understand.

I appreciate you both trying to help given limited information. I think it is time to go back to the vendor and let him deal with it.

Thanks,
Scott