Firstly, thanks for taking the time to read this and for any pointers you may have on this issue.
I'm very fresh into the world of SQL, and as part of my learning curve I decided to automate a report that is normally generated manually through the front end of our database.
The basic operation is as such:
A PowerShell script is triggered; this runs a SQL script and then exports the results to a CSV file.
The issue I'm having is that in cells where "free text" entries are permitted via the front end, there is a lot of "junk" (bullet points, hidden carriage returns, etc.) from text that has been copied and pasted in. I modified the SQL code to deal with these issues, which resulted in a perfectly laid-out grid.
I then found out that when the CSV is generated via PowerShell using sqlcmd, it's encoded differently, which results in data being moved into the wrong places. I've modified the SQL code to deal with double quotes, bullet points, CR/LF, etc., to the point where it's now retaining data in the wrong cells — the opposite of the original issue.
It ends up being a circle back to the starting point again.
The ideal scenario would be to clean the data, but this wouldn't be practical due to the sheer amount of data in there; one of these reports contains circa 1.8M cells.
Thanks for your response. I've visualised the issue in the screen grabs below.
The first is when I run the script in SSMS; when it finishes, I export the grid results to a CSV simply by right-clicking the grid and selecting "Save Results". When I open the CSV, everything is in its correct place, 100% correct.
The issue is that I need to automate this so someone with zero SQL knowledge can pull the data. I've tried this with PowerShell using sqlcmd and also using BCP; neither gets the desired result.
When automated, it pulls the data and saves it as a CSV, but certain cells contain "junk", because the data in them is ultimately entered into the front-end client as free text.
As you can see in the above example, F2_DATA contains "junk" which makes Excel think the data should be in different cells; the same goes for F5_DATA, F6_DATA and F8_DATA.
The "junk" could be anything — I've discovered carriage returns, line feeds, double quotes and bullet points, and I even found certain strings such as "double space {period} double quote" acting as a CR.
The standard approach would be to clean the data, but that simply isn't an option due to the amount of it; in a single export as described, there would be anywhere between 1.2 and 3.4 million cells.
To counteract the issues, I've had to modify the SQL script to include REPLACE and CAST to strip out certain characters and strings. That leaves me with a further problem: it strips things too well, resulting in the below. Rather than the bad data expanding and pushing its data outside of its own cell, it's now pulling data in from outside and ignoring the delimiters Excel needs to apportion the data to the correct location — note that there are commas pulled in that one would assume should act as the delimiter.
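For context, the kind of cleanup I added looks roughly like the sketch below (the table and column names are illustrative, not my real schema):

```sql
-- Rough sketch of the cleanup: strip CR (CHAR(13)), LF (CHAR(10)) and the
-- bullet character, and double up embedded quotes rather than removing them,
-- so a stray quote can't terminate a CSV field early.
SELECT
    REPLACE(REPLACE(REPLACE(REPLACE(
        CAST(F2_DATA AS NVARCHAR(MAX)),
        CHAR(13), ' '),        -- carriage return
        CHAR(10), ' '),        -- line feed
        NCHAR(8226), ''),      -- bullet point
        '"', '""')             -- escape embedded double quotes
    AS F2_DATA
FROM dbo.SourceTable;
```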
if ($tempFile) {
    # ConvertTo-CSV here is presumably a custom helper function (the built-in
    # ConvertTo-Csv cmdlet has no -inputFile/-outputFile parameters); it reads
    # the temp file and writes the finished CSV to the output location
    ConvertTo-CSV -inputFile $tempFile -outputFile "$outputPath$outputFile"
} else {
    Write-Host "An error occurred while running the SQL script."
    exit 1  # Exit the script with an error code following failure
}

# Clean up the temporary file once the CSV has been written
Remove-Item -Path $tempFile -Force
Firstly it runs and waits for the completion of the SQL script, writing the output to a temporary file while extracting. It then converts it to a CSV in UTF-8 encoding, checks all rows are correct, removes empty rows, saves the CSV to the designated output location, and finally removes the temp file.
Without your SQL script, consumable test data, etc., a full answer cannot be given.
The basic problem is that exporting from the SSMS grid produces a CSV with double quotes around strings, while sqlcmd, bcp, etc. do not by default. (You can check this by opening the CSV in Notepad instead of Excel.)
Export-CSV in the PowerShell I gave you also puts double quotes around strings, so this is what I would be inclined to use.
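To illustrate the quoting behaviour, here is a minimal sketch (the field names and path are only placeholders): because Export-Csv quotes string fields, embedded commas, quotes and line breaks stay inside one cell instead of shifting data across columns.

```powershell
# Minimal sketch: Export-Csv quotes fields, so "junk" characters are contained
$rows = @(
    [pscustomobject]@{ F1_DATA = 1; F2_DATA = "free text, with a comma`r`nand a line break" }
)
$rows | Export-Csv -Path 'C:\temp\demo.csv' -NoTypeInformation -Encoding UTF8
```

If you open demo.csv in Notepad you will see each value wrapped in double quotes, which is what lets Excel keep the multi-line value in a single cell.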
If you do not want to call PowerShell from the SQL Agent, you will need to install the SqlServer module. Run PowerShell as an administrator and run:
Install-Module -Name SqlServer
You will then need to convert your SQL script to a VIEW or stored procedure so you can run something like:
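Something along these lines (the server, database, view and output path are placeholders for your own names):

```powershell
# Sketch only: query the view and let Export-Csv handle the quoting/encoding
Invoke-Sqlcmd -ServerInstance 'MyServer' -Database 'MyDatabase' `
    -Query 'SELECT * FROM dbo.MyReportView' |
    Export-Csv -Path 'C:\Reports\report.csv' -NoTypeInformation -Encoding UTF8
```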
If you are not using Windows Authentication, you should probably also look at the following. It shows how to save a password as an encrypted string in a file so it is not in plain text in the script. The password is in effect encrypted with the user's Windows credentials.
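A common pattern for this, sketched below (the file path and user name are placeholders), uses ConvertFrom-SecureString, which protects the string with DPAPI under the current user's Windows account:

```powershell
# One-off: prompt for the password and save it encrypted (current user only)
Read-Host -Prompt 'SQL password' -AsSecureString |
    ConvertFrom-SecureString |
    Set-Content -Path 'C:\secure\sql-pass.txt'

# In the scheduled script: read it back and build a credential object
$secure = Get-Content -Path 'C:\secure\sql-pass.txt' | ConvertTo-SecureString
$cred   = New-Object System.Management.Automation.PSCredential ('ReportUser', $secure)
```

Note the encrypted file can only be decrypted by the same Windows user on the same machine, so create it under the account that will run the scheduled job.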
Thanks again for the advice on this, and for the pointers regarding securing those credentials; that's not something I've used before, but I will certainly be employing it from now on!