Importing multiple files to the same SQL table

wheddingsjr · August 27, 2021, 11:47am

HI all

I have a two part question I hope someone could assist me with.

Part 1
I have multiple .txt files I am loading into a SQL table, there is no indicator within the data as to which file is which (ex. file1, file2, etc.). The first file went through perfectly but the second had an error but still loaded some lines. I know the line count of the first file, so the question is can I delete all the rows that come after the last line of the first file or does the new data from the second file just get mixed in with the existing data?

Part 2
To avoid this in the future, is there a way to add a column to the text file that indicates which file it is (for example adding File# column at the end)? That way I can delete rows that are in specific file number if I need to. Unfortunately the files are too big for me to open to add the column manually

Thanks in advance

mike01 · August 27, 2021, 1:37pm

This is a simple process by using the FileNameColumnName property. This outlines it pretty well

wheddingsjr · August 27, 2021, 1:59pm

Thanks Mike..I will take a look at that. You have any opinions about part 1?

wheddingsjr · August 27, 2021, 5:27pm

Mike I checked out the link, bit I am not that advanced in SQL, but I did try to find other articles that were a little less advanced and apparently all those seem to be targeted for dba's and other advanced users as well. Trying something on my own, I was able to add a new column (see below)

As you can see, when I got to the data source page I added a column and named it FileName but I didn't know how to get it to use the files name to populate the field. When I run it I get an error during execution.

jeffw8713 · August 27, 2021, 6:31pm

It looks like you are using the import wizard from SSMS to import your files. If that is the case - what you should do is build the import from the wizard, but instead of running it immediately - save it as an SSIS package.

Once you have that - you can create a project in SSDT and add that package. In SSDT you can then modify how this works. In SSDT you can add a derived column transformation where you add a derived column with the data you want included.

If that is too much (for now) - then import each file into separate staging tables. Once both files have been imported - you can then write a query that includes your additional column(s) - then use UNION ALL to combine each table into a single result.

wheddingsjr · August 27, 2021, 6:54pm

Thanks Jeff, and Yes I am using the wizard. That is only the half of it though..I have an encrypted external hard drive that only allows me less than 3 minutes to get things running after I put in the password. If the files are idol for 3 mins. or less the hard drive locks down again. I actually have 20 45GB files that I need to load. This problem came about as I loaded the first file without an issue yesterday afternoon. Started the 2nd one before I went to sleep (that's how long each file takes to run). When I woke up I saw that the file had stopped loading but still loaded some items to the table. Because I didn't have the file names I had to delete all 108 million lines and start over instead of just backing out the data from the second file. I am not sure if I would have the time to save it as a SSIS package. Also, is SSDT something I have to get and download?

jeffw8713 · August 28, 2021, 4:09pm

Even more of a reason to save the package - and run it from SSDT. Yes - you have to download Visual Studio and enable the data services tools, but that is a free download (Community Edition). Once installed - use the Visual Studio Installer to modify and enable and install the SSIS tools. You then create a project using the Integration Services template and add the saved import/export package to that project.

Once you have that - it is very easy to make changes and for new files you can just copy/paste to a new project. It saves time from having to reconfigure everything and then you can run the package directory from SSDT.

yosiasz · August 28, 2021, 4:49pm

You can also try this import using powershell or python

wheddingsjr · August 30, 2021, 10:48pm

Thanks Jeff

I was able to download Visual Studio and there are some things that I am not sure of but I first wanted to take your suggestion and build the import from the wizard and saving it as an SSIS package but I am getting the following error:

jeffw8713 · August 30, 2021, 11:00pm

Save it to a file, then in your project add an existing item and select that file.

wheddingsjr · August 30, 2021, 11:10pm

Ok..its working..but do I have to save all 21 files with a password?

yosiasz · August 30, 2021, 11:27pm

1 single SSIS package that loops through a folder, grabs name of file dynamically then imports the data into 1 table.

If you think about it, you would not want to create 150 ssis packages if you had 150 files right?

Look at this if you want to do it using SSIS.

https://www.sqlshack.com/import-data-from-multiple-excel-files-in-sql-server-tables-using-an-ssis-package/

wheddingsjr · August 31, 2021, 12:15am

Thanks @yosiasz that's another very complicated procedure. My Visual Studio doesn't even look close to what's in that article and as stated before its . I appreciate the help from everyone but I am going to just try to figure something else out. The article says in its introduction that is a true friend for DBA and developers. I am neither of those and not at an advanced level. But again, thanks for the help. I got a few ideas from all the comments.

yosiasz · August 31, 2021, 12:59am

I am moving a lot of our ssis imports to powershell and even python. Avoids, for me, this whole ssis, friendly ui, but too complicated