SQLTeam.com | Weblogs | Forums

Import all files in repository to single table?


#1

Good day - extremely new to using SQL so I apologize in advance for any silliness in my query. Since I don't know much about SQL, I can't really say if this is even possible, but from what I've read, I believe it should be.

We have developed an automated process whereby spreadsheets are distributed to anywhere from 3-10 people every day. Those people do some stuff in their spreadsheets, and save them to a repository. I currently have a VBA macro that collects the data from each of those sheets by looping through the repository until all the files are accounted for. It then moves them to another directory where they get dumped into a zip file (manually). I realize that it would make more sense to use a web front end or even a distributed MS Access form, but the requirements I have to work with are for Excel spreadsheets.

It seems to me that SQL is a LOT faster and more reliable. I'd like to set up a similar process, but not really sure how to get it to access each file (the number of files can vary from day to day).

The columns (there are only 12) are always in the same order, which replicates the db table. Each sheet contains about 300 lines, but that will be increasing to 1000 over the next two months.

Is this something that can even be done without having to manually import one file at a time, using the file name?

Thank you for any help!
JP


#2

Just because you can do it in SQL doesn't mean you should! Some questions:

  1. Do you have a SQL Server instance running somewhere that you can use to develop your solution?
  2. Do you want to store the spreadsheet data in tables in SQL or simply have SQL manage the files using FileTable or FileStream?
  3. Given the small size of the spreadsheets, why do you bother to zip them?
  4. Do the file names follow any naming convention?
  5. Have you considered using Integration Services?

FWIW I'm wondering if using SQL to do this job might be overkill.


#3

GBRITTON,
Thank you so much for the reply and questions. Perhaps it is overkill - I'm just looking for a solution that will remain as effective when we have 5 people generating 300 lines as when we we have 20 generating thousands (potentially).

For your questions

  1. Yes, I have SQL Server running, and use SQL Server Management Studio 2008 R2 on the front end
  2. I just want to store the spreadsheet data in my table - but honestly I'm not sure what SQL can do to manage the files (will look into FileTable and FileStream to get a better understanding)
  3. I zip the files because I don't like to leave "lose" files that I'm not actively using on a network repository unzipped - just doing my part to keep space free for the 2,000 other people I share them with! But it's not a requirement at all.
  4. Yes, I name the files "File-MMDDYY-HHMM.xlsx"
  5. No, I've never heard of it, but will check it out for sure!

Before one of my former co-workers left the company, he noted "you should move all of this out of VBA and let SQL do it" which, because of his brilliance, I assume is possible, I just have no idea how to (or as you point out, if I even should!)


#4

Might be swings and roundabouts ... but my primary concern would be data validation - any sort of APP could check that the data being entered into a field was "reasonable" whereas, in the main, Excel doesn't care and will let the user type in any old rubbish - invalid dates, indeed people's names, control-characters (such as Tab or Line Break) and so on can all be entered into a cell intended for a date (unless you go to the trouble of adding validation on every cell in the Spreadhseet, and its a tedious job in Excel - particularly if you are trying to validate something like a Customer ID where new customers can be added periodically). When such unvalidated data is processed it causes havoc downstream. With the best will in the world the best operators will make a mistake at some time, or a new operator will misunderstand what they are supposed to put into a particular field, or cut & paste something from, say, Word which has an "em-dash" instead of a hyphen (and as such looks just-fine but is actually a weird character as far as the APP is concerned ... and so on.

My two-pennyworth would be to convert the Spreadsheet stuff into a form, and then have users type into the form instead. You can save their Form Data to a CSV file to start with (and process / aggregate that in Excel pretty much how you do now, to start with) and then you can migrate to SQL (or start with SQL at the outset, depending on time, inclination and initial skill levels).

But for me the most critical part of the process will be getting the data from the users' fingertips to be consistent, clean and validated. Did I say that already?!!


#5

Kristen, thank you!

I agree 100% and am doing exactly that in VBA when they save the file to the repository. There are checks for properly formatted data (dates & character types) and a few logic checks.

I initially had a form pop up when they opened the file, but they found it a bit tedious, so we went with straight entry to a spreadsheet with data validations/combo boxes set for the bulk of the data they enter.

" and then you can migrate to SQL" - that's the part I'm struggling with :slight_smile:

Thank you again for taking the time to read and respond!


#6

I expect the users would appreciate / prefer getting an error message "At the time that they type [invalid data] into a field" - the source document / whatever is in front of them at that time, so correcting the value is easy whereas with any post-processing they have to go-get the original document (or "make up" some data that gets then past that hurdle which, sadly, happens in some cases :frowning: )

Perhaps look for a Forms Tool that deals with the storage for you, but lets you concentrate on the APP part? I'm not familiar with many such things, other than Web Apps and HTML forms, although I think that there might be a Microsoft Forms thigie (which probably uses VBA or something like it).

Alternatively MS Access has a dialect of Basic for its application language, and a half-decent database behind it - with easy upscale to SQL, or just "link through" to SQL tables [i.e. you can use SQL tables instead of using any local / native Access tables]. Access also has good Forms capabilities, and a lot of the validation etc. relevant to database programming is right there at the click of a right-mouse-button :sunglasses: For example: "THIS field needs to be validated against THAT database table" will cause Access to provide either a dropdown list (using a suitable column from THAT table for the picklist values) or a "Type and Search" field (predictive-text type of thing), and its relatively easy to have "This second picklist is dependent on what they type into the first picklist" - i.e. things that are pretty tricky in Excel.

If you haven't tried Access it might be worth a look - pretty easy to "get going" and lots of full-blown programming stuff when you need it. My only gripe is that its SQL is non standard, so when you upsize to MS SQL things get in your way - like using "*" for wildcards (well known to ordinary users) instead of "%" which SQL uses (and is totally obtuse to users!!). If you decide Access IS the answer for you I would consider using SQL Tables from Day One, and not using any local/native Access tables at all - that will solve the problem by avoiding you having to Upsize at a later date.


#7

Sorry for the delayed response - family issue has kept me away for a bit and would hate you to think I'm not appreciative. Thank you for all of your time and suggestions!