Excel blobs and BCP extract commands

Hi everyone, I have been asked to investigate storing Excel files in Sql Server as a BLOB. These files could be of any size and I will not know what headers they will have beforehand. They could have multiple sheets per Excel. We may need to re-constitute the file/s promptly at any stage.

I haven't used blobs before so in an effort to get my head around this I created a test database called: "Scratch" and a table inside it called: "TestBlob". Here is the table structure:

CREATE TABLE  dbo.TestBlob  -- Table where BLOB will be stored 
( 
     tbId   int  IDENTITY(1,1) NOT NULL, 
     tbName   varchar (50) NULL, 
     tbDesc   varchar (100) NULL, 
     tbBin   varbinary (max) NULL 
)

I was able to load a simple Excel file (literally 1 column)

Insert TestBlob(tbName, tbDesc, tbBin) Select 'C:\Test_Excel_Files\simple1.xlsx','Files',
    BulkColumn from Openrowset( Bulk 'C:\Test_Excel_Files\simple 1.xlsx', Single_Blob) as tb

Extracting it has proved to be more problematic. I used a BCP command I found online and modified acordingly. It needs a format file also (which I called simple.fmt).
Here is the code I used:

DECLARE @sqlExtract VARCHAR(1000);
SET @sqlExtract = 'bcp "SELECT tbBin FROM Scratch.dbo.TestBlob where tbId = 1 " QUERYOUT C:\BlobTestOut\simple1Out.xlsx -T -f "C:\BlobTestOut\Formatfile\simple1.fmt" -S ' + @@SERVERNAME;
EXEC master.dbo.xp_cmdshell @sqlExtract;

The format file is below:

<?xml version="1.0"?>  
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">  
<RECORD>  
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="7"/>  
</RECORD>  
<ROW>  
  <COLUMN SOURCE="1" NAME="Salary" xsi:type="SQLSMALLINT"/>  
</ROW>  
</BCPFORMAT>

It produced an output but when I try to open it I get the error message: "Excel cannot open the file because the file format or file extension is not valid"

I would appreciate if you could indicate where I am going wrong or perhaps a better approach. I am conscious of the fact that in the future I will not know the header format of the Excel files so a format file approach could be useless and hence is there a better extract command?

Comments/suggestions/guidance greatly appreciated...
J.

The format file needs adjustment, the one described in here should work:

Wow. That's stellar - it worked. All I had to do was add a blank row to the end of the format file and re-jig the BCP syntax accordingly. Thank you again Robert, it output spedily and looks well-formed. Let me play away with this approach to get a hang of this technique...

Going forward this may be good for us but its on that I have to investigate all the way down stream...ie growth rates, backups etc...

If this is a new thing, and if you expect to have more than 10 thousand documents, or 10 GB of data, PLEASE DO NOT store them in blob columns in SQL Server. It will quickly become an administrative hassle.

My additional comments here:

The comment I made about PDFs not compressing also applies to XLSX files, they are already compressed, so your database backups will not get any benefit from backup compression.

Thanks Robert for the advice. Upon that note: I did an analysis on the existing documents for the past 10 years. Average file size: is 594KB.
Number of files and total yearly file weight is more or less going up ie for the years: 2019 to 2023:
Number of files loaded: 2256, 2018, 2278, 2199, 2426
Total yearly weight: ~1.2GB, ~435MB, ~582MB, 1.7GB, 2GB

For this year so far we are at 918GB so will on course to hit about 2GB by year end.

I think these numbers rule out the blob approach for permanent storage of Excel files as the limit you indicated will be reached in a few short years and hence become a headache.

The issue is that we want to make a permanent copy of the data being processed as the current process allows the user to alter the existing Excel file and this can and does cause problems. Hence, if we have a faithful facsimile of the original we can use to restore or issue resolve if it becomes necessary.

To be honest, I liked the blob technique and from what I saw of the restored Excel Robert its a faithful facsimile of the original (and was speedy ie I tested over a several examples). Hence, I was thinking of using this technique to temporarily store the excel in the table and then after EOB (say every day) restore elsewhere on a untouchable fileshare for a permanent record. Then scrub down the table at periodic intervals. Hence the table will never really grow very large. Just a thought...

Thanks for your guidance to date Robert. J.

Suggestion: Use COMPRESS on the BLOB column. You'll need to use DECOMPRESS when you read it but it'll be worth it.