SQL - Select 14GB of Data into a table from staging table takes several hours

Rukky · July 4, 2021, 12:03pm

Hi Guys
Good day to you all and I hope you are enjoying the weekend.

Please I need your help on this topic. I have over 300 millions of record in Azure blob which I want to load into Azure SQL database. We don't have Azure Data Factory just yet.

I downloaded the csv file from blob to my location. Used SSIS to load the data from the location to Temp table which takes over 10 min.

I then used T-SQL to load the data from staging table to production environment after transformation. There is no index in any of tables ( staging and Destination) However, the insert takes forever, about 1 hours to load the data into prod table. I have used batches in 10,000 rows but, still no better performance.

Please can anyone advise the best strategy to load such amount of data from blob to staging and destination table with better performance outcome. I think, over an hour is too much.

Thanking you all in advance.
Rukky

harishgg1 · July 4, 2021, 1:50pm

third party software ..specifically designed for this purpose ( PAID or OPEN Source )

or

LIBRARIES .. in python ..

or

software which has this sort of thing in mind .. which you can use

yosiasz · July 4, 2021, 3:14pm

Please provide the destination prod tables schema and the transformation you used?

harishgg1 · July 4, 2021, 6:11pm

just a joke

Quantum Computer ( Rent it ) ( within NANO Second )

ScottPletcher · July 4, 2021, 7:11pm

Try a batch size like 100,000 rather than 10,000.

If you do download a physical file to reload into SQL Server, use native format, it will load faster.

Rukky · July 4, 2021, 8:16pm

Thanks Scott
I will increase my batch size. What do you mean by native format? Do I need to creat indexing in the destination table?
Thanks.

ScottPletcher · July 5, 2021, 2:18am

When you export data from SQL, it has an option for "native format". This format matches how SQL stores the data internally, so it's much faster to load, since the data is already in the proper binary format.

IF the existing table is clustered AND the new table is clustered the same way -- which presumably is true here -- then create the clus index before loading the table. Wait until after the table is loaded to create / recreate any nonclus index(es).

Rukky · July 5, 2021, 7:29am

Many thanks