Create loop to extract batches of records and push to other server

Hi. I have a challenge at work to move a very large table ( 2 bill recs /40 cols) to another server for archive purposes. ( I cant use any SSMS wizard to do this sicne the server cant export to Azure Synampes dbase) The source server is limited on space so even breaking the table up into samller chunks to push to target table keeps on failing in SSIS. My manager wants me create a SSIS job that extracts batches of 100,000 rows at a time from the source table and push each batch to a target table on another server using a dataflow task. Im pretty new to SSIS so im pretty clueless. Ive seen many videos on how to loop things within SSIS tasks but nothing for how to extract rows from
(NB please dont suggest any third party apps . All i have is SSMS and SSIS )

In my head i want to create a query /stored proc within an Exec SQL task to set up a loop, but i need a data flow task to somehow be part of the loop which i cant get my head around. See below code . any help on how to do it this way would be greatfully recieved :slightly_smiling_face:

DECLARE @rownum INT= 1;

DECLARE @batch_size INT=100000;

WHILE @rownum>0 
BEGIN 
 SELECT <list of cols IN source TABLE> 
 FROM ( SELECT *,
               row_number() over ( ORDER BY target_date, claimid, partyid) AS rownum 
        FROM visualfiles.history_daily )a 
  WHERE rownum>=@rownum AND rownum< @rownum + (@batch_size -1)) 

ORDER BY rownum 

SET @rownum=@rownum +@batchsize --ready for next iteration

<-----NEED TO SOMEHOW INCLUDE DATAFLOW TASK HERE TO PUSH EACH ITERATION TO TARGET TABLE ??

END

I think the approach is wrong. As this is archive data I would look at exporting the table to a Parquet file. As a compressed column store format it will save a lot of space. Azure Synapse, SQL Server with Polybase, Fabric and lots of other environments are able to read a Parquet file as an external table.

Azure Data Factory (ADF) can create Parquet files or you could use one of the Python libraries. (I know, neither SSMS nor SSIS but it might save a lot of long term aggrivation.)

1 Like