Upserting into tables with a large number of records

I have 40 tables, each with a different structure, in a database on one server that is updated by a data provider.

I want to create an SSIS package that pulls data from the data provider's DB and inserts, updates, or deletes (merges) that data into the development, test, UAT, and production DBs.

The tables have 1m-3m rows and 20-30 columns each. All the DBs are on SQL Server and are on different servers.

The business requirement is to load the data every day at a particular time, and we have to use SSIS. I am new to SSIS and would like your suggestions for a better design.

Depends a bit on what you mean by "have to use SSIS".
Usually with SSIS you would create a source and destination for each table to do the transfer, but for 40 tables with different structures that would be a bit much.
I would do that with one of the (medium-sized) tables first, to get an understanding of SSIS and what it can do, and to check the connectivity and performance.
Also look at the source tables to see if there is a last-updated date or timestamp column you can use to reduce the amount of data transferred.
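If such a column exists, the source query can be restricted to rows changed since the last successful load. A minimal T-SQL sketch, assuming a hypothetical `LastUpdated` column on the source table and an illustrative `etl.LoadControl` table holding the high-water mark (all names here are assumptions, not from the original post):

```sql
-- Incremental-extract query for one table (illustrative names).
-- etl.LoadControl is assumed to store the last successful load time per table.
DECLARE @LastLoad datetime2 =
    (SELECT LastLoadTime
     FROM etl.LoadControl
     WHERE TableName = N'dbo.Customer');

SELECT c.*          -- in practice, list the 20-30 columns explicitly
FROM dbo.Customer AS c
WHERE c.LastUpdated > @LastLoad;
```

After a successful load you would update `etl.LoadControl` with the new high-water mark, so the next run only picks up fresh changes.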

Once you have an idea of the scale of the problem you can decide how to proceed. You could create a package or packages manually, with connections for each table, but that would leave you with a manual maintenance burden.
I would be tempted to go for something more dynamic.
You could generate the package or packages (I would go with one package per table) from the table structures. There are a number of ways to do this: I tend to use SQL to generate SQL files, but VB/C# or BIML is more natural, and there are other methods as well.
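As a sketch of the metadata-driven idea, the catalog views can give you a column list per table, which you can then feed into whatever generates the per-table scripts or BIML. This is only an illustration of the approach, not the poster's actual code; note `STRING_AGG` requires SQL Server 2017 or later:

```sql
-- Build a quoted, ordered column list for every base table.
-- The output can drive generation of staging DDL, SELECTs, or BIML.
SELECT t.TABLE_SCHEMA,
       t.TABLE_NAME,
       STRING_AGG(QUOTENAME(c.COLUMN_NAME), ', ')
           WITHIN GROUP (ORDER BY c.ORDINAL_POSITION) AS ColumnList
FROM INFORMATION_SCHEMA.TABLES AS t
JOIN INFORMATION_SCHEMA.COLUMNS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND c.TABLE_NAME   = t.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME;
```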

I've done something similar recently to import into a generic staging table from any source. The first version used PowerShell, but that has issues with large datasets unless you get more involved in how things are put together. It's doable, though more involved, in VBScript. Both of those need to execute an external task, which can be launched from SSIS but doesn't really use SSIS. A script task inside the package might be a better fit.

That gets the data across; then you have to decide how to do the upsert, which depends a lot on how much data you end up transferring. I would populate a staging table and then use a stored procedure to do the update, as that's easier to debug and to log what is happening, but you could also do it in SSIS, in the data flow or a script task.
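The staging-plus-stored-procedure step might look something like this. Table, column, and key names are illustrative assumptions; the `WHEN NOT MATCHED BY SOURCE` branch implements the delete part of the merge and only makes sense if staging holds a full snapshot rather than an incremental batch:

```sql
-- Upsert from a staging table into the target (illustrative names).
-- KeyId stands in for whatever the business key actually is.
MERGE dbo.Customer AS tgt
USING stg.Customer AS src
   ON tgt.KeyId = src.KeyId
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.Amount <> src.Amount) THEN
    UPDATE SET tgt.Name   = src.Name,
               tgt.Amount = src.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (KeyId, Name, Amount)
    VALUES (src.KeyId, src.Name, src.Amount)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

Adding an `OUTPUT $action` clause to the `MERGE` is one way to get the per-row logging mentioned above.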


SSIS processes data row-by-row. That's going to be slow.

I suggest using the command-line utility bcp. It's been several years, but I've used BCP (Bulk Copy Program) for million-row inserts while normalizing data: six million rows in under a minute. SSIS can't do that.

Can I use bcp inside SSIS?

SSIS is not going to be any slower than BCP; done properly it will perform better, because it can load data into the destination concurrently while still reading from the source.

With BCP you have to export the data from the source into a file, then use BCP again to load the data from that file, and BCP also reads the data row by row.
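For reference, the two-step file round trip being described looks roughly like this (server, database, and file names are illustrative; `-T` assumes trusted connections, and there is no test here because the commands need live SQL Server instances):

```shell
# Step 1: export from the source server to a native-format file (-n
# avoids character-conversion issues between SQL Server instances).
bcp SourceDb.dbo.Customer out customer.dat -n -T -S source-server

# Step 2: load the file into the target server; -b commits in batches.
bcp TargetDb.stg.Customer in customer.dat -n -T -S target-server -b 50000
```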