SQLTeam.com | Weblogs | Forums

Update Huge table with SSIS is taking longer time


#1

Hi ,

I have a sql table with 1.5 Millilon records, recently we have created 2 new cols.
So, to populate those two cols we tried to run the job(ssis pkg which will do updates and inserts), this table is running since yesterday in job and not yet completed.

How to find out estimated time to complete this table?,
till now.. no of rows updated?
Best way to update the table in this scenario (new cols populate)?.

Can any one suggest me with best approach for above task


#2

Unless the inserts/updates are being done in small batches from the SSIS package, the entire operation will be treated as a single batch. Usually this takes longer, and is more resource intensive. Given that it is a single operation, I don't know of a way to find how much time is remaining.

A better approach would be to do the update/insert in small batches. If your database is in full or bulk-recovery, do frequent log back ups frequently in between (groups of) update/insert batches.

When you do the update/insert in a single operation, the entire operation will either succeed or fail as a group. But if you do it in small batches, half way through the insert/updates, you may find that there is a an error. Perhaps this is a desired outcome, or it may not be, but you need to be aware of that possibility.

Updating 1.5 million records is not that huge an operation unless your new columns are varchar(max) with lot of data that needs to be stored off-row. So a day seems rather long. You might want to check if something is blocking the operation. Run the system stored proc sp_who2 and look for the blk_by column to see what is blocking what. Also, you might want to install Adam Machanic's stored proc sp_whoisactive and run it to see what the database is doing.


#3

Thanks James,

It is not mandatory that i need to use SSIS, i can do this Through SSMS. What would be the best approach to update the table in batches.

Note : Source is view and Target is table in Same server but different databases


#4

you can just write a TSQL to perform the operation in batches and then if you want this to be done regularly,then schedule the TSQL via agent.


#5

Given that is is already running, and assuming that it is busy doing work, and not Blocked, then if you abort it now it may well take LONGER to roll back than it has taken to run so far.

Of course if it is going to run for a week that would be a good thing ... but if it is 99% completed that would be a pity.

Can you run something like this to see how much is done?

SELECT COUNT(*)
FROM MyTable WITH (NOLOCK)
WHERE MyNewColumn IS NOT NULL

#6

How to update table in batches ?


#7

check this
http://www.sqlusa.com/bestpractices/largeupdate/


#8

There was a discussion of performing a task in batches in this thread. In particular, look at the code @Kristen had posted. If the recovery mode is Full or Bulk-logged, you should also take log backups as the insert/updates progress.


#9

SSIS can insert/update in batches. On the OLEDB Destination set the batch and commit sizes and SSIS will perform a commit when you reach that number of rows.

If you deployed the package using project deployment and an integration services catalog you can view the active process from there. The other option would be to look in the tables in MSDB - but those can get a bit complicated to find the data you are looking for.