Whats the best practice for daily schedule of SSIS Package.I cant truncate as data is huge.If i truncate and load again, the package will take more than one hour.Also there is no primary key. I cant write merge statement. In my current package i just got max(start_time) of target table into a variable and assigned that in dynamic sql in where clause of source table. Intially it worked well.But now the package is missing some rows. When i tried to debugg the package, i can see the max(start_time) in a variable, but the package is not inserting all the records into target from that maxdate.
Do you have any audit columns in the source table(s) such as modifieddate/createddate etc?
How do you determine - from the source - which rows need to be inserted/updated? And once you have identified those rows, how do you identify whether or not that row is going to be inserted or updated in the destination?
You stated you don't have a PK - but do you have anything that uniquely identifies the row?
If there is no way to uniquely identify a row then there is really no way you can identify rows to be extracted and subsequently inserted or updated in the destination. That will require a full refresh every time (truncate and load).
Yes. its Update_date
But source contains millions of records.If i truncate and load package would run for hours.And the table has no primary key. Records are repeating.
But i am pulling records based on change_datetime. Its in history table where new records are also inserted and also they are maintaining changed records in this table in the form of line items.Suppose if same record is updated 4 times line items would be 1,2,3,4.
There still has to be something that uniquely identifies that row...just because there isn't a PK defined doesn't mean there isn't a set of columns that uniquely identify that row.
If this is an audit table and each audit line item is incremented by 1 - then you PK is going to be the itemid, the line items and the change date/time.
You need to determine what these values are first - and once you have that defined you can build a process to insert/update based on those values.
In your original statement - you said the problem is that 'some' rows are 'missing'. How can you be certain you are pulling all relevant rows if there is no way of identifying those rows?
Somehow i have identified the list of columns that uniquely identify the records.
So did that resolve the issue?
yes the issue is resolved