Daily Schedule For ssis Package

mannesravya · June 25, 2019, 9:30pm

Whats the best practice for daily schedule of SSIS Package.I cant truncate as data is huge.If i truncate and load again, the package will take more than one hour.Also there is no primary key. I cant write merge statement. In my current package i just got max(start_time) of target table into a variable and assigned that in dynamic sql in where clause of source table. Intially it worked well.But now the package is missing some rows. When i tried to debugg the package, i can see the max(start_time) in a variable, but the package is not inserting all the records into target from that maxdate.

yosiasz · June 25, 2019, 10:07pm

Do you have any audit columns in the source table(s) such as modifieddate/createddate etc?

jeffw8713 · June 26, 2019, 6:07pm

How do you determine - from the source - which rows need to be inserted/updated? And once you have identified those rows, how do you identify whether or not that row is going to be inserted or updated in the destination?

You stated you don't have a PK - but do you have anything that uniquely identifies the row?

If there is no way to uniquely identify a row then there is really no way you can identify rows to be extracted and subsequently inserted or updated in the destination. That will require a full refresh every time (truncate and load).

mannesravya · June 26, 2019, 8:53pm

Yes. its Update_date

mannesravya · June 26, 2019, 8:54pm

But source contains millions of records.If i truncate and load package would run for hours.And the table has no primary key. Records are repeating.

mannesravya · June 26, 2019, 8:56pm

But i am pulling records based on change_datetime. Its in history table where new records are also inserted and also they are maintaining changed records in this table in the form of line items.Suppose if same record is updated 4 times line items would be 1,2,3,4.

jeffw8713 · June 26, 2019, 10:07pm

There still has to be something that uniquely identifies that row...just because there isn't a PK defined doesn't mean there isn't a set of columns that uniquely identify that row.

If this is an audit table and each audit line item is incremented by 1 - then you PK is going to be the itemid, the line items and the change date/time.

You need to determine what these values are first - and once you have that defined you can build a process to insert/update based on those values.

In your original statement - you said the problem is that 'some' rows are 'missing'. How can you be certain you are pulling all relevant rows if there is no way of identifying those rows?

mannesravya · July 2, 2019, 8:34pm

Somehow i have identified the list of columns that uniquely identify the records.

jeffw8713 · July 2, 2019, 10:06pm

So did that resolve the issue?

mannesravya · July 11, 2019, 3:38pm

yes the issue is resolved