SQLTeam.com | Weblogs | Forums

Retrieving and inserting data simultaneously, or not


#1

I have been tasked with coming up with a solution for retrieving invoice data from our own SQL database for importing into a 3rd party financial system. The financial software is likely to have a daily scheduled task that calls a view in our database to retrieve and import the necessary data. I am looking for some advice regarding how to ensure that the same records are not retrieved / imported multiple times. The source table does not have a 'posted' field that I can update (to mark records as already imported) and I am not authorised to make any structural changes to our existing tables. I was thinking that I could create an 'imported' table that gets populated with the imported records and add to the where clause of the view, "NOT IN (SELECT SOME_ID FROM IMPORTED_TABLE)". I realise I am not giving too much away but I thought there may be someone out there with experience of this kind of challenge who may be able to point me in the right direction.

Cheers.


#2

If the table has an identity column, or some other always ascending column value(s), you could store the last value loaded for each source table in a control table. Then, for the next run, only pull data from the source table where the ascending column value(s) are greater than those in the control table.


#3

We do this using Staging Tables which are a copy of the actual data.

We create the staging table with ONLY the columns that the 3rd party APP needs, and we insert/update ONLY rows that have changed in those columns using a large WHERE clause that checks every column's value.

We add two columns to the start of the staging table:

DateTime
Status - 1=Inserted/updated, 2=Deleted

We change the [Status] to 2 for any row (currently set to 1) where the PKey columns can no longer be found in the parent table. (And we change 2 back to 1 if the row is found again e.g. tomorrow). (We also set the [Status] to 2 if the record is flagged as Deleted/Inactive, rather than physically deleted i.e. anything which should cause the 2rd party APP to ignore that record)

We do not have separate [Status] values for Inserted/Updated as the remote end may not process an INSERT before we change it to an UPDATE, so we are basically only interested in having a way of telling the Remote end that a record has been deleted.

The remote end can store a "Last processed date" and process everything SINCE that date. However, beware that you might be adding rows to the staging table at the same time as the remote process is selecting them, so the remote process needs to be careful in that situation. Ideally you will update the staging table using an ATOMic process so that the remote sees all-or-none.

Our WHERE clause has two types of comparison, one for String and one for Numbers/Dates/Non-strings. Our String comparison is BINARY (Case/Accent/etc. sensitive) and includes leading/trailing spaces etc. comparison i.e. ANY change is significant.

WHERE
	...
	OR (
		   DATALENGTH(D.[StringColumn]) <> DATALENGTH(S.[StringColumn])
		OR D.[StringColumn] <> S.[StringColumn] COLLATE Latin1_General_BIN2
		OR (D.[StringColumn] IS NULL AND S.[StringColumn] IS NOT NULL) 
		OR (D.[StringColumn] IS NOT NULL AND S.[StringColumn] IS NULL)
	)
	...
	OR (
		   D.[NumericColumn] <> S.[NumericColumn]
		OR (D.[NumericColumn] IS NULL AND S.[NumericColumn] IS NOT NULL)
		OR (D.[NumericColumn] IS NOT NULL AND S.[NumericColumn] IS NULL)
	)

We use a rather crude means of updating the staging table: We delete any rows (in Staging Table) where the PKey exists AND one/many columns are different. Then we insert ANY row that is missing (i.e. this includes both New and Modified rows). Using MERGE would be better

Then we change the [Status] to 2 based on a PKey comparison

UPDATE	D
SET	[xxx_UpdateDt] = @dtNow,
	[xxx_Status] = 2	-- InActive / Deleted
FROM	dbo.MyStagingTable AS D
	LEFT OUTER JOIN MySourceTable AS S
		 ON S.PKeyColumn = D.PKeyColumn
WHERE	    [xxx_Status] <> 2
	AND (
		   S.PKeyColumn IS NULL
		OR S.IsActive = 0	-- e.g. if you have an Active/Inactive flag
					-- alternative to Row Delete
	)

#4

Thanks guys, you have given me food for thought. I need to put your ideas into practice and figure out the most effective solution.