SQLTeam.com | Weblogs | Forums

Data Import - Flatten Child Table


#1

I want to import some data from a 3rd Party database.

It has a Child Table for "Parties" associated with a Matter table. Even if there can only ever be one party, such as Client, the Client Name ID is still stored in the Party child table. Other Party Types, such as Debtor, allow a maximum of two entries per Matter, and there are other things, such as Interested Parties, Licensees, Owners etc, which allow unlimited values.

(There is a separate Party Type Definition Table which has a "Max allowed" column).

There is an Item Number column in the Party table, which lets me decide which is the First Debtor. I am not certain that the Item column is guaranteed to be consecutive numbers 1, 2, 3, ... thus I am assuming that the First Debtor has the lowest Item No, rather than "1".

My Import routine requires a View to provide all the columns that it needs to import from i.e. matching the Target Table (and from there the import routine decides which rows have a modified-value more recent than "last time", and data validation issues, and so on).

I'm bothered that my approach is very inefficient! Here's what I have got:

Create tables for Matter, Name and Party:

CREATE TABLE dbo.TEMP_Matter
(
	M_ID	int NOT NULL,
	M_Title	varchar(30) NULL,
	PRIMARY KEY
	(
		M_ID
	)
)
GO
CREATE TABLE dbo.TEMP_Name
(
	N_ID	int NOT NULL,
	N_Name	varchar(30) NULL,
	PRIMARY KEY
	(
		N_ID
	)
)
GO
CREATE TABLE dbo.TEMP_Party
(
	P_M_ID	int NOT NULL,
	P_N_ID	int NOT NULL,
	P_Type	char(1) NOT NULL,
	P_Item	int NOT NULL,
	PRIMARY KEY
	(
		P_M_ID,
		P_N_ID,
		P_Type,
		P_Item
	)
)
GO

Populate with some data

INSERT INTO dbo.TEMP_Matter
VALUES(101, 'MatterA'),(102, 'MatterB')

INSERT INTO dbo.TEMP_Name
VALUES(201, 'ClientA'),(202, 'ClientB'),(203, 'ClientC'),(204, 'ClientD')

INSERT INTO dbo.TEMP_Party
VALUES
  (101, 201, 'C', 1)	-- Client : First Matter, ClientA
, (102, 202, 'C', 1)	-- Client : Second Matter, ClientB
, (101, 201, 'D', 1)	-- Debtor : First Matter, ClientA
, (102, 202, 'D', 1)	-- Debtor1 : Second Matter, ClientB
, (102, 203, 'D', 2)	-- Debtor2 : Second Matter, ClientC
GO

Create a VIEW to display the First Party, for a given Matter

CREATE VIEW dbo.TEMP_Party_First_View
AS
SELECT	P_M_ID,
	P_N_ID,
	P_Type,
	P_Item
FROM	dbo.TEMP_Party AS P1
WHERE	P_Item = 
	(
		SELECT	MIN(P_Item)
		FROM	dbo.TEMP_Party AS P2
		WHERE	    P2.P_M_ID = P1.P_M_ID
			AND P2.P_Type = P1.P_Type
	)
GO

I used MIN as I assumed it would be more efficient than ROW_NUMBER(), but clearly in order to be able to get Second/Subsequent parties I would plan to use ROW_NUMBER()

Then create a view to get the Matter and the Parties that can be "flattened" (in my example assume that there is only one Client and a max of Two Debtors

CREATE VIEW dbo.TEMP_Matter_Import_View
AS
SELECT	M_ID,
	M_Title,
	[M_N_ID_Client] = Client.P_N_ID,
	[M_N_ID_Debtor1] = Debtor.P_N_ID
FROM	dbo.TEMP_Matter AS M
	LEFT OUTER JOIN dbo.TEMP_Party_First_View AS Client
		 ON Client.P_M_ID = M.M_ID
		AND Client.P_Type = 'C'	-- "Client"
	LEFT OUTER JOIN dbo.TEMP_Party_First_View AS Debtor
		 ON Debtor.P_M_ID = M.M_ID
		AND Debtor.P_Type = 'D'	-- "Debtor"
-- Going to need to use something different to get the Second Debtor!
GO

Display the data

SELECT	'Matter', *
FROM	dbo.TEMP_Matter

SELECT	'Name', *
FROM	dbo.TEMP_Name

SELECT	'Party', *
FROM	dbo.TEMP_Party
	JOIN dbo.TEMP_Matter
		ON M_ID = P_M_ID
	JOIN dbo.TEMP_Name
		ON N_ID = P_N_ID

SELECT	'Matter_Import', *
FROM	dbo.TEMP_Matter_Import_View

I'm particularly bothered that the TEMP_Party_First_View has to be LEFT OUTER JOIN'd multiple times, once for each Party Type. There are about half-a-dozen Party Types that only allow a max of one value, and a few more than allow a max of Two - both of which I would like to flatten on Import to speed the reporting side (not having to JOIN to Party Child Table to find things for which only one value can exist).

EDIT: The Item numbers are definitely not suitable for use as consecutive 1,2,3 sequence as (I had forgotten) there is an ExpiryDate column in the Party table, and I only want to use rows which have not expired.


#2

I thought I would have a go with a Pivot. I figured I could use a ROW_NUMBER() to get the ItemNo (base 1, contiguous) and append that to the Party Type to get C1 (for Client) and D1, D2 (for Debtor) and then Pivot that.

Here's my attempt:

; WITH Party_CTE
AS
(
	SELECT	P_M_ID
		, P_Type
		, P_N_ID
		, [T_RowNumber] = ROW_NUMBER()
			OVER
			(
				PARTITION BY P_M_ID,  P_Type
				ORDER BY P_M_ID,  P_Type
					, P_Item
			)
	FROM
	(
		SELECT	P_M_ID
			, P_Type
			, P_Item
			, P_N_ID
		FROM	dbo.TEMP_Party AS CN
		WHERE	CN.P_Type IN('C', 'D')
	) AS X
)
SELECT	P_M_ID
	, [C1]
	, [D1]
	, [D2]
FROM
(
	SELECT	P_M_ID
		, [T_nat_Code] = P_Type + CONVERT(varchar(20), T_RowNumber)
		, P_N_ID
FROM	Party_CTE
) AS C
PIVOT
(
	MIN(P_N_ID)
	FOR	T_nat_Code IN ([C1], [D1], [D2])
) AS PivotTable

Results:

P_M_ID      C1          D1          D2          
----------- ----------- ----------- ----------- 
101         201         201         NULL
102         202         202         203

(2 row(s) affected)


#3

Results from real-world data

Old way - Matter Table joined to Matter2 then multiple Left Outer Joins to Party (for each NameType) and then a Left Outer Join to Names (for each Party Join)

Table 'Workfile'. Scan count 20, logical reads 560, physical reads 30, read-ahead reads 530
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table 'Names'. Scan count 193941, logical reads 597408, physical reads 233, read-ahead reads 61
Table 'Party'. Scan count 97100, logical reads 360562, physical reads 3, read-ahead reads 6582
Table 'Matter'. Scan count 1, logical reads 703, physical reads 3, read-ahead reads 699
Table 'Matter2'. Scan count 1, logical reads 1110, physical reads 3, read-ahead reads 1106
   CPU time = 4829 ms,  elapsed time = 5834 ms.

New way - PIVOT and then Left Outer Join each Pivot column to Names

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0
Table 'Names'. Scan count 123538, logical reads 262812, physical reads 10, read-ahead reads 10
Table 'Party'. Scan count 1, logical reads 4394, physical reads 0, read-ahead reads 4394
Table 'Matter'. Scan count 1, logical reads 703, physical reads 3, read-ahead reads 699
Table 'Matter2'. Scan count 1, logical reads 1110, physical reads 3, read-ahead reads 1106
   CPU time = 1937 ms,  elapsed time = 2674 ms.

DBCC DROPCLEANBUFFERS & FREEPROCCACHE done before each test (I wanted to see what the Physical Reads were too)