Create one table from two?

rasta_pickles · April 30, 2017, 9:34am

I have a table (Table A) which holds data from 2012 until 30th March 2017.

I have a second table (Table B) which holds data from 2014 until today.

Both tables are identically structured.

How do I create a single table which contains all data from 2012 until today? And how quick is the process likely to be (there are 160 million records in Table A!!)?

Thanks in advance as always.

Kristen · April 30, 2017, 1:28pm

If you create a new table then, presumably, the data on the original Table-A and Table-B will continue to change?

You could create a VIEW - dunno about performance though

CREATE VIEW dbo.MyView
AS
SELECT Col1, Col2, ...
FROM dbo.TableA
UNION ALL
SELECT Col1, Col2, ...
FROM dbo.TableB

All columns in both SELECTS must be the same data type (or implicitly convertible) but as the tables are identical that should be OK - barring any typing mistakes!

I recommend that you list all column names, rather than use SELECT *

Kristen · April 30, 2017, 1:30pm

If you need to know the "source" of a particular row you could also do:

CREATE VIEW dbo.MyView
AS
SELECT [Source]=1,
       Col1, Col2, ...
FROM dbo.TableA
UNION ALL
SELECT [Source]=2,
       Col1, Col2, ...
FROM dbo.TableB

Query with

SELECT [Source], Col1, Col2, ...
FROM dbo.MyView
WHERE ColX >= 1234
  AND ColX < 5678

rasta_pickles · April 30, 2017, 3:10pm

Thanks for that - won't I end up with duplicate data (Table A is 2012 - 30th March 2017, Table B is 2014 - today)?

I can take everything from Table A but then need to be selective about what I extract from Table B.

I looked at the MERGE function but that doesn't seem to do anything that would help me!

Kristen · April 30, 2017, 3:52pm

OK, that's what you meant, I assumed they were different data sets and you needed them combining.

Answer = YES

You could change UNION ALL to UNION and that will DeDupe anything that exactly matches, but I think that's a bad idea.

Better, assuming that you have the same PKey on BOTH tables (for matching rows), would be to exclude it in the UNION ALL view - which avoids the DeDupe step, and the DeDupe step will only operate on Selected Columns, so might exclude a row that is actually the same (on the columns in the SELECT)

CREATE VIEW dbo.MyView
AS
SELECT [Source]=1,
       Col1, Col2, ...
FROM dbo.TableA
UNION ALL
SELECT [Source]=2,
       B.Col1, B.Col2, ...
FROM dbo.TableB AS B
    LEFT OUTER JOIN dbo.TableA
        ON A.PKeyCol1 = B.PKeyCol1
       AND A.PKeyCol2 = B.PKeyCol2
       AND ...
WHERE A.PKeyCol1 IS NULL

Or: Delete all data from TableA that is after 2014 so there is no overlap. That will operate much more efficiently than anything that has to figure out what needs to be left out.

if the PKey is NOT the same on both tables then you could put a WHERE on the TableA part to only include pre-2014, but that is FAR from ideal (although might be OK if the Clustered Index on TableA happens to be the Date column)

CREATE VIEW dbo.MyView
AS
SELECT [Source]=1,
       Col1, Col2, ...
FROM dbo.TableA
WHERE MyDateColumn < '20140101'
UNION ALL
SELECT [Source]=2,
       Col1, Col2, ...

How did you get into the situation that there is overlapping data?

rasta_pickles · May 5, 2017, 11:08am

A new server was installed and there was a parallel run for several weeks - plus the new server is a virtual box with next to no grunt and so it was decided to limit the rebuild of data every night to only pick up the past three years.