Best Practice to JOIN a Date-Specific Table - e.g. Prices

Kristen · September 10, 2015, 12:17pm

I want to create a table with the "current" values. Each time any characteristic / column in that table changes I want to add a new row.

I was planning to have an [EndDate] column which would start with '1999-12-31 00:00:00' and when the row changed I would update the original row, change the EndDate to "Now", and create a new row with the current values.

I also have a [CreateDate] column (which would be created with the exact same value as was set on the old row in [EndDate]

My plan was to join associated tables to this table ON the MAX([EndDate]) > [Table2CreateDate]

But perhaps there is a better way?

This isn't actually for a Prices table, but I think that would be similar? I want to store [sys.indexes] whenever it changes so I can link a historical log of the Index DMVs to whatever was the then-current values in the [sys.indexes] table.

One problem I have is that there is no persistent PKey on [sys.indexes]. If I drop/recreate the table, and then recreate the indexes in a different order, the Key fields in [sys.indexes] will be different, so I don't see there being any point Updating my [sys.indexes] copy (e.g. having a TRIGGER store the previous values in an AUDIT table], but rather I will create a new row whenever anything changes, and mark the old entry as "expired on"

But maybe this is not the best way to do it?

Ifor · September 10, 2015, 12:56pm

I tend to do something like the following which keeps the current rows in a T table and the history in an A table.
(Edited to get rid of typos)

SET ANSI_NULLS, QUOTED_IDENTIFIER ON;
GO
CREATE TABLE MainTableT
(
	FromTime smalldatetime NOT NULL CONSTRAINT DF_MainTableT_FromTime  DEFAULT (CURRENT_TIMESTAMP)
	,ToTime smalldatetime NOT NULL CONSTRAINT DF_MainTableT_ToTime  DEFAULT ('20790606 23:59')
	,MainTableID int NOT NULL CONSTRAINT PK_MainTableT PRIMARY KEY
	-- rest of columns
);
GO
CREATE TABLE MainTableA
(
	FromTime smalldatetime NOT NULL 
	,ToTime smalldatetime NOT NULL CONSTRAINT DF_MainTableA_ToTime  DEFAULT (CURRENT_TIMESTAMP)
	,MainTableID int NOT NULL
	-- rest of columns
	,CONSTRAINT PK_MainTableA PRIMARY KEY(ToTime, MainTableID)
);
GO
CREATE TRIGGER TR_MainTableT
ON MainTableT
AFTER UPDATE, DELETE
AS
BEGIN
	DECLARE @Now smalldatetime = CURRENT_TIMESTAMP;

	INSERT INTO MainTableA
	(
		FromTime,ToTime,MainTableID --rest of columns
	)
	SELECT FromTime,@Now,MainTableID --rest of columns
	FROM deleted;

	UPDATE T
	SET FromTime = @Now
	FROM MainTableT T
		JOIN inserted I
			ON T.MainTableID = I.MainTableID;
END;
GO
CREATE VIEW MainTable
AS
SELECT MainTableID -- rest of columns
FROM MainTableT;
GO
CREATE VIEW MainTableHist
AS
SELECT *
FROM MainTableT
UNION ALL
SELECT *
FROM MainTableA;
GO

Ifor · September 10, 2015, 1:32pm

For sys.indexes you could create an IndexShadowT table with a PK of object_id, name.
A regular process (once a day?) could then do inserts, updates and deletes on IndexShadow depending on current values in sys.indexes.

Kristen · September 10, 2015, 2:26pm

Yes, I do something similar using a Trigger when I have a History table, but in that situation I have "direct replacement" rows with a PKey that doesn't change (so all imported rows will always either be new, or update existing rows), but in this case there is every chance that the same row will re-appear with a different PKey ID, or an existing PKey ID will be for a different index So rather than relying on a Trigger I was planning to do:

-- Change MyEndDate from '19991231' to "now" on any rows that have changed
UPDATE U
SET MyEndDate = @dtNow  -- Mark as "closed"
FROM MyTable AS T
    JOIN #ImportTable AS I
       ON (T.ID = I.ID OR T.Name = I.Name)
      AND (
              T.Col1 <> I.Col1
           OR T.Col2 <> I.Col2
           ... etc ...
      )
WHERE T.MyEndDate = '19991231'

-- Insert rows if the PKey does not exist for a row with MyEndDate = '19991231'
INSERT INTO MyTable
SELECT *
FROM #ImportTable AS I
WHERE NOT EXISTS
(
    SELECT * FROM MyTable AS T
    WHERE T.ID = I.ID AND T.Name = I.Name ...
          AND T.MyEndDate = '19991231'
)

Kristen · September 10, 2015, 3:47pm

That's my plan. Its just a question of what is the best way to store the data.

Right now I'm thinking

CREATE TABLE IndexHistory
(
      ID int IDENTITY(1, 1)    -- Just to have a Unique ID for housekeeping
    , CreateDate datetime
    , EndDate datetime         -- Default to 1999-12-31
-- Key columns which must all match
    , object_id int
    , objectname sysname
    , schema_id int
    , SchemaName sysname
    , index_id int
    , IndexName
-- Data columns
    , Col1 ...
    , Col2 ...
)

There can be only one row for a given combination of EndDate='19991231', object_id, index_id. There might be any number of rows with earlier EndDate

I am guessing that I can find the "then current" IndexHistory record with

SELECT	DMCol1, DMCol2, IHCol3, IHCol4, ...
FROM	DMVPerformanceHistory AS DMVH
	OUTER APPLY
	(
		SELECT TOP 1 IHCol3, IHCol4, ...
		FROM	IndexHistory AS IH
		WHERE	    IH.object_id = DMVH.object_id
			AND IH.index_id = DMVH.index_id
			AND IH.CreateDate <= DMV.CreateDate
			AND IH.EndDate >= DMV.CreateDate
		ORDER BY IH.CreateDate
	) AS IH

Note sure if I need TOP 1 in the OUTER APPLY as I think?? that it should only be possible to have one matching row as there should be no overlapping Date Ranges for a given object_id, index_id combination

Ifor · September 10, 2015, 6:22pm

Seems a reasonable plan but it is beyond what I have tried.
I mainly use shadow tables, and their achives, to keep track of information in third party applications for reporting purposes.
Any time I have done this the PK has always been immutable.

ScottPletcher · September 10, 2015, 10:04pm

The index_id can change on any indexOther than the clustered index.

If you want to track indexes, you should create your own immutable id based on the index key columns (I allow include columns to change w/o considering that a "new" index per se, but you might want to distinguish for any column change, even if "just" an included column). That is, your own "index master" table where you looked up the unchanging identity id based on key columns (and included columns if you prefer).

Btw, I think the ORDER BY should have DESC on it to get the latest row:
ORDER BY IH.CreateDate DESC

Kristen · September 11, 2015, 7:06am

Good point, thanks ... maybe table changing between Heap / Clustered should be regarded as a change to the "same" index history record?

I thought about this but decided it was easier to just record the change as a new row and then handle anything else in the reporting. As I see it I can have:

Create a new index (new record)

Rename an index (match with original record - IndexID will stay the same)
Re-create all indexes in a different order (try to match with original records based on IndexName)
Re-create index in a different order and with a different name (I think that will be difficult to match with original record! Column names will be the same though ...)
Change an index - e.g. a different column (match with original record if IndexName is the same), or a change like Fill Factor (should be easier to match with original record )

However, you have given me an idea, thanks:

Add a GUID to my sys.indexes copy-table, and set that to match earlier records as best as I can programmatically - e.g. where there has been a rename/simple-change. Where the change is too subtle [i.e. a new GUID allocated] I can manually set that GUID to match the earlier record so that the reporting works properly.

ScottPletcher · September 11, 2015, 7:34pm

Not me, I avoid guids unless absolutely required since they are such a pain to use. Use a bigint if int may not be large enough (that's a LOT of indexes!).

Kristen · September 12, 2015, 4:29am

Yes, you are right, any unique identifier would do - I was saying GUID but thinking what you said