PKey column has characters unsuitable for URL

Kristen · December 12, 2017, 8:53am

We have an IDENTITY column in every table (usually the PKey, rarely the Clustered Index). It means that we have a "safe" single-column unique value to use in URLs and the like (the "like" includes IDs for HTML elements too, such as drill-down <DIV> that are replaced by AJAX).

I have a report which (now) needs to run on data from Production. We have warehouse of that data, but it is impractical to refresh it - the report is to find exceptions (in Production table), they are fixed, then User wants to re-run the report to look for further issues, clearly waiting for an import to Warehouse is not practical.

Production is a 3rd party DB, Warehouse is an in-house DB. I can make reasonable changes to the 3rd Party DB, but we try to avoid doing that (in case complications with normal operation and/or during upgrades)

Also, the performance of the report if run on Production tables is terrible. I haven't looked into that, but probably we already have more suitable indexes in Warehouse - that's fair enough, the two DBs are doing different jobs.

So I've been doing:

SELECT MyID, PKCol, Col2, Col3, ...
FROM WAREHOUSE.dbo.MyTable
WHERE ...

and I can change that to:

SELECT PKCol, Col2, Col3, ...
       , [PKCol_ENCODED] = dbo.MyEncodeFunction(PKCol)
FROM PRODUCTION.dbo.MyTable
WHERE ...

and then using PKCol_ENCODED anywhere that can't handle embedded characters like SPACE or "",
"&" (yes, I know, but they HAVE been putting things like that in the PKCol, and the APP does nothing to prevent that) and they drive both the URL and the <DIV ID="XXX\YYY"> mad ...

And now I find that my Cunning Plan! to replace any non alphanumeric with "_" is causing duplicates ... XXX\YYY and XXX&YYY both map to the same thing ...

My first thought is to create a translation table for the PKeys

CREATE TABLE PKeyTranslation
(
    [MyID] int IDENTITY(1,1) NOT NULL
  , [MyPKey] varchar(10) NOT NULL
)

and then to freshen this up on every run of the report (which will find no, or very few, new rows each time)

But its a right palaver to do (adding to lots of different reports), and maybe the reports are going to be too slow anyway (because querying PRODUCTION with inappropriate structure/indexes etc)

In fairness the ID is only required to provide a unique ID (for URL and <DIV ID>) and if the user Drills Down and I get the ID as a @Parameter to my SProc that will only be a single-row decode lookup in the PKeyTranslation table

I suppose I could use a single PKeyTranslation table regardless of how many physical tables there are (provided they all have a one-part varchar column for their PKey), as I don't really care about the mapping, I just need a unique, non-changing, ID ...

... but I'm more inclined to match the PKey to the actual WAREHOUSE table's ID so that whether it is displaying WAREHOUSE data or PRODUCTION data the ID will always represent the same row.

That will mean (presumably?) changing the IDENTITY in the Warehouse Table to INT and putting the IDENTITY in the "translation table" [one for each actual table] instead.

Grrrr ... I hate 3rd Party Databases !!

Ding! Possibly eureka moment. Add a TRIGGER to PRODUCTION so that Translation Table in WAREHOUSE is maintained in real time?

gbritton · December 12, 2017, 2:29pm

Wow Kristen, have you been reading James Joyce? Talk about yer stream of consciousness....

Anyway, A trigger could do it, I suppose. That work would likely be overshadowed by the rest of the work on the server.

Kristen · December 12, 2017, 3:00pm

I'm a detail-person, eh?!!

So you'd be happy with a local "translation table" converting PRODUCTION multi-part PKey to single-part INT and then in my query I can do:

SELECT M.Col1, M.Col2, ...
     , T.MyID
FROM PRODUCTION.dbo.MainTable AS M
    JOIN dbo.MyTranslation AS T
         ON T.PKey1 = M.PKey1
        AND T.PKey2 = M.PKey2
        ...
WHERE M.ColX = @CriteriaX
  AND M.ColY = @CriteriaY

which will give me a "reference ID" for all the rows in the "report", and then when the user presses a Drill Down that query will be:

SELECT C.ColM, C.ColN, ...
     , T.MyID
FROM PRODUCTION.dbo.MainTable AS M
    JOIN dbo.MyTranslation AS T
         ON T.PKey1 = M.PKey1
        AND T.PKey2 = M.PKey2
        ...
    JOIN PRODUCTION.dbo.ChildTable AS C
         ON C.PKey1 = M.FKey1
        AND C.PKey2 = M.FKey2
    ...
WHERE T.MyID = @MyParam

I was thinking that was a significant overhead, but now I've typed it the additional Translation-Table looks very skinny

Addition of new "codes" will be a rare event, I don't think there are any high-volume tables that have this sort of "any old rubbish will do" PKey columns - all the big tables, for Invoices and the like, have Numeric PKey columns

Kristen · December 19, 2017, 6:47am

Got a snag on this ...

I have Name Records coming from (say) two sources.

One uses an INT as a PKey, the other an NVARCHAR(10).

I want a tight correlation between the source table's PKey and mine, so I want to create a Translation table for each source table (and, additional to that, it is also permitted to enter an adhoc name records direct into my table).

My Name table has columns for Source (tinyInt) and External PKey Reference varchar(50), but it was only intended to do a lookup of single rows at a time. For a broader query where I JOIN ExternalRef to remote table's PKey there will often be some conversion going on - could be varchar-to-int, or could be a different collation ... so the performance is dreadful.

So if my translation table is same datatype, collation, everything gets a bit harder. Currently I have IDENTITY on the Names table, so generate a unique identifier. When I thought I was only going to have one Translation table I could just "move" that IDENTITY function to the Translation table ... although the adhoc rows added direct to the Name table would need some workaround ...

But with 2+ Translation tables, one per Source, I now need a SEQUENCE object. Although that also sorts out the IDs for Adhoc records in Names table.

So I think I have:

Get rid of IDENTITY attribute in Names table
Create a SEQUENCE object (basically to replace the IDENTITY function)
Create 1/many Translation tables - this will be for each combination of PKey columns in the source

If I have two sources that both, conveniently, have INT Pkey then one Translation table will do (with TINYINT Source and INT ExternalRef columns)

For one/multi-part VARCHAR PKey tables I probably need one Translation table per source table. Its unlikely that both will have similar width columns, and [I think?] collation needs to identical to source too, to allow JOINS to be SARGable

Its quite a lot of refactoring work really

Only other thought is one translation table, but add additional ExternalRef columns every time I have a Source that has different requirements. Create a Filtered Index on each (i.e. excluding NULL values). But I think a Clustered Index on the ExternalRef column(s) would be much better, and therefore a separate Translation table for each Source.

gbritton · December 19, 2017, 9:57am

can you put a computed column (int) on the table with the nvacrhar(10) key?

Kristen · December 19, 2017, 10:20am

What would be the formula for the computed column? (Sorry if I'm being thick!)

Can I put the NEXT VALUE FOR statement for the SEQUENCE in the computed column? That would be cool!

yosiasz · December 19, 2017, 5:35pm

I have been telling Kristen to write a book!

Kristen · December 20, 2017, 5:59am

Actually, ...I have started a Blog. Not going to "open" it until I have a decent number of articles, I've done a few so far, and they are all controversial ....

gbritton · December 20, 2017, 4:39pm

cast(nvarcholname as int)