Alphabetic sequence

Kristen · July 3, 2017, 9:36am

Did a Google but didn't find quite what I wanted so figured it was easier to ask for your advice.

I have a column with Sequence number 1...n and I want to convert that into A, B, C, ..., AA, AB, ..., ZZ

I came up with this, which works, but I imagine that there is a better way?

SELECT TOP 750 MyNumber
	, (MyNumber-1)/26, (MyNumber-1)%26
	, CASE WHEN MyNumber < 27 THEN CHAR(64+MyNumber) 
		WHEN MyNumber < 703 THEN CHAR(65+((MyNumber-1)/26)-1) + CHAR(65+((MyNumber-1)%26)) 
		ELSE NULL
		END
FROM	dbo.MyTallyTable

Kristen · July 3, 2017, 9:44am

Actually: I need BLANK for the first one, then 2=A, 3=B, ...,

ScottPletcher · July 3, 2017, 4:45pm

Create a conversion table with rows for each value to be converted, something like:
0, ' '
1, 'A'
2, 'B'
etc.

Kristen · July 3, 2017, 4:53pm

Thanks Scott. I had a similar thought that it might be worth adding a column to my TALLY table (mapping the numeric sequence to the Alpha one I need) - then I can just JOIN that, rather than having to "calculate" it each time.

ScottPletcher · July 3, 2017, 5:13pm

Yeah. For a limited number of entries, such as just ' ' to 'ZZ', you could use a SUBSTRING method. I will often use that when it's necessary, or desirable, to avoid I/O.

DECLARE @number_conversion_chars varchar(8000)
SET @number_conversion_chars = '  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z AAABACADAEAFAG...'

SELECT t.tally, SUBSTRING(@number_conversion_chars, t.tally * 2 + 1, 2) AS conversion_chars
FROM dbo.tally t
WHERE t.tally <= 30
ORDER BY t.tally

Kristen · July 3, 2017, 7:27pm

I'm getting old ... good job I post questions here so folk here can remind me of things I should have remembered by myself! A "Sprite String" would no doubt be the most efficient zero-I/O solution here.

703 (I think ...) elements gets me to "ZZ", so that would only be a varchar(1406), which is not going to strain anything

JeffModen · July 6, 2017, 1:00am

Unless you limit to just two characters or are very careful in how you construct such alpha sequences, you can easily spell out a wealth of swear words or things that might be insulting.

A perfect example is when I made plane reservations to fly back East to see my Mom. The confirmation number was "4EBSOB". Think... Confirmation Number "For East Bound SOB".

That being said, what pattern do you wish to follow and how many characters in total?

Kristen · July 7, 2017, 7:34am

BTDTGOTTS!

We had a "Confirmation of registration" random-number that had the potential for "rich words" ... we changed it to be "letter-digit-letter-digit-letter-digit", removed all vowels (and Y I think) from the letters pool, removed all letters and numbers that were confusable (5 / S, 0 / O, 1 / I) from both pools, and I think the result was "safe"

The sequence I need is A-Z followed by AA-ZZ, and the highest number we currently have is

Highest sequence-number currently is "HG" so i think no danger of getting anywhere near ZZ (this is a tie-break on sub-matters for a single matter, so the existing matter which is up to sub-matter HG would have to have new sub-matters added to get to ZZ or a new "busy" matter would need to arise, so I think very unlikely to arise.

But if we had needed 3 letters I could easily have fallen into the SOB trap !

JeffModen · July 13, 2017, 1:28am

Why would "2" = "A"??? It doesn't match your original code on this thread at all.
{EDIT} Never mind... your previous answer answers this.

Kristen · July 13, 2017, 2:52pm

I had a client once that had occasional "extra records" and they had a Human Reference of X1234GB, X1234US, X1235GB and so on. (These represented official records at a government office)

Then if the US record split into two (separate, but related) records that would be X1234US2 - benefit was that the number told you the total number of such additional records, and the majority of the records had no numeric suffix, so it was a nice visual cue that it was "unusual".

Then they had a situation where the original X1234GB was superseded by a real one, replacing the original - so they called that X1234GB1 - 'coz it wasn't an additional record, as such.

And then the official government body introduced a "Provisional" formality, which kinda just got you a reserved-place-in-the-line ... and that was ... ... X1234GB0 ... replaced by X1234GB when it was done for real, and that was replace by X1234GB1 if a superseding record was created ... all before we ever got to a X1234GB2 being the first, real, associated/duplicate record.

Ho!Hum!. They were just ID = IDENTITY(int, 1, 1) to me

ScottPletcher · July 13, 2017, 3:15pm

That's one of the reasons 1NF was devised. Every column should be a single piece of information.

Kristen · July 13, 2017, 3:28pm

Yeah, as you probably expect there are columns within the record take care of all the 1NF. This is just the column that the Humans reserved for their own personal reference ... they dislike my IDENTITY(int, 1, 1) for some reason ...

Kristen · July 13, 2017, 3:33pm

.. actually, all joking apart, I resent every table where I used a Natural Key as Primary, even when client swore-blind that it would never change. It always has done if not in the Client's tenure then for sure when they sell their company and retire, filthy rich, into the sunset, because for sure the new owner will want to change & consolidate ...

I like an IDENTITY in the row.

The clustered index can be whatever is most appropriate though ...

ScottPletcher · July 13, 2017, 3:59pm

Don't forget, you can present a combined column to the user even though the columns are stored (and processed) separately in the table. You can show the user "X1234GB2" but still store the values internally as 4 separate columns, or however many columns it needs to be.

Kristen · July 13, 2017, 4:36pm

Yes, that's definitely The Right Way. Trivially easy to have a VIEW to provide that, and use the VIEW instead of the actual Table.

In this case all the "human bits" of information are not representative of data within the record (except for the "GB" but in the human-reference they have sub-categories of GB that don't map onto the actual Country Code designations ...) so that leaves me with:

X - the Department. Its not a single letter within the record, so some "X" letters are butchered to represent the department, in order to stick with the single-letter approach ... they'd be better off with "XXX1234" IMHO where "XXX" is the code for the department. Of course I could have a mapping table from "Real XXX" code to "Fake one-char code" ... but I've drawn the line on that ... I suppose I could CACHE the Human Reference IN the record ... but I'm doing that, only thing is I've let the Human be in charge of keeping it "current"

"1234" is their next available number. Supposedly the system prevents DUPs on that part, but there are DUPS ... and someone decided that, in certain circumstances, it was OK to have X1234 and A1234 - where "A" is an Appeal to a ruling ... but, of course, relates to the 1234 record (... but only provided that that didn't happen to be one of the unfortunate ones that got DUP'd by accident ...)

So ... I think that my Humans should use IDENTITY(int, 1, 1) and stop messing about pretending that their system is "acceptable"

When the Humans change their mind and its no longer X1234 but Y1234 do they go back an rename and refile all their Emails, and DOCs? Do they heck ...

ScottPletcher · July 13, 2017, 5:16pm

I have to admit I'm not in favor of automatically adopting an identity for (almost) every table. I've seen systems where literally every table is predefined with identity as the clustering PK before any other design takes place. So frankly it's often a crutch to avoid proper modeling and/or dealing with underlying data issues. Since the identity has no real meaning, you can't check for dup data, etc..

Separate columns makes it trivial to prevent dups, such as combining the number and the X / A code.

If that can change, then you'd need to encode the value, using a tinyint value to represent the code. Although if it's very rare, you could just change all the main table values when it occurs.

Kristen · July 14, 2017, 7:00am

We use IDENTITY because if gives us a single column, unique, value for each row. Where we have generic code of some sort, or we are trying to sort a problem out using Compare Scripts and the like, its a lot easier to guarantee that there is a single column unique numeric value [in the row] than having to have scripts that can cope with multi-column keys etc.

A very high proportion of our code (CRUD and the like) is mechanically generated, so that maybe makes my use-case unconventional.

Other things like ETL, Import/Export and then Merge into remote DB, where mechanically generated scripts are used the one-part key makes the "does it already exist" code easier and uniform, and so on. Nothing to do with our choice for clustered index, you'll be pleased to hear! - although do please take credit for the fact that you have raised it here often and that has made me far more vigilant on Clustered Index column choices than I probably would have been otherwise

Web Pages are referenced by having the ID in the URL. That's fine with "anything" of course, including multi-part keys, but that bloats the URLs and data in URL which contains spaces / non alpha-numeric characters etc. needs special handling and, for example, hyphens in URL causes their use in (plain text) emails to soft-line-break, and so on.

So basically my reasoning is that when I want to reference a specific record the option of using a small, single-part, unique identity value is often better than a conventional key, particularly if the only other unique reference(s) are multiple-part

Agree with all you say, but its the humans that are the problem!

They create X1234 in the database today. They create documents with names like X1234$20170714.DOC, they email them to other people, they put them in the document management system, ...

... and then they change the Reference on the record to Y1234 ...

There is no hope!