Case Sensitive

dr223 · August 13, 2015, 2:52pm

Hi,

In the following SELECT statement I want the READCODE to be case sensitive.

SELECT DISTINCT Snomed,SnomedDescription,
case WHEN left(Emiscode,7) = 'EMISATT' 
THEN 'EMISATT'
WHEN left(Emiscode,5)= 'PCSDT' 
THEN 'PCSDT' 
ELSE Emiscode
END AS Emiscode,
CASE WHEN left(Readcode,5)= 'PCSDT' 
THEN 'PCSDT'
WHEN left(Readcode,7)= 'EMISATT' 
THEN 'EMISATT' 
ELSE Readcode
END AS Readcode, Term
FROM dbo.ALLMedicalCodes with (tablock)

Therefore, for data as;

Snomed    SnomedDescription         Emiscode       Readcode      Term
**123               Test                  1           AbC          Started**
**123               Test                  1           ABC          Started**
565                  Exercise             5           XY           Complete
7263                Test                  8           HEY          In Progress

SQL Server in the above SELECT statement sees the first 2 records as Distinct and brings out;

Snomed    SnomedDescription                  Emiscode      Readcode         Term
123               Test                           1           AbC          Started
565               Exercise                       5           XY           Complete
7263              Test                           8           HEY         In Progress

I want it to to OUTPUT instead;

**Snomed    SnomedDescription          Emiscode     Readcode        Term**
123               Test                    1           AbC          Started
123               Test                    1           ABC          Started
565               Exercise                5           XY           Complete
7263              Test                    8           HEY         In Progress

Thank you in advance

Kristen · August 13, 2015, 3:20pm

Like this perhaps?

CASE WHEN left(Readcode,5)= 'PCSDT'  COLLATE Latin1_General_BIN2

Any binary collation will do (in fact I'm not sure where there is more than one!)

dr223 · August 13, 2015, 3:25pm

What about ...

ELSE Readcode COLLATE SQL_Latin1_General_CP1_CS_AS

dr223 · August 13, 2015, 4:06pm

any help please..

Thanks

JamesK · August 13, 2015, 4:09pm

Did you mean SQL Server sees them as NOT being distinct, so returns only one row, but you really do want both rows because of the differing case for the Readcode column?

dr223 · August 13, 2015, 4:14pm

Yes, JamesK thats all I want to achieve.

Thanks for your time

Kristen · August 13, 2015, 4:15pm

Pass. It will treat A <> a and A-Accent <> A and <> a-Accent

But I have no idea if it will associated two different characters as being the same. Perhaps a Dash and an Em-Dash, or some subtle distinction like that.

Whereas by defintion a BINARY Collation will treat all characters as only matching if they are the same char-value

hhrr · August 13, 2015, 4:21pm

Hi

Please see below .. this should tell you how to do it

/*
drop table #abc
create table #abc
(
col varchar(100) null
)
insert into #abc select 'ABc'
insert into #abc select 'ABC'
*/
select distinct col from #abc
col

ABc

select distinct col COLLATE Latin1_General_CS_AS from #abc
col

ABc
ABC

JamesK · August 13, 2015, 4:22pm

Doesn't @Kristen's suggestion of forcing the collation work? Seems to work for me - see below . My default collation is case insensitive.

CREATE TABLE #A(col1 VARCHAR(32), col2 VARCHAR(32));

INSERT INTO #A VALUES ('a','B'),('a','b');

SELECT DISTINCT col1, col2 FROM #A;
SELECT DISTINCT col1, col2 COLLATE SQL_Latin1_General_CP1_CS_AS FROM #A;
SELECT DISTINCT
	col1,
	CASE WHEN col1 = 'x' THEN 'XYZ'
	ELSE col2
	END COLLATE SQL_Latin1_General_CP1_CS_AS AS  ReadCode
FROM #A

DROP TABLE #A;

...

dr223 · August 13, 2015, 4:36pm

So if I consider @Kristen

Is the below code OK;

SELECT DISTINCT Snomed,SnomedDescription, case WHEN left(Emiscode,7) = 'EMISATT' THEN 'EMISATT' WHEN left(Emiscode,5)= 'PCSDT' THEN 'PCSDT' ELSE Emiscode END AS Emiscode, CASE WHEN left(Readcode,5)= 'PCSDT' THEN 'PCSDT' WHEN left(Readcode,7)= 'EMISATT' THEN 'EMISATT' ELSE Readcode END COLLATE Latin1_General_BIN2 AS Readcode, Term FROM dbo.ALLMedicalCodes with (tablock)

N/B: Readcode will ONLY be numbers and characters
Thanks

JamesK · August 13, 2015, 4:39pm

I would think so - assuming that you wanted to retain case sensitivity only for that one column in the select list. If you do want other columns to have case sensitivity, you should force collation on each of those as well.

dr223 · August 13, 2015, 4:41pm

Sorry where do u insert the collation - as my select statement above is OK

Kristen · August 13, 2015, 4:58pm

I don't know if the collation AFTER the CASE's END will work. That might just convert the result of the CASE to that collation??

I always put it after each comparison, so:

ColA = ColB COLLATE xxx

which would mean:

WHERE ColA = ColB COLLATE xxx
...
CASE WHEN ColA = ColB COLLATE xxx THEN 1
WHEN ColC = ColD COLLATE xxx THEN 2
ELSE 0
END

so perhaps it depends on whether you want the

CASE WHEN left(Readcode,5)= 'PCSDT'

comparison test to be made using a case SENSITIVE test, or whether you want that case INsensitive but you then want the DISTINCT to be case SENSITIVE?

JamesK · August 13, 2015, 5:07pm

If the comparison should be case sensitive, then apply case sensitive collation there as well if they don't already have case sensitive collation

CASE 
	WHEN 
		LEFT(Readcode,5) COLLATE  SQL_Latin1_General_CP1_CS_AS
		= 
		'PCSDT' COLLATE  SQL_Latin1_General_CP1_CS_AS
	THEN 'PCSDT'
...

The collate after the END applies the collation only to the result of the ease expression.

Kristen · August 13, 2015, 5:10pm

And then there is a third case ... applying collation in BOTH places.

If you ONLY have collation on the result of the CASE then the case will have already changed anything begining with "PCSDT" (i.e. case Insensitive) to "PCSDT" (Upper Case), so the distinct will only list the upper case variant.

If you have COLLATE in both places then you see all case variants

SELECT	[T_ID] = IDENTITY(int, 1, 1),
	*
INTO #TEMP
FROM
(
	SELECT [Readcode] = 'PCSDT'
	UNION ALL SELECT 'pCSDT'
	UNION ALL SELECT 'pcsdt'
	UNION ALL SELECT 'Xcsdt'
	UNION ALL SELECT 'PCSDTabcd'
	UNION ALL SELECT 'pCSDTabcd'
) AS X


-- Show the sample data:
SELECT	*
FROM	#TEMP

-- Test 1
SELECT	T_ID,
	Readcode,
	CASE WHEN left(Readcode,5)= 'PCSDT' 
		THEN 'PCSDT'
		ELSE Readcode 
	END COLLATE Latin1_General_BIN2 AS Readcode1,
	CASE WHEN left(Readcode,5)= 'PCSDT'  COLLATE Latin1_General_BIN2
		THEN 'PCSDT'
		ELSE Readcode 
	END AS Readcode2
FROM	#TEMP

-- Test 2
SELECT DISTINCT
-- SELECT T_ID,
	CASE WHEN left(Readcode,5)= 'PCSDT' 
		THEN 'PCSDT'
		ELSE Readcode 
	END COLLATE Latin1_General_BIN2 AS Readcode1
FROM	#TEMP
--
SELECT DISTINCT
-- SELECT T_ID,
	CASE WHEN left(Readcode,5)= 'PCSDT'  COLLATE Latin1_General_BIN2
		THEN 'PCSDT'
		ELSE Readcode 
	END AS Readcode2
FROM	#TEMP
--
SELECT DISTINCT
-- SELECT T_ID,
	CASE WHEN left(Readcode,5)= 'PCSDT'  COLLATE Latin1_General_BIN2
		THEN 'PCSDT'
		ELSE Readcode 
	END COLLATE Latin1_General_BIN2 AS Readcode3
FROM	#TEMP

gives

T_ID        Readcode  
----------- --------- 
1           PCSDT
2           pCSDT
3           pcsdt
4           Xcsdt
5           PCSDTabcd
6           pCSDTabcd

T_ID        Readcode  Readcode1 Readcode2 
----------- --------- --------- --------- 
1           PCSDT     PCSDT     PCSDT
2           pCSDT     PCSDT     pCSDT
3           pcsdt     PCSDT     pcsdt
4           Xcsdt     Xcsdt     Xcsdt
5           PCSDTabcd PCSDT     PCSDT
6           pCSDTabcd PCSDT     pCSDTabcd

Readcode1 
--------- 
PCSDT
Xcsdt

Readcode2 
--------- 
PCSDT
pCSDTabcd
Xcsdt

Readcode3 
--------- 
PCSDT
Xcsdt
pCSDT
pCSDTabcd
pcsdt

Kristen · August 13, 2015, 7:12pm

You guys seem to be happy with a collation like SQL_Latin1_General_CP1_CS_AS

I see two problems with that.

If the comparison only needs to worry about Upper / Lower case (including foreign accented characters) I imagine it is fine. I don't know if comparison using a CS + AS collation is UNIQUE to all characters, so my preference has always been to use a BINARY collation when I am comparing two values and want to know if they EXACTLY match (not just in Alpha characters).

If someone knows if a CS + AS collation is indeed unique to all characters that would be good to know (I tried a Google and didn't find the answer)

If you hardcode, say, SQL_Latin1_General_CP1_CS_AS and that gets deployed on a database with a different default collation I think there is a risk that foreign characters (i.e. natively stored in a different collation) get modified according to the SQL_Latin1_General_CP1_CS_AS collation. There are probably side effects in doing that? and perhaps it would also prevent a JOIN / WHERE clause being SARGable. My thought is that using a BINARY Collation avoids both those issues, but I have no idea if that is true either! and would appreciate learning the answer if someone knows it.

Case Sensitive

/* drop table #abc create table #abc ( col varchar(100) null ) insert into #abc select 'ABc' insert into #abc select 'ABC' */ select distinct col from #abc col

select distinct col COLLATE Latin1_General_CS_AS from #abc col

/*
drop table #abc
create table #abc
(
col varchar(100) null
)
insert into #abc select 'ABc'
insert into #abc select 'ABC'
*/
select distinct col from #abc
col

select distinct col COLLATE Latin1_General_CS_AS from #abc
col