In the following SELECT statement I want the READCODE to be case sensitive.
SELECT DISTINCT Snomed,SnomedDescription,
case WHEN left(Emiscode,7) = 'EMISATT'
THEN 'EMISATT'
WHEN left(Emiscode,5)= 'PCSDT'
THEN 'PCSDT'
ELSE Emiscode
END AS Emiscode,
CASE WHEN left(Readcode,5)= 'PCSDT'
THEN 'PCSDT'
WHEN left(Readcode,7)= 'EMISATT'
THEN 'EMISATT'
ELSE Readcode
END AS Readcode, Term
FROM dbo.ALLMedicalCodes with (tablock)
Therefore, for data as;
Snomed SnomedDescription Emiscode Readcode Term
**123 Test 1 AbC Started**
**123 Test 1 ABC Started**
565 Exercise 5 XY Complete
7263 Test 8 HEY In Progress
SQL Server in the above SELECT statement sees the first 2 records as Distinct and brings out;
Snomed SnomedDescription Emiscode Readcode Term
123 Test 1 AbC Started
565 Exercise 5 XY Complete
7263 Test 8 HEY In Progress
I want it to to OUTPUT instead;
**Snomed SnomedDescription Emiscode Readcode Term**
123 Test 1 AbC Started
123 Test 1 ABC Started
565 Exercise 5 XY Complete
7263 Test 8 HEY In Progress
Did you mean SQL Server sees them as NOT being distinct, so returns only one row, but you really do want both rows because of the differing case for the Readcode column?
Pass. It will treat A <> a and A-Accent <> A and <> a-Accent
But I have no idea if it will associated two different characters as being the same. Perhaps a Dash and an Em-Dash, or some subtle distinction like that.
Whereas by defintion a BINARY Collation will treat all characters as only matching if they are the same char-value
Please see below .. this should tell you how to do it
/*
drop table #abc
create table #abc
(
col varchar(100) null
)
insert into #abc select 'ABc'
insert into #abc select 'ABC'
*/
select distinct col from #abc
col
ABc
select distinct col COLLATE Latin1_General_CS_AS from #abc
col
Doesn't @Kristen's suggestion of forcing the collation work? Seems to work for me - see below . My default collation is case insensitive.
CREATE TABLE #A(col1 VARCHAR(32), col2 VARCHAR(32));
INSERT INTO #A VALUES ('a','B'),('a','b');
SELECT DISTINCT col1, col2 FROM #A;
SELECT DISTINCT col1, col2 COLLATE SQL_Latin1_General_CP1_CS_AS FROM #A;
SELECT DISTINCT
col1,
CASE WHEN col1 = 'x' THEN 'XYZ'
ELSE col2
END COLLATE SQL_Latin1_General_CP1_CS_AS AS ReadCode
FROM #A
DROP TABLE #A;
SELECT DISTINCT Snomed,SnomedDescription,
case WHEN left(Emiscode,7) = 'EMISATT'
THEN 'EMISATT'
WHEN left(Emiscode,5)= 'PCSDT'
THEN 'PCSDT'
ELSE Emiscode
END AS Emiscode,
CASE WHEN left(Readcode,5)= 'PCSDT'
THEN 'PCSDT'
WHEN left(Readcode,7)= 'EMISATT'
THEN 'EMISATT'
ELSE Readcode
END COLLATE Latin1_General_BIN2 AS Readcode, Term
FROM dbo.ALLMedicalCodes with (tablock)
N/B: Readcode will ONLY be numbers and characters
Thanks
I would think so - assuming that you wanted to retain case sensitivity only for that one column in the select list. If you do want other columns to have case sensitivity, you should force collation on each of those as well.
And then there is a third case ... applying collation in BOTH places.
If you ONLY have collation on the result of the CASE then the case will have already changed anything begining with "PCSDT" (i.e. case Insensitive) to "PCSDT" (Upper Case), so the distinct will only list the upper case variant.
If you have COLLATE in both places then you see all case variants
SELECT [T_ID] = IDENTITY(int, 1, 1),
*
INTO #TEMP
FROM
(
SELECT [Readcode] = 'PCSDT'
UNION ALL SELECT 'pCSDT'
UNION ALL SELECT 'pcsdt'
UNION ALL SELECT 'Xcsdt'
UNION ALL SELECT 'PCSDTabcd'
UNION ALL SELECT 'pCSDTabcd'
) AS X
-- Show the sample data:
SELECT *
FROM #TEMP
-- Test 1
SELECT T_ID,
Readcode,
CASE WHEN left(Readcode,5)= 'PCSDT'
THEN 'PCSDT'
ELSE Readcode
END COLLATE Latin1_General_BIN2 AS Readcode1,
CASE WHEN left(Readcode,5)= 'PCSDT' COLLATE Latin1_General_BIN2
THEN 'PCSDT'
ELSE Readcode
END AS Readcode2
FROM #TEMP
-- Test 2
SELECT DISTINCT
-- SELECT T_ID,
CASE WHEN left(Readcode,5)= 'PCSDT'
THEN 'PCSDT'
ELSE Readcode
END COLLATE Latin1_General_BIN2 AS Readcode1
FROM #TEMP
--
SELECT DISTINCT
-- SELECT T_ID,
CASE WHEN left(Readcode,5)= 'PCSDT' COLLATE Latin1_General_BIN2
THEN 'PCSDT'
ELSE Readcode
END AS Readcode2
FROM #TEMP
--
SELECT DISTINCT
-- SELECT T_ID,
CASE WHEN left(Readcode,5)= 'PCSDT' COLLATE Latin1_General_BIN2
THEN 'PCSDT'
ELSE Readcode
END COLLATE Latin1_General_BIN2 AS Readcode3
FROM #TEMP
You guys seem to be happy with a collation like SQL_Latin1_General_CP1_CS_AS
I see two problems with that.
If the comparison only needs to worry about Upper / Lower case (including foreign accented characters) I imagine it is fine. I don't know if comparison using a CS + AS collation is UNIQUE to all characters, so my preference has always been to use a BINARY collation when I am comparing two values and want to know if they EXACTLY match (not just in Alpha characters).
If someone knows if a CS + AS collation is indeed unique to all characters that would be good to know (I tried a Google and didn't find the answer)
If you hardcode, say, SQL_Latin1_General_CP1_CS_AS and that gets deployed on a database with a different default collation I think there is a risk that foreign characters (i.e. natively stored in a different collation) get modified according to the SQL_Latin1_General_CP1_CS_AS collation. There are probably side effects in doing that? and perhaps it would also prevent a JOIN / WHERE clause being SARGable. My thought is that using a BINARY Collation avoids both those issues, but I have no idea if that is true either! and would appreciate learning the answer if someone knows it.