Having Trouble Removing Hidden Character in SQL Data

Solomon.Rutzky · February 27, 2020, 6:46am

Hello. There are two main issues here:

You are mixing NVARCHAR data with VARCHAR literals and variables. VARCHAR is an 8-bit datatype that can only hold characters found on the code page associated with the collation being used. For string literals and variables, the collation being used is the database's default collation. Outside of UTF-8, which is the 8-bit Unicode encoding (and is only an option starting with SQL Server 2019), single byte code pages (the majority of them) can only contain, at most, 256 characters. NVARCHAR is Unicode (which contains a potential 1,114,112 characters ; but only about 138k of them have been mapped to actual characters).
What all of that means is: when you store Unicode values in non-UTF8 VARCHAR variables, literals, or even columns, then you lose the original Unicode character IF that character does not exist on the code page being used by the collation of that VARCHAR destination. That is why your ascii(' ') test returned 63. It's not that the original character and ? both share the same numeric value (as that is not possible), but the original character got converted into a "?", which is the default replacement character if a character can't be found in the destination code page.
Since you are dealing with Unicode / NVARCHAR data, you need to do 3 things:
1. Use NVARCHAR for variables, not VARCHAR
2. Prefix string literals with an upper-case "N"
3. Use the UNICODE() function instead of the ASCII() function to get the numeric value of the first character in a string.
The following example should help:
```
SELECT NCHAR(2345), ASCII(NCHAR(2345)), 'ऩ', N'ऩ';
-- ऩ	63	?	ऩ
SELECT ASCII('ऩ'), ASCII(N'ऩ'), UNICODE('ऩ'), UNICODE(N'ऩ');
-- 63	63	63	2345
```
When needing to deal with a special code point / character, use a binary collation (i.e. one ending in _BIN2). You use this in a COLLATE clause added to any predicate where you want to override what the collation would have been (or prevent an error if there was no initial collation). For example:
```
DECLARE @Test NVARCHAR(500) = 
   N'aaa bbb' + NCHAR(0xFFFE) + N'ccc';

SELECT @Test,
       REPLACE(@Test, NCHAR(0xFFFE), N'!'),
       REPLACE(@Test COLLATE Latin1_General_100_BIN2,
               NCHAR(0xFFFE), N'!');
-- aaa bbbccc		aaa bbbccc		aaa bbb!ccc
```

For more info on working with collations (especially in SQL Server), please visit:
Collations Info