When cleaning up data held in a nVarchar(4000) column, we discovered a specific unwanted character that we can't remove without removing a legitimate character. We call this unwanted character the "long space". it has an ascii value of 63
select ascii(' ') -- returns 63
Unfortunately a "?" has the same ascii value, so any efforts to remove the long space result in removing question marks.
select ascii('?') -- returns 63
We are stumped!! Guidance appreciated!!!
Below are some examples that we have experimented with
You are mixing NVARCHAR data with VARCHAR literals and variables. VARCHAR is an 8-bit datatype that can only hold characters found on the code page associated with the collation being used. For string literals and variables, the collation being used is the database's default collation. Outside of UTF-8, which is the 8-bit Unicode encoding (and is only an option starting with SQL Server 2019), single byte code pages (the majority of them) can only contain, at most, 256 characters. NVARCHAR is Unicode (which contains a potential 1,114,112 characters ; but only about 138k of them have been mapped to actual characters).
What all of that means is: when you store Unicode values in non-UTF8 VARCHAR variables, literals, or even columns, then you lose the original Unicode character IF that character does not exist on the code page being used by the collation of that VARCHAR destination. That is why your ascii(' ') test returned 63. It's not that the original character and? both share the same numeric value (as that is not possible), but the original character got converted into a "?", which is the default replacement character if a character can't be found in the destination code page.
Since you are dealing with Unicode / NVARCHAR data, you need to do 3 things:
Use NVARCHAR for variables, not VARCHAR
Prefix string literals with an upper-case "N"
Use the UNICODE() function instead of the ASCII() function to get the numeric value of the first character in a string.
When needing to deal with a special code point / character, use a binary collation (i.e. one ending in _BIN2). You use this in a COLLATE clause added to any predicate where you want to override what the collation would have been (or prevent an error if there was no initial collation). For example:
You're welcome. To be clear, using a binary collation won't help if you forget to indicate that the data is NVARCHAR, either by declaring the variable to be NVARCHAR or prefixing the string literal with an upper-case "N". For example (and this time I used Code Point U+2007 which is "Figure Space", another type of space character):
Mike, for whatever reason, when the O.P. copied and pasted that code into the editor, it did not keep the original characters, and so changed the actual character that is causing the issue to a regular space. Try running my example from this reply: Having Trouble Removing Hidden Character in SQL Data