I am trying to write code that will look for 9 consecutive identical characters in a field in a table and return NULL when found. For example, if the field named test contains
888888888 or
777777777 or
11111111
then return NULL. Thanks in advance for any help someone can provide
are they just numbers or can they be characters too? AAAAAAAAAA??? If numbers, you can try a tally table.
Create table #t (field1 varchar(20))
insert into #t (field1)
values
(888888888)
,(777777777)
,(11111111)
declare @pMin BIGINT = 0
,@pMax BIGINT = 9;
WITH T1(F) AS
(
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
),
T2(F) AS
(SELECT 1 FROM T1 A, T1 B),
T3(F) AS
(SELECT 1 FROM T2 A, T2 B),
T4(F) AS
(SELECT 1 FROM T3 A, T3 B),
cteTally(N) AS
(
SELECT TOP ((@pMax-(((ABS(@pMin)+@pMin)/2)))+1)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
+ ((((ABS(@pMin)+@pMin)/2))-1)
FROM T4
)
,cte as (select replicate(N,9) as N from cteTally T)
select field1,
case when field1 = n then Null else field1 end
from #T t
left join cte c
on t.field1 like '%' + c.N + '%'
Mike:
Thanks. They are all numbers. I'm checking for a valid SSN where one situation of an invalid SSN would be all repeated characters, like 999999999. Is there an easier way to check this, like maybe a regex expression or some type of pattern matching perhaps? This code will be part of a stored procedure filter where other irregularities will be tested. Thanks again for your reply and willingness to help.
If they are all numbers - and you know what those will be, then wouldn't something like this work?
Select *
From yourTable
Where ssn In ('000000000', '111111111', '222222222', '333333333', '444444444', '555555555', '666666666', '777777777', '888888888', '999999999')
Or better yet - define a table of invalid SSN's and use [NOT]IN or [NOT]EXISTS
Select *
From yourTable
Where ssn In (Select ssn From dbo.InvalidSSN)
Select *
From yourTable t
Where Exists (Select * From dbo.InvalidSSN i Where i.ssn = t.ssn)
If you just want to null the values - then
Select *
, SSN = iif(i.ssn Is Not Null, Null, y.ssn)
From yourTable y
Left Join dbo.InvalidSSN i On i.ssn = y.ssn
If you really want a code solution - then you could implement the regex CLR functions and use a simple regex match. But that really isn't needed for this.
Jeff:
Thanks. I thought about that solution also. It seemed just a bit mechanical and clunky. But, if it works then it works, right? I was just looking for a more sophisticated simple solution. I'm not discounting your solution by any means and am thankful for your time and idea. Thanks
In this case - I think a more 'sophisticated' solution is just going to cause performance issues and be harder to manage and maintain. If this is just going to be contained to a single procedure - then use a table variable and using IN or EXISTS.
A more intriguing q is why on earth you would ever have unencrypted, static SSNs in ANY data at ANY time?? Wow, that's a security hole from he!!.
Zowie! I totally agree with Scott! Why the hell do companies think that working with plain text SSNs in any way, shape, or form is the right thing to do. Notice that is not following by a question mark. You need to advise the company that what they're doing is wrong. If they claim it's not wrong, ask them to add their SSN and related PII to the database as a sign of good faith. Of course, they won't because they know it's not good.
Unfortunately - good or bad there are requirements that don't really have a good solution. In the health care world the SSN is still collected and used to 'identify' the patient. It is not a key value but is used as part of the patient matching code.
Understood but it never needs to be stored or transmitted in clear text form.
Right - but how to accomplish that in an environment where you need to send data across multiple disparate systems. For example:
Hospital system generates an order -> interface engine sends to lab -> lab system generates result -> interface engine sends result(s) back to hospital system and other clinical systems.
If SSN is encrypted on the hospital system - do we decrypt and send to interface engine? Do we send the encrypted value and interface engine decrypts and sends to lab system - or does the interface engine pass it through and the lab system decrypts? What about the other clinical systems - do they receive the encrypted value and decrypt - or receive the un-encrypted value and encrypt it on their system?
How do we setup those other systems - that are managed and maintained by different vendors - to be able to decrypt?
Do we use a common function available to all systems to encrypt/decrypt fields and how do we get all vendors to implement that common function in their code?
Ideally - this would all be a moot point if we have an Enterprise Master Patient Index or an Enterprise Medical Record Number, and all clinical systems used the eMPI or the eMRN value to identify the patients. But even then...users are lazy and will just create a new patient instead of looking up the correct patient in the eMPI/eMRN system and someone then needs to patient match.
The SSN is used as part of the patient matching - and can help identify a patient where the name is different, or a typo on the DOB or other mismatched data points. But - it cannot be the sole identifier either because we can have the same SSN assigned to multiple patient records.
It isn't that easy...
Public / private key encryption is available that was designed to handle those situations.
At any rate, just be aware that if somehow a/many SSNs do leak for your systems into the open, can you really claim in court that you weren't "negligent" because you did do "due diligence" even though you transmitted SSNs in clear text?
That's a part of what I'm talking about... Someone needs to fix that problem. It's inexcusable, though, for it to be stored in clear text at a given site.
And how would you see that being done? There are systems running Oracle on Unix, Intersystems Cache on Linux, SQL Server on Windows - and others. Each one hosting an application or suite of applications from a different vendor.
It isn't as easy as just saying the data needs to be encrypted.
Public/Private key encryption is a possibility - but you have to build that into each application and each vendor has to support that solution. At this time - none of the vendors for any of these applications support that method of encrypting columns in their database.
I don't disagree that PHI/PII columns should be encrypted - the problem is how to get that done across disparate systems in the environment when that data must be shared across those systems. We don't own the code - the vendors own their code and they need to provide that functionality.
All I can do is work with the systems as they are - not as I wish them to be. I cannot enforce encryption on systems I don't manage or support - nor can I force a vendor to provide an encryption scheme other than what they already provide and have built into their system.
All good points, agreed.
Within SQL Server, specifically, you can use ENCRYPTBYASYMKEY, and that's relatively easy to use and will not effect other users of the data.
I admit that I can't readily tell you the best way to encrypt it in non-SQL-Server environments.
I agree. Best practices are desired but sometimes when you are dealing with vendors and disparate systems, even at times within the same ecosystem, but with different stakeholders, it is not easy to do what you know is best practice and safe
I am, by no means, saying that things like this are easy. Do NOT let that be an excuse for doing things WRONG. Would you care if the SSNs for your entire family where handled in such a fashion? If your answer is that you wouldn't mind or don't see anything wrong with it, please get out of the data business because you're a part of the problem rather than a part of the solution.