Replace String before a constant

I have a varchar(4000) field of free text. I have to remove identifiable information. Example: Patient arrived from xxxxx correctional facility. I need to remove whatever value xxxxx represents. xxxxx can be between 4 and 20 characters and I do not have a definitive list. Is this even possible? If I can't replace a variable amount of characters, could I at least replace the first 20 characters before correctional?

Thanks in advance for any help you may provide.

Definitely possible. The big q is:
Can there be multiple instances that need screened out in the same column? That is:
'Patient arrived from abc facility. Patient moved to home. Patient readmitted. Patient arrived from def correctional facility.'
where both "abc" and "def" need masked out?

Something like this might get you started:

update a
   set yourfield=substring(yourfield
                          ,1
                          ,charindex(@firstpart,yourfield)-1
                          )
                +substring(yourfield
                          ,charindex(@lastpart
                                    ,yourfield
                                    ,charindex(@firstpart,yourfield)
                                    )
                          +len(@lastpart)
                          ,len(yourfield)
                          -charindex(@lastpart
                                    ,yourfield
                                    ,charindex(@firstpart,yourfield)
                                    )
                          +len(@lastpart)
                          )
  from yourtable as a
 where yourfield like '%'+@firstpart+'%'+@lastpart+'%'
;

As always when updating, do it in test environment and test thoroughly, before even considering production environment.

yes, there can be multiple instances but once I figure out how to do it for one I will modify the sql to accommodate for those.

The main logic is in the CROSS APPLYs, which, in my view, makes it easier to change and keeps it from "cluttering up" the main SELECT.

SELECT free_text,
    CASE WHEN chars_need_removed = 0 THEN free_text ELSE STUFF(free_text, from_start + 5, 
        chars_len_to_remove, REPLICATE('x', chars_len_to_remove)) END AS new_text
FROM ( VALUES('Patient arrived from abcde correctional facility.'),
             ('Patient arrived from abc facility. Patient moved to home. Patient readmitted. Patient arrived from def correctional facility.')
     ) test_data(free_text)
CROSS APPLY (
    SELECT CHARINDEX('from ', free_text) AS from_start
) AS ca1
CROSS APPLY (
    SELECT CASE WHEN from_start = 0 THEN 0
        ELSE CHARINDEX(' ', free_text, from_start + 5) - (from_start + 5) END AS chars_len_to_remove
) AS ca2
CROSS APPLY (
    SELECT CASE WHEN chars_len_to_remove = 0 THEN 0 ELSE CASE 
        WHEN LTRIM(SUBSTRING(free_text, from_start + 5 + chars_len_to_remove,  9)) = 'facility'
          OR LTRIM(SUBSTRING(free_text, from_start + 5 + chars_len_to_remove, 13)) = 'correctional'
            THEN 1 ELSE 0 END END AS chars_need_removed
) AS ca3
1 Like

Thank you. This is helpful but it seems I still have to know the name of every facility and add them to the VALUES statement.

No, not at all. The values just provide sample row data. The code dynamically determines the bytes that represent the facility (the code just determines how long the name is, it doesn't check the name at all).

For example:

CREATE TABLE #your_table_data (
    free_text varchar(8000) NULL
    )
INSERT INTO #your_table_data
--*** CHANGE THIS DATA HOWEVER YOU WANT TO ***--
VALUES('Patient arrived from abcde correctional facility.'),
             ('Patient arrived from abc facility. Patient moved to home. Patient readmitted. Patient arrived from def correctional facility.')


SELECT free_text,
    CASE WHEN chars_need_removed = 0 THEN free_text ELSE STUFF(free_text, from_start + 5, 
        chars_len_to_remove, REPLICATE('x', chars_len_to_remove)) END AS new_text
FROM #your_table_data
CROSS APPLY (
    SELECT CHARINDEX('from ', free_text) AS from_start
) AS ca1
CROSS APPLY (
    SELECT CASE WHEN from_start = 0 THEN 0
        ELSE CHARINDEX(' ', free_text, from_start + 5) - (from_start + 5) END AS chars_len_to_remove
) AS ca2
CROSS APPLY (
    SELECT CASE WHEN chars_len_to_remove = 0 THEN 0 ELSE CASE 
        WHEN LTRIM(SUBSTRING(free_text, from_start + 5 + chars_len_to_remove,  9)) = 'facility'
          OR LTRIM(SUBSTRING(free_text, from_start + 5 + chars_len_to_remove, 13)) = 'correctional'
            THEN 1 ELSE 0 END END AS chars_need_removed
) AS ca3
1 Like

Ah :bulb: Brilliant. Thank you.