SQLTeam.com | Weblogs | Forums

Email address standardization


#1

Hi,
input column

tiggergrl1279@aol.col
kite20002@yahoo.comn
irueeuwruiwoe.@ieiweur.com
dtraatz@iastate.eidu
bwxdme@gmail.coim
cygreg1@iastate.edy
ttntngo@iastate.du

output :
tiggergrl1279@aol.com
kite20002@yahoo.com
irueeuwruiwoe.@ieiweur.com
dtraatz@iastate.eid
bwxdme@gmail.com
cygreg1@iastate.edu
ttntngo@iastate.edu

Replace domain extension in sql server

help me solve the task


#2

You could have a list of known typing mistakes - e.g. ".col" -> ".com", ".@" -> "@" etc. and just make those REPLACE statements.

We have Email validation at the point of data entry, so stuff like this would be prevented at that point and at THAT point the human operator can review what the data should have been - or ask for the "owner" to clarify what it should be. That doesn't help when you buy-in / acquire a "dirty list" of data, unless you can report on "bad data" and get the supplier to fix the raw data - that would be my first approach.

I certainly wouldn't guarantee that ".col" was supposed to be ".com", it might have been ".co.el", although I agree that "aol.col" is highly likely to be "aol.com" - so REPLACE could be restricted to that, more certain, combination, and xxx.@aol.com might indicate NOT that the point is not needed but that some additional part is missing, such as xxx.yyy@aol.com ...


#3

For me requirement is to change .col to com
I want change csherman@us.ibm.con to csherman@us.ibm.com but it is changing to csherman@us.com

I have used below query

with mail as(
select email, SUBSTRING(SUBSTRING(email,CHARINDEX('@',email)+1,LEN(email)),CHARINDEX('.',SUBSTRING(email,CHARINDEX('@',email)+1,LEN(email)))+1,LEN(SUBSTRING(email,CHARINDEX('@',email)+1,LEN(email)))) as Extent from vaild

)

select Extent=(CASE
WHEN Extent in ('coim','comn','.com','col', 'cpm','ocm','oom') THEN replace(email,extent, 'com')
WHEN Extent like '%.cpm' THEN replace(email,extent, 'com')
WHEN Extent like '%com' THEN replace(email,extent, 'com')
WHEN Extent like '%com%' THEN replace(email,extent, 'com')
WHEN Extent like 'com%' THEN replace(email,extent, 'com')
WHEN Extent like 'c_om%' THEN replace(email,extent, 'com')
WHEN Extent like '%c_om%' THEN replace(email,extent, 'com')
WHEN Extent like 'c_m%' THEN replace(email,extent, 'com')
WHEN Extent like 'c_m%' THEN replace(email,extent, 'com')
WHEN Extent like 'cm%' THEN replace(email,extent, 'com')
WHEN Extent like 'co_%' THEN replace(email,extent, 'com')
WHEN Extent like '%co_%' THEN replace(email,extent, 'com')
WHEN Extent like '_om%' THEN replace(email,extent, 'com')
WHEN Extent like '%om%' THEN replace(email,extent, 'com')
else
Extent
end) from mail


#4

That will replace BOTH the "col" with "com" in

foobar@acmeCOLony.COL

you need more targeted replace than that.

REPLACE would be all right to change ".@" to "@"

Also to change "@gmSil" to "@gmail"

For multipart replace you have two choices:

If you only want to fix the TLD then you could find the offset of the TLD (e.g. the start of ".col") and then STUFF that with the replacement ".COM"), don't use REPLACE()

Or you could split the Email address into:

Before @
subdomain
domain
TLD - e.g. ".com" or ".co.uk"

then you can have an Old/New substitution table of all the changes you want to make - e.g. [Old]='col', [New]='com' and just do an OUTER JOIN to those substitution tables and change anything where a match is found

Then reassembled the email address from its parts

WITH mail AS
(
    SELECT [Before] = ... everything before "@" ...
           , [SubDomain] = ... everything up to Domain ...
           , [Domain] = ... everything up to TLD ...
           , [TLD] = ... the TLD ...
)
SELECT [Before] + '@' 
       + COALESCE(SD.NewSubDomain, SubDomain) + '.'
       + COALESCE(D.NewDomain, Domain) + '.'
       + COALESCE(TLD.NewTLD, TLD
FROM mail AS M
     LEFT OUTER JOIN dbo.ReplaceSubDomain AS SD
          ON SD.OldSubDomain = M.SubDomain
     LEFT OUTER JOIN dbo.ReplaceDomain AS D
          ON D.OldDomain = M.Domain
     LEFT OUTER JOIN dbo.ReplaceTLD AS TLD
          ON D.OldTLD = M.TLD