SQLTeam.com | Weblogs | Forums

Get Email Address from String

sql2012

#1

I'm trying to get just the email address from a varchar field in our ERP system, but having problems as there are several scenarios:

  1. Field could be blank
  2. Field could have email in correct format - 'john.doe@abcd.com'
  3. Field could have email as 'John Doe john.doe@abcd.com' (enclosed by < and >)
  4. Field could have other characters only '.' or ',' etc.

Is there a function that I could use to get just the email address from the string if present, and return nothing if the field is blank or has other characters / text in it.

I've tried using SUBSTRING but can't get it to return what I want.

Thanks
Martyn


#2

Pity the ERP systems allows "any old junk" to be stored in EMail field.

You'll need to write a filter that disallows all the junk - and probably to "improve" that filter over time as you find more edge cases.

Whilst you could allow "Field could have email as 'John Doe john.doe@abcd.com' (enclosed by < and >)" and parse it as appropriate, another approach would be to treat that as an error (i.e. only allow "completely clean email addresses" though) and send back all the failures to ERP data department and get them to fix them. They are going to have to fix the "properly broken ones" anyway, so might be better to make everything properly consistent. Then they next poor blighter that comes along is not going to have to, also, write a complex filter just to be able to use the data.


#3

Kristen

I agree with you totally. Fix the data at source, rather than using convoluted code to try and catch/clean every possible issue. In our ERP system the fields are not mandatory nor do they have any validation/masks in place. I am raising it again with the software provider (sigh!) but ultimately I think we are going to have to go through a data cleansing exercise....

Martyn


#4

that is one thing that irks me on this site. with the desire to help others we fail to ask "Why is it the way it is in the first place?" We get question of people trying to manipulate all kinds of weird data. The pertinent question is what Kristen is saying, and that is what we need to always ask "Why is the data in that condition?" instead of offering some fancy sql statement that does some fancy string manipulation that tickle the brain. yes that is cool, and I wont lie it is fun to do but its also best to elevate people's fundamental design skills? Maybe I have not had my coffee yet...


#5

But to clean up the existing data, won't you still need the requested code? Nothing wrong with "fancy SQL" when it's required for proper data clean up :-).


#6

indeed. but with the "fancy SQL" I thought we should also provide some edumacation :slight_smile:


#7

I'll give you a "maybe" on that :slight_smile:

My best-guess is that some direct-SQL-action on the ERP database is not going to be popular!


#8

But putting in place very strict restrictions on what goes into what is currently a free-form column will be popular?? I think that will be a very hard sell. I would create a separate column and put the email address in it at time of INSERT/UPDATE so that the column doesn't have to be parsed again later.

But the extract the email address code could also be run just as a SELECT, without modifying the original table column,


#9

It Depends :slight_smile:

If the purpose of an EMail column is to be able to supply it to a Bulk Mail process, then "yes" I think it reasonable that the address should satisfy that requirement (AND be validated, within reason - i.e. "Looks OK" rather than "Is deliverable").

We built a data import from a very well known 3rd party Accounts / ERP system that a client is using. Nothing of any value seems to be validated:

Country Code - any old rubbish, typically the last line of the address winds up in this. We reject the whole record on import on the grounds that WE do actually REQUIRE a country code (for geographic analysis)

Email - client has multiple email addresses in the column (";" separated). I suppose that's OK, and maybe a direct feed to a Bulk EMail program would treat those as a TO: or CC: block ...

... but when we question the Client they say that it would be a disaster to email all those addresses because they are "anyone we have ever known", so could be sending Trade Price (or somesuch) details to someone who would be horrified to know how cheaply products were being bought ... so in our APP we split the Email addresses, put then in a "Contacts" sub table, and the Client has to then mark them up as to what-type-of-contact-that-actually-is. Would be much better if they did that in the ERP APP ... by the time it gets to us there is not much appetite to clean them up, so result is that Sales say that lots of Customers complain that they don't get emails about the special offers, and consequently the Company is losing business

VAT (our Sales Tax) numbers. We have an arrangement for reciprocal charging, and claiming-back, across the EU countries. All EU countries have VAT numbers that conform to a checksum validation. Yup, you guessed it, the ERP doesn't bother to perform any checks at all.

Max address line length? Anything you like ... doesn't matter that it won't even begin to fit on an envelop or an Address Label. So we have to reject those too, and Accounts have to tidy up the address each time they just blindly Cut & Paste them into the ERP.

Soap Box? Mine definitely isn't tall enough, or big enough for my requirements :yak:


#10

If you were to provide a CREATE TABLE statement and an INSERT/SELECT to correctly populate the table with at least one of all the different examples that may be contained in the "field" (column) so that I have a readily consumable sample of code to work against, I believe I can show you a relatively simple and performant method to do this.


#11

+1. Create Table + Sample Data Inserts will no doubt bring out the competitive nature of folk here :slight_smile: to provide code that filters the anomalies.


#12

Normally, I'd take the time to build some myself but, especially lately, I just don't have the time especially if I get the test data wrong.