SQLTeam.com | Weblogs | Forums

Help with parsing free-form field


#1

Hi,

I really your need with something. Let's say I have a db field called 'plaintiff' with data like this:

'ST OF FL OBO' JOHN D SMITH.

I need to parse out the 'ST OF FL OBO' from the name 'John D Smith'. So far this seems to work:

CASE
WHEN (charindex('OBO',plaintiff,1)) > 0 then replace(cast(plaintiff as varchar(100)), 'ST OF FL OBO', '')

Leaving me with the full name. Of that result I need to separate the name into lastname, firstname, and middlename/initial.

I need this structure but not sure how to do it. Your help is greatly appreciated.

CASE
--get lastname
WHEN (charindex('OBO',plaintiff,1)) > 0 then replace(cast(plaintiff as varchar(100)), 'ST OF FL OBO', '')

CASE
--get firstname
WHEN (charindex('OBO',plaintiff,1)) > 0 then replace(cast(plaintiff as varchar(100)), 'ST OF FL OBO', '')

CASE
--get middlename
WHEN (charindex('OBO',plaintiff,1)) > 0 then replace(cast(plaintiff as varchar(100)), 'ST OF FL OBO', '')


#2

Let me make it easier. How can I extract the firstname, middlename, and lastname out of the expression...

'ST OF FL OBO JOHN D SMITH'

Thank you.


#3

With difficulty. The formatting and styles of names are many and varied, more so if you include international names.

John Smith
John D Smith
John de Smith
Mary Ann Smith
John Smith-Jones
John Smith Jones (both double barrelled forms exist in the UK, at least)

Parsing "John Smith Jones" or even "John de Smith" and "Mary Ann Smith", all three-part names, is difficult.

The best that I reckon you could "easily" do is to tackle the easy ones first.

Chop of any recognsiable Prefix and Suffixes that you can - "Mr", "Mrs" etc. off the front and "Junior", "Jnr", "III" etc. off the end. Have a list of Prefixes and Suffixes so that, as you refine the list of "can't process these" you can add more to that list, and re-run the job - each time getting fewer exceptions left at the end.

All 2-part names, for example, are assumed to be First / Last.

Then deal with 3-part names looking at what the middle name is. If it is a single letter then assume that is a middle initial. If it is "de" "van" etc. then its part of the surname.

Perhaps use a table of recognised First Names to try to separate them off the front. But "George" is both and first and a last name here, so that may not be much help.


#4

Alexander Boris de Pfeffel Johnson!


#5

... or the First Name ... it depends!!