SQLTeam.com | Weblogs | Forums

How to make User Defined Function split column to two columns text and numbers?

problem

How to make UserDefinedFunction on sql server 2012 split column text to two columns .

first column include text characters as unit measure only and

second column include numbers only in case of integer or decimal etc..?

create table #temp
(
columnTextNumbers  nvarchar(50)
)
insert into #temp (columnTextNumbers) 
values
('3.5A'),
('5.50kg'),
('35.70kg'),
('9m')

I need to make function split column name columnTextNumbers exist on temp table #temp to
two columns
first column is will be columnTextValues include only units measure .

second column will be columnNumberValues include only numbers if integer or decimal etc ..

so Input will be one value have text and numbers

and output result will be two columns as below :

columnNumberValues     columnTextValues
3.5                                        A
5.50                                      kg
35.70                                    kg
9                                          m
1 Like

hi

you can do it like this with SQL statement itself ...

if you want function .. you can put SQL statement in function !!!

    SELECT 
        columnTextNumbers
         ,left(columnTextNumbers,PATINDEX('%[A-Z]%',columnTextNumbers)-1 )
         ,right(columnTextNumbers,len(columnTextNumbers)-PATINDEX('%[A-Z]%',columnTextNumbers)+1)
    FROM #temp  

image

Like I posted for this very same question over on SQL Server central, you can DRY the code out a bit. It also has kind of a built in check that you have a place where numeric characters do actually turn into non-numeric characters.

 SELECT  Value = SUBSTRING(columnTextNumbers,1,ca.Posit)
        ,UoM   = SUBSTRING(columnTextNumbers,Posit+1,50)
   FROM #temp
  CROSS APPLY (SELECT PATINDEX('%[0-9.][^.0-9]%',columnTextNumbers)) ca (Posit)
;

PATINDEX got issues with unicode strings out of the box if you do not specify collation.
The pitfalls of parsing! Moving forward not a good solution to parse but rather if you have the power to do so, change the way you populate your data in the first place.

create table #temp
(
columnTextNumbers  nvarchar(50)
)
insert into #temp (columnTextNumbers) 
values
('3.5A'),
('5.50kg'),
('35.70kg'),
('9m'),
(N'5.7курица'),
(N'9.9собака'),
(N'8.22ኪግ')

DECLARE @Pattern nvarchar(50) = N'%[0-9.][^.0-9]%'   

SELECT  Value = SUBSTRING(columnTextNumbers,1,ca.Posit)
        ,UoM   = SUBSTRING(columnTextNumbers,Posit+1,50)
   FROM #temp
  CROSS APPLY (SELECT PATINDEX(@Pattern,
columnTextNumbers COLLATE Latin1_General_BIN2)) ca (Posit);

  drop table #temp

Having such problems with unicode aren't likely to happen here because the pattern I used is based entirely on numeric digits. The only time you might see a problem is if you're using a language that uses something like roman numerals to represent digits and that would be a problem even if you didn't have to deal with unicode. I will admit that the binary collation can make things faster with LIKE and PATINDEX but, again, not likely here because the pattern I used consists entirely of numeric digits.

Yes the patter you use is entirely on the numeric digits. But PATINDEX can't differentitate alpha from numeric when it is dealing with unicode strings .

image

What language did you use for that example?

Amharic https://en.wikipedia.org/wiki/Ethiopic_(Unicode_block)

Is that what it's called in SQL Server?

not sure what you mean @JeffModen

What was the language or collation set to in the example you gave when you showed that the numbers went to the right hand column because PATINDEX failed to distinguish between number digits and non-numeric characters?