How to make User Defined Function split column to two columns text and numbers?

ahmedbarbary · June 17, 2020, 11:16pm

problem

How to make UserDefinedFunction on sql server 2012 split column text to two columns .

first column include text characters as unit measure only and

second column include numbers only in case of integer or decimal etc..?

create table #temp
(
columnTextNumbers  nvarchar(50)
)
insert into #temp (columnTextNumbers) 
values
('3.5A'),
('5.50kg'),
('35.70kg'),
('9m')

I need to make function split column name columnTextNumbers exist on temp table #temp to
two columns
first column is will be columnTextValues include only units measure .

second column will be columnNumberValues include only numbers if integer or decimal etc ..

so Input will be one value have text and numbers

and output result will be two columns as below :

columnNumberValues     columnTextValues
3.5                                        A
5.50                                      kg
35.70                                    kg
9                                          m

harishgg1 · June 18, 2020, 5:10am

hi

you can do it like this with SQL statement itself ...

if you want function .. you can put SQL statement in function !!!

    SELECT 
        columnTextNumbers
         ,left(columnTextNumbers,PATINDEX('%[A-Z]%',columnTextNumbers)-1 )
         ,right(columnTextNumbers,len(columnTextNumbers)-PATINDEX('%[A-Z]%',columnTextNumbers)+1)
    FROM #temp

JeffModen · June 18, 2020, 7:15pm

Like I posted for this very same question over on SQL Server central, you can DRY the code out a bit. It also has kind of a built in check that you have a place where numeric characters do actually turn into non-numeric characters.

 SELECT  Value = SUBSTRING(columnTextNumbers,1,ca.Posit)
        ,UoM   = SUBSTRING(columnTextNumbers,Posit+1,50)
   FROM #temp
  CROSS APPLY (SELECT PATINDEX('%[0-9.][^.0-9]%',columnTextNumbers)) ca (Posit)
;

yosiasz · June 18, 2020, 7:53pm

PATINDEX got issues with unicode strings out of the box if you do not specify collation.
The pitfalls of parsing! Moving forward not a good solution to parse but rather if you have the power to do so, change the way you populate your data in the first place.

create table #temp
(
columnTextNumbers  nvarchar(50)
)
insert into #temp (columnTextNumbers) 
values
('3.5A'),
('5.50kg'),
('35.70kg'),
('9m'),
(N'5.7курица'),
(N'9.9собака'),
(N'8.22ኪግ')

DECLARE @Pattern nvarchar(50) = N'%[0-9.][^.0-9]%'   

SELECT  Value = SUBSTRING(columnTextNumbers,1,ca.Posit)
        ,UoM   = SUBSTRING(columnTextNumbers,Posit+1,50)
   FROM #temp
  CROSS APPLY (SELECT PATINDEX(@Pattern,
columnTextNumbers COLLATE Latin1_General_BIN2)) ca (Posit);

  drop table #temp

JeffModen · June 19, 2020, 1:50am

Having such problems with unicode aren't likely to happen here because the pattern I used is based entirely on numeric digits. The only time you might see a problem is if you're using a language that uses something like roman numerals to represent digits and that would be a problem even if you didn't have to deal with unicode. I will admit that the binary collation can make things faster with LIKE and PATINDEX but, again, not likely here because the pattern I used consists entirely of numeric digits.

yosiasz · June 19, 2020, 2:08am

Yes the patter you use is entirely on the numeric digits. But PATINDEX can't differentitate alpha from numeric when it is dealing with unicode strings .

JeffModen · June 23, 2020, 12:21am

What language did you use for that example?

yosiasz · June 23, 2020, 12:37am

Amharic https://en.wikipedia.org/wiki/Ethiopic_(Unicode_block)

JeffModen · June 23, 2020, 5:32pm

Is that what it's called in SQL Server?

yosiasz · June 23, 2020, 5:42pm

not sure what you mean @JeffModen

JeffModen · June 23, 2020, 9:57pm

What was the language or collation set to in the example you gave when you showed that the numbers went to the right hand column because PATINDEX failed to distinguish between number digits and non-numeric characters?