SQLTeam.com | Weblogs | Forums

UDF to Sort Column Values into one Concatenated Key Field


#1

Hello -

New to SQL Server. Have a lot of MS Access experience and know how to do this task in Access vba but I have not been able to figure out how to replicate this task in a SQL server UDF.

I have a qry that appends new rows to an existing table. Among the columns there are: col1, col2, col3, col4, and col5. These all have the same data type (Long) AND same source of the value (think person ID). The qry can generate many combinations of these 5 columns but in some cases, it is possible that 25 combinations are actually the same 5 values - just in different columns.

To eliminate duplicate combinations, in vba I have a function that:

  1. puts the 5 col values into an array
  2. sorts the array
  3. then concatenates the ORDERED col values into a created field that will be appended to the Primary Key field on the target table.

So these values:
col1=333, col2=555, col3=111, col4=222, and col5=444 (all in the same row)

(note: another row could have the same values in different columns: col1=222, col2=333, col3=444, col4=555, and col5=111 but would in fact be a duplicate row in the target table - as demonstrated by the resulting value below)

would be sorted and concatenated into:
PrimaryKey field = 111222333444555

Any guidance would be greatly appreciated!!!


#2

refer to

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=81254


#3

If it was me I would constrain new rows to ALWAYS have Col1..Col5 have the UserIDs in Ascending Order, thus no need to check every possible combination to see if there was a DUPe, just arrange the values in the ascending order for the new row and see if that already exists.

if that is not possible (cannot change existing system) then have a VIEW that sorts it out with a persisted index and compare against that when inserting a new row.


#4

Hi Kristen - thank you for the suggestion. As a newbie to SQL programming, I'm not sure I am able to translate what you are suggesting into code :confused: My qry is basically pulling together all potential permutations/combinations. The values on col2 and col3 (for example) could end up in two rows as:
col1 col2
row 1 345 567
row 2 567 345

But in reality, they are the same record. That's just a small example. My actual target table has 9 columns:

col1 - id value from table 1
col2 - id value from table 2
col3 - id value from table 2
col4 - id value from table 3
col5 - id value from table 3
col6 - id value from table 3
col7 - id value from table 4
col8 - id value from table 2, 3, OR 4
col9 - id value from table 5

Even though I have 5 tables, it really is one table (with about 400 unique ids - and other information) that I split into 5 thinking it may improve my qry's performance. So in the end, each row will end up with 9 ids stored in 9 target columns. Not sure how I would re-order the values if they have to land on a specific column on the target table.


#5

That's fairly straightforward, using the "FOR XML" technique. Naturally you can remove the "|" delimiter between values and/or do other formatting as you need to.

SELECT DISTINCT 
    STUFF(CAST((SELECT '|' + CAST(col_value AS varchar(10)) FROM (
        SELECT col1 AS col_value WHERE col1 IS NOT NULL UNION 
        SELECT col2 WHERE col2 IS NOT NULL UNION 
        SELECT col3 WHERE col3 IS NOT NULL UNION 
        SELECT col4 WHERE col4 IS NOT NULL UNION 
        SELECT col5 WHERE col5 IS NOT NULL) 
        AS col_values ORDER BY col_value FOR XML PATH('')) AS varchar(8000)), 1, 1, '') AS concat_values
FROM (
    SELECT 1 AS row_id, 333 AS col1, 555 AS col2, 111 AS col3, 222 AS col4, 444 AS col5 UNION ALL
    SELECT 2, 222, 333, 444, 555, 111 UNION ALL
    SELECT 3, 888, NULL, 888, 15, NULL UNION ALL
    SELECT 4, NULL, NULL, NULL, NULL, NULL UNION ALL
    SELECT 5, 888, 888, 888, 888, 888
) AS test_data

#6

Hi Scott - generally, I understand what you've put together. Where you are losing me is in the Select 2-5 statements. Looks to me like you are hard coding the values? Shouldn't they be parameters passed to the function?

Maybe if I share the qry I am using in MS Access, that may help clarify. Below is the SQL I use in MS Access that would give me the result I'm looking for if it could handle the task. Notice the function "fxMakeKey" is where I create the key - that function is what I'm trying to replicate in SQL.

Here's a high level of the data for one table:

Table: pos1
Column: pos1 - long type, ID
Column: pos1_cost - an integer value representing cost
Column: pos1_rating - an integer value representing ability (high is good)

Think of pos1 table as holding Red items. Pos2a-b hold Blue items. Pos3a-c hold Yellow items. Pos4 holds Green items. Pos5 can hold either blue, yellow, or green items. and finally pos6 holds Black items. Each of the colors has between 40 and 125 choices (depending on _cost and _rating).

Each 9 column combination (row) has 2 calculated values: TotCost (all of the individual col##_cost added together), and TotRating (all of the col##_ratings added together).

Also as noted before, pos2a and pos2b cannot be the same (hence WHERE pos2b.pos2b <>[pos2a] ). Same applies to pos3a-c and pos5

INSERT INTO target_table ( pos1, pos2a, pos2b, pos3a, pos3b, pos3c, pos4, pos5, pos6, TotCost, TotRating, RowKey )
SELECT pos1.pos1, pos2a.pos2a, pos2b.pos2b, pos3a.pos3a, pos3b.pos3b, pos3c.pos3c, pos4.pos4, pos5.pos5, pos6.pos6, [pos1_cost]+[pos2a_cost]+[pos2b_cost]+[pos3a_cost]+[pos3b_cost]+[pos3c_cost]+[pos4_cost]+[pos5_cost]+[pos6_cost] AS TotCost, [pos1_rating]+[pos2a_rating]+[pos2b_rating]+[pos3a_rating]+[pos3b_rating]+[pos3c_rating]+[pos4_rating]+[pos5_rating]+[pos6_rating] AS TotRating, fxMakeKey([pos1],[pos2a],[pos2b],[pos3a],[pos3b],[pos3c],[pos4],[pos5],[pos6]) AS Expr1
FROM pos1, pos2a, pos2b, pos4, pos5, pos3a, pos3b, pos3c, pos6
WHERE (((pos2b.pos2b)<>[pos2a]) AND ((pos3b.pos3b)<>[pos3a] AND (pos3b.pos3b)<>[pos3c]) AND((pos3c.pos3c)<>[pos3a] AND (pos3c.pos3c)<>[pos3b]) AND ((pos5.pos5)<>[pos2a] AND (pos5.pos5)<>[pos3b] AND (pos5.pos5)<>[pos3c] AND (pos5.pos5)<>[pos4]) AND(([pos1_cost]+[pos2a_cost]+[pos2b_cost]+[pos3a_cost]+[pos3b_cost]+[pos3c_cost]+[pos4_cost]+[pos5_cost]+[pos6_cost])<500) AND(([pos1_rating]+[pos2a_rating]+[pos2b_rating]+[pos3a_rating]+[pos3b_rating]+[pos3c_rating]+[pos4_rating]+[pos5_rating]+[pos6_rating])>25) AND((([pos5].[pos5])<>[pos2b])<>[pos3a]));


#7

It's not a function. If you want it separately as a function we can do that; an in-line table-valued function would be best for performance.

I hard-coded the values simply to be able to do testing. You did not provide any test data in a usable format! Replace the "FROM ( ... )" with "FROM your_table_name".

Remove the '|' + from the inner SELECT.


#8

OK, I think I see now. I will give it a shot and report back. Thank you!