Help with some complex operations

sqlor · December 11, 2020, 11:50am

Hello

Can you provide please a high level help for the below operations (the functions needed and some logic would be enough and I will try to write the code myself).

How can I create a table from another table where the rows that have the same value in Col1, will be converted into one row and all the different values in the rest fields will be appended as Value1 | Value2 etc.

Example:
A, 13, 52
A, 25, 52
Will become:
A, 13 | 25, 52

How can I identify the columns that have different values among the rows where Col1 has the same values?

Example:
A, 13, 52
A, 25, 52
Should return:
Col2

Are there any tutorials to guide me through the process of extracting the data from a system and creating historic records along with its current records?

Thanks!

mike01 · December 11, 2020, 2:56pm

The first one you can use Concat to concatenate them together. The second, you could group by column2 having count(1) > 1. Why would you want to concatenate these columns together?

JeffModen · December 12, 2020, 5:02am

What version of SQL Server are you using?

For #1 and If it's 2017 or above, you could use the STRING_AGG function. If it's 2016 or less, you'll likely be relegated to using FOR XML PATH with a GROUP BY to do STRING CONCATENATION. The logic will hit you once you understand those things.

For #2, your going to need to know a whole bunch of things like how to use a CROSS APPLY to unpivot data which includes knowledge of TABLE VALUE CONSTRUCTORS, GROUP BY with HAVING, and COUNTing DISTINCT values in a column. If you want it to figure out what's in a table just by pointing a proc at a table, you're going to need to know about sys.columns and dynamic SQL which includes EXEC and sp_executeSQL. The logic for the non-dynamic solution is to use CROSS APPLY to unpivot the columns and column names into an NVP (Name Value Pair) structure and then count the number of distinct occurrences by Name filtering for those Names that have a count of distinct values > 1..

For #3, search for TEMPORAL TABLES, CDC, AUDIT TRIGGERS, and auditing data changes in SQL Server.

Your turn, please. Where did questions 1 and 2 come from? These are not typical questions that anyone would even think to ask unless a specific requirement was given.

yosiasz · December 12, 2020, 3:21pm

#1

declare @sqlfor table(col1 char(1), col2 int, col3 int)

insert into @sqlfor
select 'A', 13, 52 union
select 'A', 25, 52 union
select 'A', 52, 33 union
select 'B', 52, 33 

--Will fail on B because it is a solitary row with no comparison row
--what happens in that case?
--what happens if same row A 
--has 3 rows as above

select a.col1 + ',' + cast(_min as varchar(50)) 
+ '|' + cast(b.col2 as varchar(50)) + ',' + cast(b.col3 as varchar(50))
  from (
			select col1, min(col2) _min
			  From @sqlfor
			  group by col1
       ) a
join @sqlfor b on a.col1 = b.col1
where b.col2 <> a._min

JeffModen · December 12, 2020, 4:22pm

@sqlor

Although it seems implied in the description for problem #1 that the "rules" should apply to each distinct value of Col1, the sample data that @yosiasz provided shows that a bit of clarification is needed from my point of view as well as the question he asked. For example, what do you specifically want done if Col3 has more than one distinct value for any distinct value of Col1? Should there be a separate row or should there still be a single row where column 3 has a string aggregation similar to Col2?

{EDIT} Actually, never mind. Your requirement was pretty clear when you said

JeffModen · December 12, 2020, 4:26pm

@yosiasz

Both problems 1 and problem 2 can be solved without a self join even in versions less than 2017. But the OP wants to figure it out on his own (and that's actually a good thing because most others want a spoon-feeding) so I'm not going to post any code for now.

ScottPletcher · December 13, 2020, 12:59am

I believe #2 can be done without any of: CROSS APPLY, table value constructors, COUNTing DISTINCT values and HAVING. Instead you just need two or three basic GROUP BY functions like COUNT, etc..

JeffModen · December 13, 2020, 1:57am

Stop it with the code! The OP said he wanted to try it himself!

ScottPletcher · December 13, 2020, 2:13am

Edit: Oops, SORRY, I apologize. I just reread the comments and you are correct, OP did say they wanted to write the code themselves.

I should have just suggested the approach rather than completely filled it out. I will edit the code accordingly.

yosiasz · December 13, 2020, 2:20am

select 1

sqlor · December 13, 2020, 11:20am

Thanks!

So it seems that 1) is easily done with STRING_AGG and 2) is much more complex.

So my attempt for 1) would be:
STRING_AGG ([Col2], '|') WITHIN GROUP ([Col1])

Is that correct?

PS: Unfortunately I do not have write access to the SQL server to be able to create a table and test it myself.

jeffw8713 · December 13, 2020, 5:19pm

You should have the ability to create a temp table or table variable - those permissions are available to all users.

With that said...you should look at downloading and installed SQL Server Developer Edition and installing the features you want to work with and learn. This is free for development work and will install on your local workstation - and you can find several databases to add to the installation for testing.

ScottPletcher · December 13, 2020, 7:32pm

JeffModen:

For #2, your going to need to know a whole bunch of things like how to use a CROSS APPLY to unpivot data which includes knowledge of TABLE VALUE CONSTRUCTORS, GROUP BY with HAVING, and COUNTing DISTINCT values in a column. If you want it to figure out what's in a table just by pointing a proc at a table, you're going to need to know about sys.columns and dynamic SQL which includes EXEC and sp_executeSQL. The logic for the non-dynamic solution is to use CROSS APPLY to unpivot the columns and column names into an NVP (Name Value Pair) structure and then count the number of distinct occurrences by Name filtering for those Names that have a count of distinct values >

Your turn, please. Where did questions 1 and 2 come from? These are not typical questions that anyone would even think to ask unless a specific requirement was give

I've rethought posting the solution to #2. I don't think any inexperienced SQL coder will come up with a similar solution. You have massive experience and expertise in SQL and you didn't consider this approach. And the code itself has some points of interest.

This code ignores NULL values for a column. Say, for example, that for col1 = 'C', col2 have values of 25 and NULL (and possibly 1+ more NULLs). This code still lists col2 as having only a single value (i.e. a single non-NULL value). If you want to count NULL(s) as distinct values, we'd have to make other changes to the code.

--#2 (determine which other columns have more than 1 unique value for a given value of col1)
SELECT
    col1,
    STUFF(
    CASE WHEN COUNT(*) >= 2 AND MIN(col2) <> MAX(col2) THEN ', col2' ELSE '' END +
    CASE WHEN COUNT(*) >= 2 AND MIN(col3) <> MAX(col3) THEN ', col3' ELSE '' END 
    /*+ ...as many other columns as needed...*/
    , 1, 2, '') AS cols_with_more_than_1_value
FROM @sqlfor
GROUP BY col1
ORDER BY col1

JeffModen · December 28, 2020, 12:12am

Since I didn't post any code, you don't actually kn0ow what the hell I was considering, Scott.

Since you decided to post code, post test data to go with it and I'll be happy to check it for accuracy.

ScottPletcher · December 28, 2020, 6:35am

You directly stated that it would take a "CROSS APPLY, TABLE VALUE CONSTRUCTOR(S), ... HAVING and COUNTing DISTINCT value". Based on that comment, I thought you believed the query would require all that. No reason to state that if you thought it could be done without any of those.

JeffModen · December 28, 2020, 8:14am

Oh... I didn't say I wasn't going to use those. You just don't have to be such a smarty pants about what my approach will be. You're also violating the ops request by posting code instead of ideas.

ScottPletcher · December 28, 2020, 3:21pm

Actually, and again, you explicitly stated earlier what you were going to use, and it did include those, as shown above.
The counting of distinct names is typically quite some overhead, so it's definitely best to avoid that if possible.

JeffModen · December 28, 2020, 5:41pm

Ok, so post a reasonable set of test data and let's have some fun racing. In the meantime, stop your cranking.

ScottPletcher · December 28, 2020, 7:10pm

I explained that. OP can ignore the code if they want.

ScottPletcher · December 28, 2020, 7:12pm

@Jeff:

Since I've already posted code for #2, you might as well post your code too, I would think.