How can I removing duplicate values in a single column? For instance, let's say I have a table called Fruit that consists of a single row containing two columns, shown below.
Problem with this approach is that it is not "normalised" .. so you get duplicates.
If you put [ColB] in a child table, instead, then you can easily enforce (at the database level) that duplicates are not allowed.
SO I would use a splitter on ColB and insert the distinct values into a child table and then get rid of ColB in the main table.
If the APP needs ColB as a comma-delimited list I would rename the main table (e.g. to MainTable_V2) and create a VIEW with the original "MainTable" name which combined the data from MainTable_V2 and ChildTable. I would then create an INSTEAD OF TRIGGER on MainTable_V2 so that anything that tried to insert/update ColB was automatically converted into data manipulation actions on ChildTable instead.
CREATE FUNCTION dbo.BreakStringIntoRows (@CommadelimitedString VARCHAR(1000))
RETURNS @Result TABLE (Column1 VARCHAR(100))
AS
BEGIN
DECLARE @IntLocation INT
WHILE (CHARINDEX(',', @CommadelimitedString, 0) > 0)
BEGIN
SET @IntLocation = CHARINDEX(',', @CommadelimitedString, 0)
INSERT INTO @Result (Column1)
--LTRIM and RTRIM to ensure blank spaces are removed
SELECT RTRIM(LTRIM(SUBSTRING(@CommadelimitedString, 0, @IntLocation)))
SET @CommadelimitedString = STUFF(@CommadelimitedString, 1, @IntLocation, '')
END
INSERT INTO @Result (Column1)
SELECT RTRIM(LTRIM(@CommadelimitedString)) --LTRIM and RTRIM to ensure blank spaces are removed
RETURN
END
GO
SELECT DISTINCT A.*
,b.*
INTO #SampleDataFinal
FROM #SampleData A
CROSS APPLY dbo.BreakStringIntoRows(A.colB) b
DECLARE @ConcatString VARCHAR(4000)
SELECT @ConcatString = COALESCE(@ConcatString + ', ', '') + Column1
FROM #SampleDataFinal
SELECT colA
,@ConcatString
FROM #SampleData
drop table #SampleData
create table #SampleData
(
colA varchar(100) null,
colB varchar(500) null
)
insert into #SampleData select 'Apple','Granny Smith, Empire, Delicious, Fiji, Empire, Delicious'
select * from #SampleData
I very strongly recommend using a Splitter Function that is SET based rather than LOOP based. The difference in performance is massive - set-based could easily be 100x faster