Removing duplicate values in a single column

qman · August 17, 2015, 6:01pm

How can I removing duplicate values in a single column? For instance, let's say I have a table called Fruit that consists of a single row containing two columns, shown below.

ColA..........ColB
Apple.........Granny Smith, Empire, Delicious, Fiji, Empire, Delicious

I want the results of my query to eliminate the duplicate column values. Desired results shown below.

ColA..........ColB
Apple.........Granny Smith, Empire, Delicious, Fiji

djj55 · August 17, 2015, 6:44pm

In this case I would use a splitter on ColB.

Kristen · August 18, 2015, 11:28am

Problem with this approach is that it is not "normalised" .. so you get duplicates.

If you put [ColB] in a child table, instead, then you can easily enforce (at the database level) that duplicates are not allowed.

SO I would use a splitter on ColB and insert the distinct values into a child table and then get rid of ColB in the main table.

If the APP needs ColB as a comma-delimited list I would rename the main table (e.g. to MainTable_V2) and create a VIEW with the original "MainTable" name which combined the data from MainTable_V2 and ChildTable. I would then create an INSTEAD OF TRIGGER on MainTable_V2 so that anything that tried to insert/update ColB was automatically converted into data manipulation actions on ChildTable instead.

hhrr · August 18, 2015, 12:41pm

Hi

Please find ur solution

CREATE FUNCTION dbo.BreakStringIntoRows (@CommadelimitedString VARCHAR(1000))
RETURNS @Result TABLE (Column1 VARCHAR(100))
AS
BEGIN
	DECLARE @IntLocation INT

	WHILE (CHARINDEX(',', @CommadelimitedString, 0) > 0)
	BEGIN
		SET @IntLocation = CHARINDEX(',', @CommadelimitedString, 0)

		INSERT INTO @Result (Column1)
		--LTRIM and RTRIM to ensure blank spaces are   removed
		SELECT RTRIM(LTRIM(SUBSTRING(@CommadelimitedString, 0, @IntLocation)))

		SET @CommadelimitedString = STUFF(@CommadelimitedString, 1, @IntLocation, '')
	END

	INSERT INTO @Result (Column1)
	SELECT RTRIM(LTRIM(@CommadelimitedString)) --LTRIM and RTRIM to ensure blank spaces are removed

	RETURN
END
GO


SELECT DISTINCT A.*
	,b.*
INTO #SampleDataFinal
FROM #SampleData A
CROSS APPLY dbo.BreakStringIntoRows(A.colB) b

DECLARE @ConcatString VARCHAR(4000)

SELECT @ConcatString = COALESCE(@ConcatString + ', ', '') + Column1
FROM #SampleDataFinal

SELECT colA
	,@ConcatString
FROM #SampleData

drop table #SampleData
create table #SampleData
(
colA varchar(100) null,
colB varchar(500) null
)
insert into #SampleData select 'Apple','Granny Smith, Empire, Delicious, Fiji, Empire, Delicious'
select * from #SampleData

Kristen · August 18, 2015, 2:04pm

I very strongly recommend using a Splitter Function that is SET based rather than LOOP based. The difference in performance is massive - set-based could easily be 100x faster