SQLTeam.com | Weblogs | Forums

Create View that Collapses Near Duplicate Rows into One Row from One Table


#1

If you will forgive me a for a second question this week, I hadn't expected the need to do this but it turns out the MSFT Visio is very limited on linking external data so I need to compress some data that is in a table.

My singular table is created as such:

USE [VISIO]
GO

/****** Object:  Table [dbo].[tbl_Raw_NIC_Imported]    Script Date: 6/30/2015 2:47:18 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[tbl_Raw_NIC_Imported](
    [MACAddress] [varchar](50) NOT NULL,
    [Host] [varchar](255) NOT NULL,
    [IPAddress] [varchar](50) NOT NULL,
    [IPSubnet] [varchar](50) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_IPSubnet]  DEFAULT (''),
    [DefaultIPGateway] [varchar](50) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_DefaultIPGateway]  DEFAULT (''),
    [DNSServer1] [varchar](50) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_DNSServer1]  DEFAULT (''),
    [DNSServer2] [varchar](50) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_DNSServer2]  DEFAULT (''),
    [WINSPrimary] [varchar](50) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_WINSPrimary]  DEFAULT (''),
    [WINSSecondary] [varchar](50) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_WINSSecondary]  DEFAULT (''),
    [InterfaceName] [varchar](255) NULL CONSTRAINT [DF_tbl_Raw_NIC_Imported_InterfaceName]  DEFAULT (''),
 CONSTRAINT [PK_IPAddress1] PRIMARY KEY CLUSTERED 
(
    [IPAddress] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

The table is populated with data similar to this sample data which is useful to highlight my question:

MACAddress,Host,IPAddress,IPSubnet,DefaultIPGateway,DNSServer1, DNSServer2,WINSPrimary,WINSSecondary,InterfaceName 
00:50:56:B6:00:92,svr1.my.local,10.191.240.112,255.255.255.0,10.194.240.2,10.10.100.202,10.10.100.200,,, Intel(R) PRO/1000 MT Network Connection 
00:50:56:B6:00:92,svr1.my.local,10.191.240.114,255.255.255.0,10.194.240.2,10.10.100.202,10.10.100.200,,,Intel(R) PRO/1000 MT Network Connection 
00:50:56:B6:00:92,svr1.my.local,10.191.240.115,255.255.255.0,10.194.240.2,10.10.100.202,10.10.100.200,,,Intel(R) PRO/1000 MT Network Connection 
00:50:56:B6:00:92,svr1.my.local,fe80::ed77:187d:80af:5e4d,64,10.194.240.2,10.10.100.202,10.10.100.200,,,Intel(R) PRO/1000 MT Network Connection 
00:50:56:B6:00:95,svr1.my.local,192.168.0.3,255.255.255.0,192.168.0.1,,,,,Intel(R) PRO/1000 MT Network Connection #2
EE:50:56:B6:00:95,svr2.my.local,192.168.0.4,255.255.255.0,192.168.0.1,,,,,Intel(R) PRO/1000 MT Network Connection

What I would like to accomplish is to have a view that compresses rows that have multiple IP addresses assigned to a given adapter. I would also like to exclude the IPv6 addresses. Based off of the sample data, the resulting view would have rows that look like this:

MACAddress,Host,IPAddress,IPSubnet,DefaultIPGateway,DNSServer1, DNSServer2,WINSPrimary,WINSSecondary,InterfaceName 
00:50:56:B6:00:92,svr1.my.local,10.191.240.112 | 114 | 115,255.255.255.0,10.194.240.2,10.10.100.202,10.10.100.200,,, Intel(R) PRO/1000 MT Network Connection 
00:50:56:B6:00:95,svr1.my.local,192.168.0.3,255.255.255.0,192.168.0.1,,,,,Intel(R) PRO/1000 MT Network Connection #2
EE:50:56:B6:00:95,svr2.my.local,192.168.0.4,255.255.255.0,192.168.0.1,,,,,Intel(R) PRO/1000 MT Network Connection

Where the first 3 records are collapsed into one and the IPAddress shows the 3 IPs. The fourth record isn't present and the fifth is. I have see lots of examples of concat and XML usage but they don't seem to fit the need or I haven't been able to adapt them.

This came the closest to anything and, oh, ARG, this won't work in a view anyway it seems even though it produces the output that I need.

SELECT 
   Source.[MACAddress],
   UPPER(Source.[Host]),

   STUFF((SELECT ' | ' + Changed.[IPAddress] 
    FROM [VISIO].[dbo].[tbl_Raw_NIC_Imported] Changed
    WHERE (((Changed.[MACAddress] = Source.[MACAddress]) AND (Changed.[Host] = Source.[Host])) AND (Changed.[IPAddress] NOT LIKE '%:%'))
    FOR XML PATH('')), 2, 1, '') [COMBINED IP],

   Source.[IPSubnet],
   Source.[DefaultIPGateway],
   Source.[DNSServer1],
   Source.[DNSServer2],
   Source.[WINSPrimary],
   Source.[WINSSecondary],
   Source.[InterfaceName]

FROM [VISIO].[dbo].[tbl_Raw_NIC_Imported] Source
WHERE Source.[IPSubnet] <>'64'
GROUP BY Source.[MACAddress], Source.[Host], Source.[IPSubnet], Source.[DefaultIPGateway], 
         Source.[DNSServer1], Source.[DNSServer2], Source.[WINSPrimary], Source.[WINSSecondary], Source.[InterfaceName]
ORDER BY 1

Any insight on how to do this same thing in a view would be much appreciated.

Thanks,

Casey


#2

Hi Casey,

If you remove the

ORDER BY 1

It should allow you to create the view as needed.

If the ordering is important you can do it when selecting from the view. Also, when adding an ordering consider using the name of the column your ordering by rather than ordering by ordinal, as its not as explicit as to what you're ordering and if the column order changes you may end up ordering by something else by mistake.

ORDER BY Source.[MACAddress] ASC

#3

Also, you've filtered out

WHERE Source.[IPSubnet] <>'64'

To remove the IPv6 addresses, will this be possible in your wider query?

If you need to check if the IP addresses are in the correct format, something like this may work (I'm very sure there are more efficient methods than this though)

SELECT * FROM dbo.tbl_Raw_NIC_Imported AS A
WHERE PARSENAME(A.IPAddress,1) LIKE '%[0-9]%'
AND PARSENAME(A.IPAddress,2) LIKE '%[0-9]%'
AND PARSENAME(A.IPAddress,3) LIKE '%[0-9]%'
AND PARSENAME(A.IPAddress,4) LIKE '%[0-9]%'

#4

Dohsan,

Thank you for all your feedback. It makes me want to post my main SP that's feeding the tables just to see what could be done better. I removed the ORDER BY clause and found that I also needed to name the second column as the UPPER created an issue. I was then able to create the view.

I guess I could have filtered out the IPv6 before I got to this. The thing is, I'm not 100% sure that I am good to go. I need to really take a hard look at the results once I populate significant data. Before I filtered on 64, I was getting duplicate rows. I'm left with the question of what will happen if any other column has a difference. The 64 was a difference in the IPSubnet column and it created a duplicate row where found. I don't have a firm enough understanding of what I did in the STUFF statement to be sure, so, I will need to spend some time with it. I am a visual person so sometimes I need results to feel good about what I have done in theory.

Thanks again for spot checking this.

Casey


#5

@Casey I think you have it mostly sorted, my advice would be to always create some test data and test any code you're unsure of the behaviour of.

As for the concat with other columns see below:

DECLARE @Test TABLE
(
	ID INT NOT NULL,
	RanChar VARCHAR(5) NOT NULL,
	OtherCol VARCHAR(5) NOT NULL
);

INSERT INTO @Test (ID,RanChar,OtherCol)
VALUES	(1,'A1','X'),
		(1,'A2','X'),
		(1,'A3','Y'),
		(2,'B1','X'),
		(2,'B2','X'),
		(2,'B3','X');
		
SELECT	T.ID,
		Concat = STUFF((	SELECT ' | ' + T1.RanChar
				FROM	@Test AS T1
				WHERE	T1.ID = T.ID 
				FOR XML PATH('')), 2, 1, '')
FROM	@Test AS T
GROUP	BY T.ID;

--Other Col with variable values
SELECT	T.ID,
		Concat = STUFF((	SELECT ' | ' + T1.RanChar
				FROM	@Test AS T1
				WHERE	T1.ID = T.ID 
				FOR XML PATH('')), 2, 1, ''),
		T.OtherCol
FROM	@Test AS T
GROUP	BY T.ID,
		T.OtherCol;