View performance

sasanvm · February 11, 2017, 3:33pm

Hello

In an database I have a main table include million records, this table include duplicate records
that inserted when special event occurs.
Now I need to have an view from main table by eliminate duplicate records, I create 3 view:

view1: select all record from main table
view2: eliminate duplicates record by using group by
view3: select records from view1 where ID exist in view2 (eliminate duplicates record)

Seems all things is ok, view3 execute time is about 1 minute (not bad), but we have dynamic query
tools desinged for end user, when he/she try to create very simple query from view3 like:

"reg_date between date1 and date2 and num_count > 3",
is took long time and finally get "execution time out expire" ! more than 10 minutes ?

How can write above view better, what is wrong ?

Best Regards,
sasan.

-- view 1: [tbv_view1] select all record from main table

-- view 2:
if exists (select * from sysobjects where id = object_id('tbv_view2') and OBJECTPROPERTY(id, 'IsView') = 1)
DROP VIEW tbv_view2
GO
CREATE VIEW tbv_view2
AS
SELECT TOP 100 PERCENT MAX(ID) AS ID, ID_Number
FROM tbl_main
WHERE [status] <> 1 -- remove some events...
GROUP BY ID_Number
ORDER BY ID
GO

-- view 3: select records from view1 where ID exist in view2 (eliminate duplicates record)
if exists (select * from sysobjects where id = object_id('tbv_view3') and OBJECTPROPERTY(id, 'IsView') = 1)
DROP VIEW tbv_view3
GO
CREATE VIEW tbv_view3
AS
SELECT TOP 100 PERCENT * FROM tbv_view1
WHERE ID IN (SELECT ID FROM tbv_view2)
ORDER BY ID
GO

Kristen · February 11, 2017, 3:55pm

Don't use TOP 100 PERCENT and ORDER BY in Views, they won't work reliably (I don't know if that interferes with query optimiser, but maybe it is bad for that reason to)

Can you add a [IsDuplicate] column to Main Table, and set that to indicate whether it is the First record or a Duplicate? Then include a filter on [IsDuplicate] in VIEW3 (for the users) That will be much faster than what you currently have.

maybe this would work better:

CREATE VIEW tbv_view3
AS
WITH CTE_Duplicate
AS
(
	SELECT	[T_RowNumber] = ROW_NUMBER()
			OVER (
					PARTITION BY ID_Number
					ORDER BY ID_Number
							, ID DESC
			)
		, T.ID
		, T.ID_Number
		, T.Col1
		, T.Col2
		, ...
	FROM	dbo.tbl_main AS T
	WHERE [status] <> 1 -- remove some events...
)
SELECT
 	T.ID
	, T.ID_Number
	, T.Col1
	, T.Col2
	, ...
FROM	CTE_Duplicate AS T
WHERE	T_RowNumber = 1
GO

JeffModen · February 12, 2017, 9:55pm

The addition of such criteria will drastically change the execution plan... or at least it should. It may be "bad parameter sniffing" is paying you a visit or it may be trying to come up with a better execution plan and failed because most of the information it needs is only available at run time because of the aggregates in View 2. The only way you're going to be able to fix this is to do a deep dive on the estimated execution plan and see what's up. For example, the additional criteria may have caused the optimizer to create a many-to-many (Cartesian Product) causing View 2 to recalculate for every row in the table.

As to a much better View 2, which would likely solve this problem, consider converting it to an "Indexed View", which is a type of materialized view that changes only if the data changes instead of having to recalculate the whole bloody table every time someone calls the view.

And, yes... in this case, the TOP and ORDER BYs are only going to gum up the works. Get rid of the TOP and ORDER BYs.