How to optimise unused columns (and their tables) in a view

Kristen · January 11, 2016, 5:15pm

If I have a VIEW which has an Outer Join to another table, but the columns in the VIEW which reference that table are not used is there a way to a) cause that table to be ignored (i.e. optimised out of the query) or b) to rewrite the VIEW so that that happens?

In the example below my TEMP_VIEW2 uses a nested-select, in the SELECT clause, to get the associated data, and that IS optimised out of the query. BUT ... that's not going to work very well if I want the VIEW to present more than one column form a JOINed table.

I haven;t looked at whether CTEs and the like would solve the problem, I thought I would ask first in case anyone knew

Thanks

USE tempdb
GO
CREATE TABLE TEMP_PRODUCT
(
	TP_ID	int,
	TP_Category	varchar(10),
	TP_Price	float
) 

INSERT INTO TEMP_PRODUCT
SELECT 1, 'CAT1', 200.00 UNION ALL
SELECT 2, 'CAT1', 200.00 UNION ALL
SELECT 3, 'CAT1', 207.00 UNION ALL
SELECT 4, 'CAT2', 576.00 UNION ALL
SELECT 5, 'CAT2', 801.20 UNION ALL
SELECT 6, 'CAT3', 897.60 UNION ALL
SELECT 7, 'CAT3', 876.80 UNION ALL
SELECT 8, 'CAT4', 876.80 UNION ALL
SELECT 9, 'CAT4', 432.20 UNION ALL
SELECT 10, 'CAT4', 432.20 

GO
CREATE TABLE TEMP_CATEGORY
(
	TC_ID	int,
	TC_Category	varchar(10),
	TC_Descripion	varchar(30)
) 

INSERT INTO TEMP_CATEGORY
SELECT 1, 'CAT1', 'Category One' UNION ALL
SELECT 2, 'CAT2', 'Category Two' UNION ALL
SELECT 3, 'CAT3', 'Category Three' UNION ALL
SELECT 4, 'CAT4', 'Category Four'
GO

CREATE VIEW TEMP_VIEW
AS
SELECT	TP_ID,
	TP_Category,
	TP_Price,
	--
	TC_ID,
--	TC_Category,
	TC_Descripion
FROM	TEMP_PRODUCT AS T1
	LEFT OUTER JOIN TEMP_CATEGORY AS TC
		 ON TC_Category = TP_Category
GO


CREATE VIEW TEMP_VIEW2
AS
SELECT	TP_ID,
	TP_Category,
	TP_Price,
	--
	[TC_ID] =
	(
		SELECT	TC_ID
		FROM	TEMP_CATEGORY AS TC
		WHERE	TC_Category = TP_Category
	)
--	TC_Category,
--	TC_Descripion
FROM	TEMP_PRODUCT AS T1
GO


-- TEST RIG:

PRINT '#1 Reference both tables, VIEW#1'
SELECT	TP_ID, TC_ID
FROM	TEMP_VIEW

PRINT '#2 Reference primary table only, VIEW#1'
SELECT	TP_ID
FROM	TEMP_VIEW


PRINT '#3 Reference both tables, VIEW#2'
SELECT	TP_ID, TC_ID
FROM	TEMP_VIEW2

PRINT '#4 Reference primary table only, VIEW#2'
SELECT	TP_ID
FROM	TEMP_VIEW2

GO
DROP TABLE TEMP_PRODUCT
GO
DROP TABLE TEMP_CATEGORY
GO
DROP VIEW TEMP_VIEW
GO
DROP VIEW TEMP_VIEW2
GO

#1 Reference both tables
Table 'TEMP_CATEGORY'. Scan count 1, logical reads 10
Table 'TEMP_PRODUCT'. Scan count 1, logical reads 1

|--Nested Loops(Left Outer Join, WHERE:([tempdb].[dbo].[TEMP_CATEGORY].[TC_Category] 
	as [TC].[TC_Category]=[tempdb].[dbo].[TEMP_PRODUCT].[TP_Category] as [T1].[TP_Category]))
|--Table Scan(OBJECT:([tempdb].[dbo].[TEMP_PRODUCT] AS [T1]))
|--Table Scan(OBJECT:([tempdb].[dbo].[TEMP_CATEGORY] AS [TC]))


#2 Reference primary table only (same)
Table 'TEMP_CATEGORY'. Scan count 1, logical reads 10
Table 'TEMP_PRODUCT'. Scan count 1, logical reads 1

#3 Reference both tables (same)
Table 'TEMP_CATEGORY'. Scan count 10, logical reads 10
Table 'TEMP_PRODUCT'. Scan count 1, logical reads 1

#4 Reference primary table only
Table 'TEMP_PRODUCT'. Scan count 1, logical reads 1

  |--Table Scan(OBJECT:([tempdb].[dbo].[TEMP_PRODUCT] AS [T1]))

jcelko · January 12, 2016, 2:02am

This is a bit of mess. Your data types are wrong, there is no DRI, etc. Follow industry and ANSI/ISO standards in your data. You should follow ISO-11179 rules for naming data elements.

Did you know that using float for money is illegal? That tables need keys? That numeric are newer used for identifiers? (what math do you do with them?) And that we use industry standards whenever possible (I like GRIN, but what is appropriate in your industry?)

CREATE TABLE Products
(gtin CHAR(15) NOT NULL PRIMARY KEY,
product_cat CHAR(5) NOT NULL
REFERENCES product_categories (product_cat)
ON DELETE CASCADE
ON UPDATE CASCADE,
unit_price DECIMAL (8,2) NOT NULL
CHECK (unit_price > 0.00)
);

Why don't you know ANSI/ISO syntax for insertions?

INSERT INTO Products
VALUES
('01', 'CAT1', 200.00),
('02', 'CAT1', 200.00),
('03', 'CAT1', 207.00),
('04', 'CAT2', 576.00),
('05', 'CAT2', 801.20),
('06', 'CAT3', 897.60),
('07', 'CAT3', 876.80),
('08', 'CAT4', 876.80),
('09', 'CAT4', 432.20),
('10, 'CAT4', 432.20);

Why did you create a redundant “TC_ID”? You do not know what a key is, do you? You are building 1960's pointer chains with integers, and not writing SQL at all!

CREATE TABLE Product_Categories
(product_cat CHAR(5) NOT NULL PRIMARY KEY,
product_description VARCHAR(30) NOT NULL);

INSERT INTO Product_Categories
('CAT1', 'Category One'),
('CAT2', 'Category Two'),
('CAT3', 'Category Three'),
('CAT4', 'Category Four');

If the list of categories is small, static and the encoding is obvious in your industry, then use a “CHECK (product_cat IN (..))” in the Products table and not the DRI I just showed you.

Your view is a mess. Why use an OUTER JOIN when the DRI will guarantee a match? Again, you still think that a table is file and do not understand that the unit of work is the whole schema.

CREATE VIEW Display_Products
AS
SELECT P1.gtin, P1.product_category, P1.unit_price,
C1.product_description
FROM Products AS P1,
Product_Categories AS C1
WHERE C1.product_cat = T1.product_cat;

This is how noobs often write SQL the first 6-8 months of employment. Find someone in your shop who can mentor you and help you un-learn old habits.

Kristen · January 12, 2016, 6:44am

Hahaha ... you crack me up Celko preaching to everyone about standards.

You think this is code for a real world situation? This is just a trivial example for the purpose of anyone trying to answer my question who wants to have some sample data to work with to save them the time of creating their own. Its lifted from the first SQL script I could find lying around which could be adapted as it is not worth my time writing a carefully crafted example from scratch. Everyone here, except you, would be pleased that I had provided an example to save them typing your own.

Instead of spending several minutes crafting a reply critical of my example code the one thing you COULD have helpfully done would have been to answer my question. That would have been really helpful, and with your knowledge you might well have been able to do that ... I notice none of the regulars here have, as yet, had a go at an answer.

... although, as with all your answers, you are so wrapped up in theory that you rarely provide any answers that are actually useful in the Real World, so if you did give me an answer I doubt it would actually be useful to me (back in the Real World).

However, let's look on the bright side, I've never had an answer to one of my questions from the Great Man himself, so this:

I will frame and hang in the toilet.

TaraKizer · January 12, 2016, 4:43pm

I recently came across this scenario at my last job. It had a correlated subquery in the SELECT list of the view. The calling report took almost 30 minutes to run, and it was due to the correlated subquery. The calling report wasn't even using that column (from the correlated subquery)! My recommended solution was to create a duplicate view without that column and have the report call the new view. This solution allowed the report to return in under a minute.

Not an ideal solution obviously, more of a workaround. That's the problem with views when you are adding in every single column you think the calling code might ever need.

Another option that I posed to the team was to remove the column/correlated subquery from the view and instead have it be added to the calling code if they need that column returned.

I am not sure if there is a better solution. Seems like it would have to be worked into the query optimizer to know to not process that bit of code when the column isn't used.

Gosh I love Joe Celko's replies. So unprofessional and useless. But they do make me chuckle.

ScottPletcher · January 12, 2016, 4:51pm

Not necessarily! We have promotional materials that we include free in some orders, and they do indeed have a zero price. It's also possible to give away products that are normally sold, such as when pet vendors give you a free sample dog biscuit or bone with an order.

James · January 12, 2016, 6:35pm

Out of curiosity, where is this law? Does it pertain to international or US SEC requirements? I am a big fan of not spending time in prison and would enjoy being able to point to specifics when I am asked about the law.

Thanks!

Kristen · January 12, 2016, 7:14pm

Thanks Tara. We tend to have a number of Code Lookup columns in a table, and want to display the Description for those Codes in reports, so I was trying to have a VIEW that avoiding having to repeat all the JOINs in every report a) in case they ever changed, although that is unlikely and b) because sooner or later someone will mis-type the JOIN [in the SQL for a report] and we'll get a bug ... perhaps, even, a subtle bug, and they are expensive to fix.

[quote]Seems like it would have to be worked into the query optimizer to know to not process that bit of code when the column isn't used.
[/quote]

Yes, I was clearly both Dreaming and Hoping! I used an OUTER JOIN (even though its a required FKey in practice), so that the JOINed table wasn't part of the selection criteria, in the hope that it might be optimised-out (until it was found needed in the SELECT or ORDER BY etc. No Such Luck

I'll do some performance tests on my work-around of having a sub-query within the SELECT clause; for the Code Lookups 99% of the time we only want the, one, column for the description, so JOINing the table would have no benefit (other than performance). But for a Child Table our VIEWs get all columns from the associated Parent Table, and that's definitely going to need a JOIN.

If I can't improve on what I have I'll stick with it and worry about it if the performance becomes a problem. Having all the JOINs "documented" within the VIEW saves lots of work, and potential bugs etc. in SO many places in the cod base.

Nice to see his old style JOIN making a comeback, eh?!

FROM Products AS P1,
Product_Categories AS C1
WHERE C1.product_cat = T1.product_cat

jcelko · January 12, 2016, 8:42pm

I have seen zero prices on things, too. But I have also seen a compensating discount for products that are given away. I guess the story is you actually have to go down and talk to the accounting department to see what the business rules are. ARRRGH! My objection to a zero price is that even coupons have to be valued at the 10th of a penny for legal reasons.

ScottPletcher · January 12, 2016, 8:56pm

Right, coupons have a nominal value, not a nominal price. And things such as brochures that are given away don't even need that.

If you want to normalize the table at all, certainly the product_cat at least should be encoded to a number as well.

jcelko · January 12, 2016, 9:04pm

Nice to see his old style JOIN making a comeback, eh?!

The set oriented join is still valid SQL. What you seem to mean is the infixed joins. I need to sit down and write an article on the history of the infixed notation, since I was on ANSI X3H2 when we voted them in, and read the original papers.

This was inspired by an article the Chris Date did for one of the newsstand database magazines in those days about outer joins. I cannot remember who submitted the proposal , but Chris Date never sent anything to the SQL Standards committee. There was actually a lot more to it than simply left, right and full outer joins. We had extensions to the on clause (corresponding, natural), outer union and matching options.

It was so complicated nobody implemented it completely. But as Dave McGovern said, "a committee never met a feature, it did not like."

What I found over the decades of repairing bad SQL is that people who use infixed inner join syntax are like people who think in terms of +'s instead of capital Sigma notation for summations. Their mindset is still locked in a sequence of operations and not a set oriented view of data. It is not. It is not wrong, but it shows how someone thinks. It is a "code smell" that I found very handy. I know that he will confuse tables with sequential files, he will fail to use views or CTE's (files are not virtual like tables!), will use if–then–else control flow logic instead of using a case expression, etc.

jcelko · January 12, 2016, 9:21pm

If you want to normalize the table at all, certainly the product_cat at least should be encoded to a number as well.

The datatype has nothing to do with normalization! A simple two column lookup table is already in Fifth Normal Form (5NF).

However, datatypes do have to do with data modeling. A category system is a nominal scale; nominal scales cannot be used in computations; therefore, they are not represented with numerics. However, assembly language programmers want to want a category code to look like a pointer; pointers are usually modeled as integers in old hardware.

If there is no industry standard for the category system, then I like abbreviation codes. The advantages are that an abbreviation is human readable without translation, short (3 to 4 letters handles most things, so one word of storage holds the encoding), and the list is usually short enough to fit into a "CHECK (col IN )" constraint. Since most modern SQLs can handle hundreds or even thousands of items in such a predicate, this is not a problem. Think about the two letter state abbreviation code; it has 50 states, the territories and the military post offices, and it has been extended to the provinces of Canada.

This constraint is also passed directly to the optimizer, while a references clause requires a little more work.

ScottPletcher · January 12, 2016, 9:35pm

Encoding string values so that they are not repeated obviously does have something to do with normalization, since product categories could easily be renamed. State abbreviations will not be changed.

The value can be encoded and still not require a separate table, albeit at more CPU cost when the value is needed. But for a very short list, particularly one that's not read all that often, it would work fine.

jcelko · January 12, 2016, 9:42pm

Out of curiosity, where is this law? Does it pertain to international or US SEC requirements? I am a big fan of not spending time in prison

In the US, we have GAAP (Generally Accepted Accounting Principles) rules for rounding, interest, and other financial calculations. I have not worked with stocks in a long time, and when I was, I was doing stats with prices in 1/8 dollars, (yes, I am old ) which made for weird math.

In the US, the standard had been five decimal places. I am; I am not sure what it is today. Probably six. The rules were set up for COBOL programming, and in those days there were no IEEE floating-point standards. IBM, Univac, Burroughs and CDC all had slightly different implementations based on their physical hardware. They also had different adjustments for rounding and correcting for floating-point error. This is why floating-point was illegal.

In the European Union, we have Rule 1103/97, which is for calculating conversions to euros. This rule also established convertibility to six, then three, decimal places and the adoption of triangulation as the legal norm for transacting business in the Eurozone.

I always tell people that are doing financial work, they need to talk to the accounting department. I think the accounting department got a bad name from old Dilbert cartoons, in which he referred to "the trolls in accounting", whose job is to deny all of your travel expenses, anything you want to buy. And of course any employee pay raises. Honest guys, they are not the enemy!

James · January 12, 2016, 9:49pm

Thanks for the clarification!

ScottPletcher · January 12, 2016, 10:11pm

Yes, and GAAP requires at least four decimal places for doing calcs. Not coincidentally, SQL's money data type has -- who'd have thunk it -- four decimal places.

jcelko · January 12, 2016, 10:56pm

I will frame and hang in the toilet.

That is actually an excellent place for your code!

Professionals wrote good code as a reflex; you do not do this yet. You do not recognize your own mistakes. The code I posted is my idea of a quick skeleton. I know the table has to have a key, so my skeleton has one. I know certain basic industry standards that every IT professional should know, so I write them without thinking (how much thinking do you have to do about the common error calendar, the metric system, postal codes, etc.?) I know that RDBMS needs DRI (you might skip the ON DELETE/ ON UPDATE clauses in a skeleton). I know how NULLs work. Etc,

If this was a woodworking forum, you would be the poster asking for the best kinds of rock to use to pound screws into furniture. You are asking the wrong questions.

Kristen · January 13, 2016, 8:09am

Of course I do. You have no idea what my code looks like as you are basing your view on the example which I posted, and I have already told you how that was thrown together.

If you have an answer to my original question I'd be grateful. If all you want to do is wander around criticising and being rude to people go somewhere else.

Such a talent, wasted.

gbritton · January 13, 2016, 2:07pm

Kristen, I recall a related problem I solved with a CTE-based partitioned view. I had a sproc that took a parameter indicating if daily, weekly or monthly results were desired. The CTE looked like:

with source as
(
select ...
from daily
where @parm = 'daily'
union all
select ...
from weekly
where @parm = 'weekly'
union all
select ...
from monthly
where @parm = 'monthly'
)


select ...
from source

IIRC doing this caused the unneeded sources to be effectively pruned. Maybe something like that will do it for you.

Kristen · January 14, 2016, 9:29am

Thanks GB. I have had a fiddle with CTEs but I can't figure a way to get a query plan any different to just JOINing the tables. I can see how the parameters in your SProc provides sufficient hints for the optimiser, but in a VIEW I can't find a way to trigger the optimiser's help

My only other thought is some sort of INCLUDE file. Pity that SQL doesn't support that as it would give me better code re-usability. We are currently planning on writing our own Batch Job to substitute replacement code for all UDF usage (to improve performance in Production, whilst allowing UDFs in DEV for speed. My plan, in doing that, is to have a single batch-job which will run when deploying from DEV to TEST (and we have several batch jobs at that point anyway, e.g. compression of all source code to remove white space & comments etc.) and, for us, DEV to TEST is not an everyday occurrence, so the Batch Job time is trivial and I could easily add an Include File method at that point too. But if I had to do an extra step for the Include File code-modification every time I ran an SProc Script that might well be a major annoyance ...

ScottPletcher · January 14, 2016, 4:08pm

You can use an inline-table-valued function in place of a view. SQL evaluates those to bring them in line in the executable code, including optimizing based on actual values instead of estimates.