Is it OK to use select * in a CTE within a view?

danielems · July 28, 2015, 12:50pm

I know it's bad practice to use select * within a view, because the view doesn't update if new fields are added to the underlying table.

However, I have a view with a CTE that uses select *

Using the code below as an example, will I have the same problem if a new field is added to the HCRIS_HHAAbstract table in the CTE?


CREATE VIEW vProfile AS
WITH 
cy AS (

SELECT InstanceID, ha.*

FROM HCRIS_HHAAbstract ha

INNER JOIN vHomeHealthProfile_EndYear ha2

ON ha.MedicareID = ha2.MedicareID

AND YEAR(ha.FY_END_DT) = ha2.EndYear

)
SELECT
ginst.InstanceID,
ginst.AreaName,
hha.HomeHealthName,
cy.MedicareID

FROM
HomeHealthMaster hha INNER JOIN
vMedicareIDToHomeHealthMasterID vhha ON vhha.HomeHealthMasterID = hha.HomeHealthMasterID INNER JOIN
GIS_SysInstanceMasterIDs gmid ON gmid.MasterID = hha.HomeHealthMasterID INNER JOIN
GIS_SysInstances ginst ON ginst.InstanceID = gmid.InstanceID LEFT OUTER JOIN
cy ON cy.InstanceID = ginst.InstanceID AND cy.MedicareID = vhha.MedicareID

Kristen · July 28, 2015, 1:17pm

Doesn't it? That's news to me (but you might be right).

The reason not to use SELECT * is because you get all columns. You might need all columns today, but if I add a "Notes" column to the table, tomorrow, for the Call Centre to put reminders etc. then you will get that column selected too. I had a client who did exactly that; the Call Centre found ti tremendously helpful writing notes to each other so that anyone picking up the case later on could figure out what went on before ... very soon the average size of the Notes column was several K and within a month the whole application became unusable because every SELECT statement in the APP was a SELECT * - so it pulled the Notes column, every time, and the only thing that actually used it was the Data Entry form the Call Centre used.

List all the columns you need to use. Period. Anything is heading-for-a-fall

If you add a new column to the table re-visit all the code that uses that table and IF that code NEEDS the new column then add it to the list.

Other observations:

AND YEAR(ha.FY_END_DT) = ha2.EndYear

is inefficient because it has a Function applied to a Column. SQL will not be able to use an Index to speed up that process, so will have to scan the table instead. As/If the number of rows in the table grows significantly this may become a major bottleneck. (A workaround would be to have a "Year End Date" Table which had the Start/End date for the year, join to that and then check if ha2.EndYear is between the Start & End dates for the year.

Kristen · July 28, 2015, 1:18pm

P.S. Assuming you are right about SELECT * not refreshing in a VIEW if the Table changes could you use

EXEC sp_refreshview 'dbo.MyView'

to work around that?

danielems · July 28, 2015, 1:50pm

Kristen - thanks for the detailed reply. I appreciate the advice about the YEAR() function in the join - I might try adding a field to the table that holds the value of YEAR(ha.FY_END_DT) and include that in a index.

Regarding views not updating if a new field is added, it's true for SQL Server at least - http://stackoverflow.com/questions/262450/why-is-using-to-build-a-view-bad

It happened to us this week, but we were able to catch it before it caused any problems. I'm worried that there are other instances...

Kristen · July 28, 2015, 2:04pm

It should be possible to work out (i.e. Query SQL's Meta Tables) what the Dependencies are - i.e. which Tables are used in Which Views - and from that determine, for a given table, which Views (and Sprocs, Triggers, etc.) are dependent on that table. Perhaps then use [sp_refreshview] on those views ...

But I still maintain that if you change/add a column in a table that all code that references that table should be reviewed to see if it needs to be adjusted.

We use unique names for Tables and Columns, within the whole application (i.e. no columns called just plain "ID") which makes our life easy as we can search for "MyColumnName" and be sure we have a) found only things that are relevant and b) found all of them! Where we have a column referenced by another table (e.g. an OrderItem has a column for OrderHeader's ID, then the child table column name is a composite name which includes the parent name so ... if we search for "OrderHeader_ID" we also find [OrderItem_OrderHeader_ID] and chances are if we change [OrderHeader_ID] we will also need to change [OrderItem_OrderHeader_ID].

Either way, my advice is to never use "SELECT *" in your code (with the exception of an EXIST statement where the short-hand

WHERE EXISTS (SELECT * FROM MyTable WHERE ...)

allows SQL to choose any appropriate column / index to deduce if a matching row exists, or not.

gbritton · July 28, 2015, 2:14pm

Since your final SELECT is specific, I see no harm in using ha.* in the CTE. SQL will prune the columns not used when building the query plan. The other thing I don't mind so much in simple queries:

with cte as 
(
select a,b,c from mytable
)

select * from cte

since the columns are controlled by the cte. Still, best practice says "Explicit is better than implicit". Follow it!

Kristen · July 28, 2015, 2:27pm

True (although my guess is that the optimiser cannot be relied upon 100% to exclude all redundant columns ALL of the time?!!), but if that was the whole story presumably the O/P's VIEW wouldn't fall foul of the "Modified the table" issue? Or maybe changes to the underlying table make the view "unstable" (e.g. if the sequence of columns is changed, or a pre-existing column is renamed)

Either way, my preference would be to restrict the columns in the initial select within the CTE and then perhaps SELECT * FROM MyCTE would be acceptable.

Kristen · July 28, 2015, 2:29pm

Dunno how repliable it is, as back in the days of SQL2000 it was never 100% reliable, but assuming it has now become reliable

EXEC sp_depends 'MyTable'

will tell you what views etc. are dependent on MyTable, and they can be recreated / refreshed (or for Sprocs marked for recompilation) to take into account the changes to a table

Kristen · July 28, 2015, 2:33pm

P.S. You could use "WITH SCHEMABINDING" in the View ... that would not allow you to change the Table (i.e. cause you to sort-out the View at the same time), but of course ... WITH SCHEMABINDING won't allow you to use SELECT * - so its a Good Thing in my book

ScottPletcher · July 28, 2015, 3:25pm

Using SELECT * in a CTE within a view will have the same issue as a SELECT * directly against a table -- columns added after the view are created will not be listed when you query the view.

As to YEAR(), since that query very likely processes every row in the ha table anyway, it's more important to avoid any function or computation on the ha2 table, so that part of the query is correct as it is.

gbritton · July 28, 2015, 4:56pm

True but that's ok as used, since the final select is explicit. Still not best practice.

danielems · July 28, 2015, 4:57pm

First of all, thank you all for taking the time to help me! I had no idea about SCHEMABINDING. That would help a lot.

I'm getting conflicting answers from gbritton and ScottPletcher. Anyone have a reference for these answers?

I'll test it by adding a column to a table within a CTE and seeing if it is included or not. I'll share the test code and results with you.

I really don't want to to make this view more complicated by explicitly listing the columns within the CTE. I clipped the query for simplicity; there are actually 4 CTE's, and the tables within have a LOT of columns

danielems · July 28, 2015, 5:24pm

Here is some test code. The results (in this one simple case using SQL 2012) are:

SQL server handles a new column in a view's underlying table correctly when select * is used in the CTE if the columns in the view are explicitly named
SQL server does not handle the new column correctly when select * is used in the CTE and in the view. The correct rows are returned, but the columns are messed up.

I hope I'm making sense.


IF OBJECT_ID('dbo.test', 'U') IS NOT NULL
    DROP TABLE dbo.test; 
IF OBJECT_ID('dbo.vtest','V') IS NOT NULL

DROP VIEW dbo.vtest;
IF OBJECT_ID('dbo.vtest_star','V') IS NOT NULL

DROP VIEW dbo.vtest_star;
CREATE TABLE test (

old_id      int,

old_name    varchar(100),

old_address varchar(100),

old_phone   varchar(15)

)
INSERT INTO test(old_id, old_name, old_address, old_phone) VALUES

(1, 'Name 1', 'Address 1', '502-555-1111'),

(2, 'Name 2', 'Address 2', '502-555-2222'),

(3, 'Name 3', 'Address 3', '812-555-3333'),

(4, 'Name 4', 'Address 4', '812-555-4444'),

(5, 'Name 5', 'Address 5', '812-555-5555')

GO
-- create a view with a * in the CTE and explicit fields in the query

CREATE VIEW vtest AS

WITH t AS (SELECT * FROM TEST WHERE old_phone LIKE '812%')

SELECT old_id, old_name, old_address, old_phone FROM t

GO
-- create a view with a * in the CTE and the query

CREATE VIEW vtest_star AS

WITH t AS (SELECT * FROM TEST WHERE old_phone LIKE '812%')

SELECT * FROM t

GO
-- this correctly returns all records where the phone number starts with 812

SELECT * FROM vtest
-- this correctly returns all records where the phone number starts with 812

SELECT * FROM vtest_star
-- re-create the test table with a new field in the second position "new_type"

IF OBJECT_ID('dbo.test', 'U') IS NOT NULL

DROP TABLE dbo.test;
CREATE TABLE test (

old_id      int,

new_type    varchar(20), -- new column

old_name    varchar(100),

old_address varchar(100),

old_phone   varchar(15)

)

GO
INSERT INTO test(old_id, new_type, old_name, old_address, old_phone) VALUES

(1, 'Type A', 'Name 1', 'Address 1', '502-555-1111'),

(2, 'Type B', 'Name 2', 'Address 2', '502-555-2222'),

(3, 'Type B', 'Name 3', 'Address 3', '812-555-3333'),

(4, 'Type B', 'Name 4', 'Address 4', '812-555-4444'),

(5, 'Type A', 'Name 5', 'Address 5', '812-555-5555')

GO
-- this correctly returns all records where the phone number starts with 812

SELECT * FROM vtest
-- this returns the correct records, but the data are in the wrong fields

SELECT * FROM vtest_star

ScottPletcher · July 28, 2015, 7:34pm

I wasn't meaning to disagree with gbritton's answer, but I was also trying to address the specific q asked:

it's bad practice to use select * within a view, because the view doesn't update if new fields are added to the underlying table. ...
Using the code below as an example, will I have the same problem if a new field is added to the HCRIS_HHAAbstract table in the CTE?

Yes, because the views will not include the new columns until they are refreshed. The views might still function, but they won't return new columns, even if you used SELECT * in both the cte and the main query.

Other disadvantages of using SELECT *:
*) it will fail if you drop a column that existed when the view was created
*) if you ALTER a column the view still has the old definition; while it's not likely to ever happen, I suspect it would be possible to ALTER a column multiple times such that the datatype in the view will fail when SQL tries to read the current data type from the table

Kristen · July 28, 2015, 8:07pm

Complicated is not the word I would use, although it will be more verbose. A list of columns makes it explicit and thus more precise.

It is, potentially, more work to creating the column list ... but I never actually type a list of columns (if that was your concern?) plenty of ways to get that from SQL's Meta Data. For example, dragging the "Columns" branch of the table / column listing tree, into the Query window will provide a comma delimited list of columns.

A good and useful example, thanks

I also tried having the Full list in the CTE and SELECT * in the outer. That worked correctly (and would not bother me as a short-hand)

Running

EXEC sp_depends 'test'

before

DROP TABLE dbo.test;

correctly identifies the related VIEWs, but as soon as the table is dropped, even if it is then recreated, it no longer lists anything ... so using that route it would be necessary to use it before the DROP - just absolutely no use for "Oops, didn't mean to do that, I wonder what is related that now needs fixing"

Could probably do a

SELECT ... LIKE '%MyTableName%'

on the Meta Table containing the View's definition (and also and Sproc, Trigger etc.) which woudl work correctly AFTER the fact.

After the new revised table is created

EXEC sp_refreshview 'dbo.vtest'
EXEC sp_refreshview 'dbo.vtest_star'

sorts the issue out - but only any use provided that you know the names of the (related) VIEWs

I don't know how "expensive" it would be to use sp_refreshview on every view in the database, but it would not be hard to script for a "belt and braces" attack after changing a table.

Kristen · July 28, 2015, 8:24pm

This will generate the Refresh SProc calls - but it rather depends on the "Uniqueness" of the table name(**). I've restricted it to the name not being preceded/followed by Alpha-Numeric characters etc. but spaces and all sorts are used and fairly commonly used (we don't allow anything other than [A-Z0-9_] and no leading "_" here)

SELECT	'EXEC sp_refreshview ''' + S.name + '.' + O.name + ''''
FROM	sys.objects AS O
	join sys.schemas AS S
		 ON S.schema_id = O.schema_id
	join sys.sql_modules AS M
		 ON M.object_id = O.object_id
WHERE	    O.type      = 'V'
	AND M.definition LIKE '%[^A-Z0-9]' + 'MyTableName' + '[^A-Z0-9_]%'

It won't work if the Schema or View name contains a single quote ... fixing that is left as an exercise for the reader!

EDIT: (**) I suppose if this refreshes rather more Views that is absolutely necessary that won't matter much - so longer as it includes all the ones that do need doing that's good enough.

A refinement would to use the Modified Date column on the Table / Columns system tables to determine which Tables have been modified since a given date, and then search for any of those table names and generate a Refresh for those.

ScottPletcher · July 29, 2015, 5:32pm

MS recommends you instead use sys.dm_sql_referencing_entities and similar dmvs to identity such dependencies. For example:

SELECT *
FROM sys.dm_sql_referencing_entities ( 'dbo.view_name', 'OBJECT' ) --be sure to include the schema name

which will show you all objects in the db that reference that view, at least as far as SQL "knows". This view may not be absolutely perfect, but it's very good overall.

Kristen · July 29, 2015, 6:08pm

Didn't know about sys.dm_sql_referencing_entities, thanks

Pity MS didn't use

COALESCE(parsename(@name, 2), 'dbo')

in their definition of sys.dm_sql_referencing_entities

I had a look at the source for [sp_depends] - the comment still says "--- 1996/08/09 16:51" ...