Thanks Rob. My simple answer to the valid points you mention is ">8,000 will happen so rarely that I don't care about any performance hit" (so long as: the rest of the time, < 8,000 chars, there is NO performance hit compared to before)
Pondering this (with Mr Google's help ) has send me on a long-research-read from which it seems:
Any rows that fit IN ROW, including after any UPDATE, are likely to perform identically.
Convenience of VARCHAR(MAX) for things that MIGHT be larger is worth having. Performance penalty, given that only 2% of rows will be excess-sized, will be acceptable. (Although as Rob pointed out my 2% is > current 1,000 byte limit, it might be that none of them would actually exceed 8,000 bytes)
If the majority of rows were > 8,060 bytes it might be worth forcing the column to be LOB_DATA - an update of LOB_DATA is faster if it remains out-of-row, rather than shrinking to be accommodated IN_ROW. That's not the case for me.
Side issues:
varchar(max) in a table prevents an online index rebuild. We only ever REORGANISE large indexes so I presume that's not a problem for me.
Cannot move tables with LOB data to a different filegroup. Can't see that being an issue for me, certainly not a frequent one!! - if it arose we'd have to do a Create TempTable, copy data over, drop original, rename table name and all the associated Indexes, Constraints, etc etc.
VARCHAR(max) can't be part of an index - but nor can VARCHAR(900+1). Again not an issue for me. Similarly a SORT could force use of TEMPDB - that might be a consideration for not excessively sizing a column's width.
Data in LOB doesn't get compressed. Not sure that bothers me - its not a reason to truncate I'd-really-prefer-this-was-kept data.
Should I just use VARCHAR(MAX) all the time? Tempting but the consensus is that it encourages goofy-data. If a CODE column should be 10 characters max then don't give the user the opportunity to enter 11 or more We have generous sized email columns, because the RFC says they can be long ... so users put multiple email addresses in there (well, attempt to, we don't allow 2x "@" in the field)
However, if the data is genuinely variable in size - a "Notes" column for example - then I can't see a reason to set an arbitrary 1,000 character limit, as we have done in the past. Might as well make it VARCHAR(MAX) and if the user wants to store War & Peace in there let them worry about how to edit it given the letter-box sized form-field that we provide!
String manipulations on VARCHAR(MAX) datatype columns are slower Not a problem in my example, but might be a consideration for the "VARCHAR(MAX) everything" line of thought! if the columns are used in functions like CHARINDEX & PATINDEX, or in any UDFs. (Side note: I have separate UDFs to split VARCHAR(8000) and VARCHAR(MAX) because of the performance differences, I can continue to do that, if necessary - i.e. selectively calling One or Other depending on actual column size)
I also found some references to tables that are used by SSIS - in that it would allocate memory buffers for the size of column being imported. Don't know if that is still a consideration? (It is not one in my situation)
I also found some chatter that VARCHAR(MAX) columns should be grouped at the end of the table (something referencing something that they thought that Kimberly and/or Kalen may have said ...)
I haven't found anything about possible performance differences of @MyParameter VARCHAR(8000) compared to @MyParameter VARCHAR(MAX), although it seems that it depends on the size of the actual stored data - so, again, perhaps "no measurable difference" when the actual data stored is < 8,000 characters (which, for me, is 98% of the time)
I wonder if the total defined size of the table's columns (excluding VARCHAR(MAX) is > 8K whether it would be better to use VARCHAR(MAX). For example "Company names and Address lines MIGHT be long, I'll allocate VARCHAR(1000) to all of them". Is there then a risk that, collectively, the VARCHAR(n) columns are >8K and SQL then raises an error on INSERT - whereas with VARCHAR(MAX) columns it would just store them as LOB_DATA. That might be an edge-condition worth considering for a table with a lot of large VARCHAR(n) columns
Tangential thoughts:
I do have SProcs that do this
CREATE PROCEDURE MyProc
@SearchString varchar(40)
AS
SELECT @SearchString = '%' + NullIf(RTrim(@SearchString), '') + '%'
SELECT Col1, Col2, ...
FROM Mytable
WHERE @SearchString IS NULL
OR MyVarchar_40_Column LIKE @SearchString
of course there is an edge condition here where the user has provided a search string EXACTLY 40 character long (actually 39 Chars would do). @SearchString is not big enough to accommodate the pair of '%' wildcards. So I have to remember to DECLARE them as 42, rather than 40, and they are no longer an exact match of the column's defined width - so code-checking becomes less straightforward. @SearchString varchar(MAX) would fix that.
It begs the question that IF @StringParameter varchar(MAX) has no performance penalty when actually used for small strings whether that isn't preferable. Saves me having to remember to DECLARE @SearchString as "Maximum length of underlying table's column PLUS TWO" and also handy where a user could provide excessively long data (e.g. APP error failing to enforce form-field length), or a wayward data import. SProc could then check that length of all parameters was <= Max permitted length instead of SQL Engine begin responsible for that.
Currently all our SPRoc parameters are same-size-as-table's-column (with the plus-two exception above). If we increase the size of a table's column we have an avalanche of SProc changes that we have to make. Missing just one will (silently ) truncate data which is REALLY bad ... on the plus side we DO have to visit every place in the code where the column is used, so we do actually get the chance to think about whether the increase in length impacts something. Although, arguing black-is-white for a moment!, if the target column is, in practical terms, a 10-char column but is actually declared as VARCHAR(MAX) it will fail-safe it we accidentally store the new column's 12-char value into it.
It also causes me to wonder if all VARCHAR table columns should be created as VARCHAR(MAX) and their actual max length enforced within the APP. Saves any issue with increasing the column width in the future - just change some meta data within the APP and carry on ... no APP changes, no Database DDL changes.
Discuss!!!!
We've had clients with stupidly long Name & Address columns in their associated DBs (how do they ever print an address label for an envelop? and yet they have EMail address columns defined oo small to fit the Max Possible RFC definied length ). If we had VARCHAR(MAX) for all Name and Address etc. columns we could adjust Max Validated Size in our Metadata on a client-by-client basis.
I suppose it all comes down to me THINKING there was a performance penalty of VARCHAR(MAX), particularly for Small Data, and now finding that there does not appear to be one ...