SQLTeam.com | Weblogs | Forums

Autonomy / iManage SQL Performance

sql2012

#1

I'm trying to improve the performance of batch processes (which basically "adjust" hierarchy records in a folder structure, RBAR). The batch process does lots (tens of thousands) of updates like this to adjust the hierarchy after each folder change. (I'm calling it "folder", as that's what the users see, but I guess Autonomy speak is Project given the table name)

DECLARE	@P1 datetime = '20150826 20:06:58'
	, @P2 int = 1
	, @P3 int = 3469

UPDATE	MHGROUP.PROJECTS 
SET	LEFT_VISIT = LEFT_VISIT - 2 
	, EDITWHEN = @P1  
WHERE	    TREE_ID = @P2 
	AND LEFT_VISIT > @P3

The PKey (which is non-clustered) on MHGROUP.PROJECTS is [PRJ_ID]

The clustered index key is TREE_ID, LEFT_VISIT, RIGHT_VISIT

There are a total of 13 indexes on the table ... so having these batch updates modify [LEFT_VISIT], and because that is a key in the clustered index, means that all the associated index records have to be updated. On each of the tends-of-thousands of single single-row updates like this one.

If I just change the Clustered Index and make it Non-Clustered, and make something else (.e.g the primary key) the Clustered index is there any consequence (to the APP)? or is it just the time to make the clustered index change and then see whether it improves anything?

Or am I barking up the wrong tree?


Order of Index creation
#2

I overlooked something ...

That update, above, updated 27,429 rows. Maybe, on the face of it, it IS helpful that the clustered index keys start with TREE_ID, LEFT_VISIT which covers the WHERE clause.

However, the fact that the Clustered Index key columns data value change requires up to 13 index changes makes me wonder what the trade-off would be if I changed this?

A non-clustered index on TREE_ID, LEFT_VISIT, RIGHT_VISIT (and a clustered index on something short & unique) would mean that the 27,429 rows to be updated would be scattered across the disk, but they would require no updates to the 13 indexes (which are definitely "scattered about").

Also, with the clustered index on TREE_ID, LEFT_VISIT, RIGHT_VISIT then if I update

SET LEFT_VISIT = LEFT_VISIT -2

might that not require that the record is physically moved (i.e. if there is a record with the same TREE_ID but with a LEFT_VISIT one less than the current row?)

I suppose all 27,429 rows being changed are currently tightly clustered, and the update does not change the length of the row so could be done in-situ, so perhaps it is just a judgement-call for whether having no secondary index updates, but the main record update being totally random across the disk, is going to be an improvement.

(wrt Performance the update took 7.7s, CPU=7,468ms, Reads=2,807,557, Writes=870).

Test Server is currently on an earlier version of SQL, so I can't restore the DB back to experiment, but I could probably find a "similar sized" recordset in the (stale) DB version currently on the Test Server and do a comparison of Reads/Writes with the current clustered index and then a different one of my choice.


#3

Do you really need all 13 nonclus indexes? Can some of those be combined at least? What are the usage stats on those indexes? What missing indexes does SQL "say" there are?

The best clustered index is the single most important for table performance. Thus, I wouldn't just willy-nilly change it.


#4

I very much doubt it ... however, I'm not sure if it would muck up the APP if I removed them. They look to be, mostly, for user-defined fields; I expect we have some of those, but I doubt we are using them all (I haven't had time to check yet). I suppose I could ask the vendor if its OK to drop ones on columns are we are not using.

There are some plain duplicate indexes there. I have no idea how they came about ...

IX_AAA
IX_BBB
etc. with the "IX_xxxx" naming convention
PK_PROJECTS (unique, primary key) : PRJ_ID
UX_PROJCTS_PRJ_ID (unique) : PRJ_ID

I'm guessing, from the name, that "UX_xxx" is a user-specific or user-specified index. How we managed to have one created which exactly matches another index is a surprise. I was planning to look at the CREATE DATE on that index to see if that sheds any light on why we might have it - or at least WHO to ask WHY we have it!

No point me adding a FILTER to the indexes (e.g. WHERE TheColumn IS NOT NULL), presumably?, because (as I understand it) unless that same filter is included in the query then SQL won't consider using that index - and I doubt that the APP does

WHERE     UserDefinedColumn ='AAA'
      AND UserDefinedColumn IS NOT NULL    -- which would match my filter

Seems like bit of an either-or there?!! I need to change it in order to find out if performance improves :slight_smile: but I agree, best not to do it willy-nilly. Bit frustrated that the Test Server is currently on previous-version (of both APP and SQL), but I suspect any test I run on that is likely to be formative in helping me decide.

By the by, and I suspect this is fairly common, AND we've done it often in the past, but the situation I'm in never occurred to me as a risk before:

We wanted to upgrade a server
We decided to buy a new, faster, one
We set up the new server (in isolation) and tested it, with both the latest APP upgrade and a new version of SQL Server
We "Went Live" on the new server, re-migrating the data from the old server in a "closed period", in the knowledge that we had the old server on hand in case we needed to back out.
We planed to wait "a while" before upgrading the old server to become the new Test Platform "Just in case" we found that we had failed to migrate something rarely used on that server.

And now we find ourselves with NO test platform running the New Version - just at the moment in time when we have teething problems after going live with the new software.


#5

So do the existing missing index stats and index usage stats indicate a different clustered index is, or may be, needed? That's almost always the best indicator, and the minimum I would review before choosing a clus index. Certain other things often also need to be considered, but that is the minimum data needed to analyze clustering requirements.