SQLTeam.com | Weblogs | Forums

Large Table Select Performance

sql2012

#1

I inherited a process that has suddenly started to fail.
There is table that has 7126226 rows in it and the process is trying to find the min(createdate) for each rowid where a certain criteria is met.

There is a non-clustered index on the rowid and the table is not in rowid order.

This query is run 5 different times against the MDM_C_PARTY_HIST table matching to different tables and fields each time.

Here is one of the queries - in this case the PH.ROWID_OBJECT and the BU.ROWID_OBJECT have indexes but the PH.ROWID_BU does not.

SELECT
MIN(PH.HIST_CREATE_DATE)
FROM
dbo.MDM_C_PARTY_HIST AS PH
JOIN dbo.MDM_C_LKP_BU AS BU ON PH.ROWID_BU = BU.ROWID_OBJECT
WHERE
BU.BUS_UNIT_NAME <> 'PENDING'
GROUP BY
PH.ROWID_OBJECT

I have left this query run for 25 min and gotten no results. When I look at the query plan it says it\ estimates 580993000 rows.

Here is the question:
I am not sure the best approach for getting this to work.

Do I make the PH.ROWID_OBJECT index clustered instead of nonclustered?
Do I add an index for each field that the the BU.ROWID_OBJECT is being joined to on the PH table?

I am worried that adding more indexes will affect the load process. This table already takes more than an hour to load. If I add 5 more indexes, or add a clustered index won't that just show it down even more?

Is there a better approach to take ? I am at a total loss for how to address this.

Thanks


#2

as a starter without yet creating indices
which of these 2 tables has the least rows now
MDM_C_PARTY_HIST or MDM_C_LKP_BU


#3

Tables without a clustered index (aka heaps) generally don't perform well in SQL Server, particularly if the data in them changes and you end up with large numbers of forwarded records. Adding an appropriate CI is often a good solution particularly if there is something which meets the usual criteria of "small, incrementing and unique" as that will usually be sufficient to ensure performance doesn't take too much of a hit during inserts of new data.

If not, rebuilding the table can help because doing so will remove all forwarded records that currently exist:

ALTER TABLE tablename REBUILD

If that does help though, it's probably a sign you need to monitor such heaps for forwarded record counts and factor the need to rebuild the table into your maintenance procedures.


#4

The most critical performance factor for any table is to first get the best clustered index in place on that table.

  1. Drop all nonclus indexes on table PH

  2. Create a clus index on PH on ( ROWID_BU ), with FILLFACTOR of 95 or more

  3. Recreate any other nonclus index(es) that you truly need on PH, but none that start with ROWID_BU since that's the clus index key

  4. Drop all nonclus indexes on table BU

  5. Create a clus index on PH on ( ROWID_OBJECT ), with FILLFACTOR of 95 or more

  6. Recreate any other nonclus index(es) that you truly need on BU, but none that start with ROWID_OBJECT since that's the clus index key


#5

[yosiasz] MDM_C_LKP_BU is smaller - it only has 57 rows
the other table has 580993000 rows


#6

Hi AndyC - thanks for responding - I dropped all the indexes on this table and recreated the non-clustered index about a week ago. As I understand it that would mean there isn't a lot of forwarding happening correct? Or does the forwarding not get addressed in this manner?


#7

No. Again, non-clustered index is only useful in very specific cases. For best performance, you must cluster the large table on how you query it most often. You could spend hours futzing about with non-clus indexes and still not really get any performance gain from it.


#8

ok thanks this is very helpful


#9

No, it wouldn't help with the forwarded records problem. Unlike a table with a clustered index (where all the data is effectively in an index) the table data in a heap is independent of the indexes. The only way to clean up the forwarded records is to rebuild the table.