Table with 789.6 million records taking 5 mins to execute the select stmt

anand · June 25, 2015, 9:06pm

Thanks all for your help. Here is my query(ofcourse table names have been changed) which I'm trying to tune and currently taking 4-5 mins to finish . Also, you can see the tables details and other info below with the attachments on exe plan, etc.. .

Database1.Schema1.Object5:

Total Records : 789.6 million

of records between 01/01/2014 and 01/31/2014 : 28.2 million

My table has around 789 million records and it is partitioned on "Column19" by month and year .
Clustered index on Column19

Database1.Schema1.Object6:

Total Records : 24791

Database1.Schema1.Object7:

Total Records : 311

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 2 ms.

Table 'Database1.Schema1.Object6'. Scan count 9, logical reads 1082, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Database1.Schema1.Object7'. Scan count 4, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Database1.Schema1.Object5'. Scan count 9, logical reads 280072, physical reads 283, read-ahead reads 130274, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(17064 row(s) affected)

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 697975 ms, elapsed time = 254160 ms.

Query:

INSERT INTO Object1
(
Column3,
Column4,
Column5,
Column6,
Column7,
Column8,
Column9,
Column10,
Column11,
Column12,
Column13,
Column14,
Column15,
Column16,
Column17,
Column18
)
SELECT
Column3,
Column4,
Column5,
Isnull(right(Column4,9), Isnull),
Isnull(right(Column4,9), Isnull),
Object2.Column8,
Column9= Object2.Column8,
Column10 = Object3.Column19,
Column11 =
Sum (
CASE
WHEN Object3.Column8 IS NOT NULL
THEN Object3.Column20
ELSE 0
END
),
Column12 =
Sum (
CASE
WHEN Object3.Column8 IS NOT NULL AND Datediff(Column21,Object2.Column22,Object3.Column19) < 365
THEN Object3.Column20
ELSE 0
END
),
Column13 =
Sum (
CASE
WHEN Object3.Column23 = ?
THEN Object3.Column20
ELSE 0
END
),
NULL,
Column15 =
Sum (
CASE
WHEN Object3.Column8 IS NOT NULL AND Datediff(Column21,Object2.Column22,Object3.Column19) < 365
THEN Object3.Column20
ELSE 0
END
),
Column16 =
Sum (
CASE
WHEN Object4.Column24 IN ('abc', 'xyz', 'lmn' )
THEN Object3.Column20
ELSE 0
END
),
Column17 = 0,
Object2.Column18
FROM
Database1.Schema1.Object5 Object3

INNER JOIN Database1.Schema1.Object6 Object2
ON Object2.Column25 = Object3.Column25
AND Object2.Column26 = Object3.Column26
AND Object2.Column27 = Object3.Column27
AND Object2.Column28 = 'Y'

INNER JOIN Database1.Schema1.Object7 Object4
ON Object4.Column29 = Object3.Column29

WHERE
Object3.Column19 BETWEEN '01/01/2014' AND '01/31/2014'
GROUP BY
Object3.Column30,
Object3.Column31,
Object3.Column32,
Object3.Column25,
Object3.Column26,
Object3.Column19,
Object2.Column33,
Object2.Column22,
Object2.Column8
,Object3.Column4
,Object3.Column3
,Object3.Column5
,Object2.Column18

JeffModen · June 26, 2015, 12:53am

Heh... you've discovered a simple method of code obfuscation and job security.

As you posted, the partition represented by "Column19" contains 28.2 million rows. Let me ask you... if it were a single table instead of a partition of a larger table, what indexes would you have in play to join to the table? For example, looking at the join that uses Column29, do you have an index on Column29?

Also, when working with such a relatively large number of rows, it's very likely that some "pre-aggregation" of the larger tables into a Temp table and then joining on that Temp table instead of the larger table would serve you very well. Still, you are talking about essentially doing a full scan of a 28.2 million row table and unless there is some extraordinary properly indexed criteria available, it's going to take some time with a little "Divide'n'Conquer" and "pre-aggeegation", this could do quite well.

I also suspect that the following code should actually be in the WHERE clause..

AND Object2.Column28 = 'Y'

Of course, I might as well be talking out of my hat because you've not provided the "Actual Execution Plan" as an .sqlplan attachment, I know virtually nothing about your tables or indexes, and I don't even know the datatype of your partitioning column nor whether or not the related clustered index has been made unique by inclusion of the PK or what. I'll also tell you that GROUP BY is capable of hiding a wealth of sins in the form of accidental many-to-many joins but, again, no one on this side of the conversation can tell because there's nothing we can see that would be a reasonable indication. Heh... for that matter, it may be that everything is perfect and this is the best that can be done (although I seriously doubt that) but, again, can't tell from here.

To help you with this and future queries, I suggest downloading Grant Fritchey's ebook on how to use and interpret Execution Plans. There's also a wealth of other free book downloads at the following URL and some of them (like Grant's book) are written by best-of-class authors. Here's the link...
http://www.sqlservercentral.com/Books/

anand · June 26, 2015, 3:56am

The table Database1.Schema1.Object5 has a total of 8 indexes and when I see the execution plan only the clustered index seek is playing the role to pull the data.

Yes there is an index on Colunm29 but as I said above only clustered index seek is happening . No other index is being used by the optimizer.

Data type of the partition column is datetime and there is no primary key on the table .

Thanks for your reply again,!!!

JeffModen · June 26, 2015, 1:07pm

Yep. That would explain why that part only takes 25% of the cost. You posted the execution plan graphic over on SQLServerCentral.com. 75% of the execution plan cost (which is still an estimate even on the actual) is buried in the INSERT to the final table. Do you have a lot of indexes/constraints (particularly FKs) on that table, as well?

As for not having a PK on the table, I don't believe that will hurt anything here and it is possible that it would never come into play or make any real difference but I find it to be a bit odd. I just don't know enough about your data to say for sure.

anand · June 26, 2015, 2:45pm

I'm inserting the data into the temp table and how does the index matter on the table to which the data is loading ?