Recently I was asked to investigate the update statistics process on a particular database on a SQL Server 2019 box. It is taking longer and longer and is interfering with other jobs that run at a later stage.
I investigated which tables are being updated: there are three really big tables (and their audit tables, giving a total of six). The remaining tables are much smaller and don't really take long.
Basically I am going to isolate the six really big tables from the other smaller tables. The smaller tables' weekly update statistics process runs fine (even WITH FULLSCAN) - it does not take long.
I want to develop a separate update statistics process for the big six tables, scheduled for perhaps late Friday so it can run over the weekend (or perhaps the Fridays of bank holiday weekends, which give an extra day if the process takes a long time).
I have some facts and figures on them below. They all run WITH FULLSCAN except one (the biggest), which uses a sample of 50 percent and also has some sort of pre-existing ALTER INDEX command (shown below).
I would like to do something more intelligent with the sampling rate on these tables, and hence am looking for any advice you may be able to offer. The tables are growing over time, so the sampling rate may have to be calculated dynamically. I would also like to record the run times.
A tall order, I know, but I would be most appreciative of any advice/guidance/suggestions/code samples that you may be able to offer. Please note I'm not a DBA, just a developer, but with a pretty decent level of SQL...
Many thanks for any assistance/guidance.
| TableName | TableSizeInKB | IndexSizeInKB | TableSize+IndexSize (KB) |
|---|---|---|---|
| TBL_A_Audit | 47905416 | 77448 | 47982864 |
| TBL_A | 20029224 | 86472 | 20115696 |
| TBL_B_Audit | 27379304 | 45760 | 27425064 |
| TBL_B | 1425880 | 1069848 | 2495728 |
| TBL_C_Audit | 7351992 | 12816 | 7364808 |
| TBL_C | 1601560 | 185424 | 1786984 |
Pre-existing TBL_A_Audit commands:

```sql
ALTER INDEX [PK_TBL_A_Audit] ON [dbo].[TBL_A_Audit] REBUILD PARTITION = ALL
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
      ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON);

UPDATE STATISTICS [dbo].[TBL_A_Audit]
WITH SAMPLE 50 PERCENT;
```
An index rebuild will take a (very) long time on a table the size of TBL_A, and it already updates the index's statistics with the equivalent of a FULLSCAN, so you shouldn't need a separate full stats update on top of it. But you shouldn't need to rebuild the entire table either. Isn't an audit table (almost) totally static once rows are inserted? If it is, partition the table and only rebuild the current, still-being-inserted-to partition, if you really need a rebuild at all (it depends on the clustering index). The static partition(s) can be left untouched.
It would be very helpful to have details on the indexes on the tables, including row counts. If a table is huge because of row counts, that calls for somewhat different tuning than if it is huge because of LOB column(s).
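A minimal sketch of that approach, assuming a hypothetical monthly partition function on an audit-date column (pf_AuditMonthly, ps_AuditMonthly, and the boundary dates are illustrative names and values, not objects from your database):

```sql
-- Hypothetical setup: partition TBL_A_Audit by month on an audit-date column.
-- pf_AuditMonthly / ps_AuditMonthly are illustrative names, not existing objects.
CREATE PARTITION FUNCTION pf_AuditMonthly (datetime2)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME ps_AuditMonthly
AS PARTITION pf_AuditMonthly ALL TO ([PRIMARY]);

-- Maintenance then rebuilds only the partition that current inserts land in.
-- $PARTITION maps a value to its partition number.
DECLARE @p int = $PARTITION.pf_AuditMonthly(SYSDATETIME());

DECLARE @sql nvarchar(max) =
    N'ALTER INDEX [PK_TBL_A_Audit] ON [dbo].[TBL_A_Audit]'
  + N' REBUILD PARTITION = ' + CAST(@p AS nvarchar(10)) + N';';
EXEC sys.sp_executesql @sql;
```

Two caveats: the clustered primary key would have to include the partitioning column for this to be possible, and unlike a full rebuild, rebuilding a single partition does not refresh statistics, so a separate UPDATE STATISTICS step would still be needed.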
I have added the number of columns and the row counts for the first two tables, and I've included the index information on them as well.
| TableName | TableSizeInKB | IndexSizeInKB | TableSize+IndexSize (KB) | No. Table Columns | No. Rows |
|---|---|---|---|---|---|
| TBL_A_Audit | 47905416 | 77448 | 47982864 | 513 | 11976410 |
| TBL_A | 20029224 | 86472 | 20115696 | 501 | 5112926 |
| TBL_B_Audit | 27379304 | 45760 | 27425064 | | |
| TBL_B | 1425880 | 1069848 | 2495728 | | |
| TBL_C_Audit | 7351992 | 12816 | 7364808 | | |
| TBL_C | 1601560 | 185424 | 1786984 | | |
| object_id | name | index_id | type | type_desc | is_unique | data_space_id | ignore_dup_key | is_primary_key | is_unique_constraint | fill_factor | is_padded | is_disabled | is_hypothetical | is_ignored_in_optimization | allow_row_locks | allow_page_locks | has_filter | filter_definition | compression_delay | suppress_dup_key_messages | auto_created | optimize_for_sequential_key |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1406653546 | PK_TBL_A_Audit | 1 | 1 | CLUSTERED | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NULL | NULL | 0 | 0 | 0 |
| 1470653774 | PK_TBL_A | 1 | 1 | CLUSTERED | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NULL | NULL | 0 | 0 | 0 |
| 1470653774 | TBL_A_Extra_INDEX | 2 | 2 | NONCLUSTERED | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | NULL | NULL | 0 | 0 | 0 |
The audit tables get written to on insert, update, or delete against the associated main table. These tables can be written to at any time during working hours, but there is a spike of input for about seven days before the 20th of each month (i.e. from about the 13th to the 20th). On the 20th we run a calculation/process that requires all resources and takes several hours. Then we have a quiet period after the 20th (with most users accessing reports, which query the main tables) and not much input, but this picks up again gradually towards the 20th of the next month. It's cyclic, basically...
We update statistics manually every Friday evening, but as mentioned previously this is starting to interfere with processes on Monday morning.
Let me know if you need more info, but I'm thinking of segregating these "big-6" tables into their own update-statistics process and refining the sample rate. Hopefully this will improve things.
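One possible shape for that per-table process, as a minimal sketch: compute a sample rate from the current row count, run UPDATE STATISTICS, and log the duration. The log table, procedure name, and rate breakpoints below are illustrative assumptions to be tuned against real run times, not anything from the existing setup:

```sql
-- Illustrative logging table and procedure; all names and thresholds are assumptions.
CREATE TABLE dbo.StatsUpdateLog (
    LogId     int IDENTITY(1,1) PRIMARY KEY,
    TableName sysname   NOT NULL,
    SamplePct int       NOT NULL,
    RowsAtRun bigint    NOT NULL,
    StartTime datetime2 NOT NULL,
    EndTime   datetime2 NULL
);
GO

CREATE OR ALTER PROCEDURE dbo.UpdateBigTableStats
    @SchemaName sysname = N'dbo',
    @TableName  sysname
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @qualified nvarchar(300) =
        QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName);

    -- Current row count from partition metadata (heap = 0, clustered index = 1).
    DECLARE @Rows bigint =
        (SELECT SUM(p.rows)
         FROM sys.partitions AS p
         WHERE p.object_id = OBJECT_ID(@qualified)
           AND p.index_id IN (0, 1));

    -- Crude sliding scale: sample less as the table grows.
    -- These breakpoints are guesses to be refined from the logged timings.
    DECLARE @SamplePct int =
        CASE WHEN @Rows <  1000000 THEN 100
             WHEN @Rows <  5000000 THEN 50
             WHEN @Rows < 20000000 THEN 25
             ELSE 10
        END;

    DECLARE @LogId int;
    INSERT dbo.StatsUpdateLog (TableName, SamplePct, RowsAtRun, StartTime)
    VALUES (@TableName, @SamplePct, @Rows, SYSDATETIME());
    SET @LogId = SCOPE_IDENTITY();

    DECLARE @sql nvarchar(max) =
        N'UPDATE STATISTICS ' + @qualified
      + N' WITH SAMPLE ' + CAST(@SamplePct AS nvarchar(3)) + N' PERCENT;';
    EXEC sys.sp_executesql @sql;

    UPDATE dbo.StatsUpdateLog SET EndTime = SYSDATETIME() WHERE LogId = @LogId;
END;
GO
```

The Friday-night job would then call it once per big table, e.g. `EXEC dbo.UpdateBigTableStats @TableName = N'TBL_A_Audit';`, and the log shows how duration tracks row count and sample rate over time.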
For large, heavily-used tables, the key is to stop treating statistics updates as an all-or-nothing full-scan job and instead make the process more targeted and data-change aware. SQL Server's automatic statistics updates are actually reliable when data modifications are moderate, so let them handle the lighter workloads.
For the big tables that see frequent inserts or updates, run manual stats updates, but only for the tables or columns that actually changed, and avoid always using FULLSCAN since that's usually overkill and extremely expensive. If you're already rebuilding indexes, remember that an index rebuild automatically refreshes that index's statistics, so running a separate full stats update right afterwards just wastes time and I/O.
The most efficient strategy is to monitor stats freshness using DMVs or STATS_DATE, use sample-based updates where appropriate, and combine auto-stats with selective manual updates. This gives you nearly all the optimizer benefits without the heavy overhead of blanket full-scan updates on very large tables.
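To make the freshness check concrete, here is a sketch using sys.dm_db_stats_properties (available on SQL Server 2019); the 10% modification threshold is an illustrative assumption to tune against your monthly cycle:

```sql
-- Find statistics on the big-6 tables whose modification counter has drifted
-- past ~10% of rows since the last update; only those need a refresh this week.
SELECT
    OBJECT_SCHEMA_NAME(s.object_id) AS schema_name,
    OBJECT_NAME(s.object_id)        AS table_name,
    s.name                          AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECT_NAME(s.object_id) IN
      (N'TBL_A', N'TBL_A_Audit', N'TBL_B', N'TBL_B_Audit', N'TBL_C', N'TBL_C_Audit')
  AND sp.modification_counter > 0.10 * sp.rows   -- illustrative threshold
ORDER BY sp.modification_counter DESC;
```

Anything the query doesn't flag can simply be skipped that week, which is where most of the savings over a blanket full-scan pass would come from.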