SQLTeam.com | Weblogs | Forums

Execution is really slow pulling large records


#1

Hello SQL Experts,

I am trying to query a table that has 18 million records and it is really slow. How do I improve the performance?

Here is my select statement:

SELECT TaxPayerID, TaxPayerName, TaxPayerLast4SS, TaxPayerDOB
FROM TAXTBL

There is no index. It took about 30 minutes to return all 18 million records with this simple query. Can anyone help me speed this up?


#2

There are lots of things that can be done to make it faster. Assuming you need to do these UNIONs, you can speed up the query by:

1. Cache the results. For example:
Can you create an indexed view from the whole statement? Or, if there are lots of different WHERE conditions, would that mean lots of indexed views? Be aware that indexed views slow down modifications (INSERT, etc.) on the underlying tables.
Can you cache it in a different way, perhaps in the middle tier?
Can it be calculated in advance?
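As a sketch, an indexed view over the OP's query might look like the following (this assumes the table lives in the dbo schema and that TaxPayerID is unique; adjust to your actual schema):

```sql
-- Sketch only: assumes dbo schema and a unique TaxPayerID.
CREATE VIEW dbo.vw_TaxPayers
WITH SCHEMABINDING  -- required for an indexed view
AS
SELECT TaxPayerID, TaxPayerName, TaxPayerLast4SS, TaxPayerDOB
FROM dbo.TAXTBL;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vw_TaxPayers
ON dbo.vw_TaxPayers (TaxPayerID);
```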

2. Create a covering index. The leading columns are the columns from the WHERE clause, with all other columns from the query as included columns.

Note that a covering index can also be filtered, but a filtered index won't be used if the WHERE clause in the query contains variables / parameters that could potentially take a value not covered by the filtered index (i.e., a row the index doesn't cover).
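For example, if the query filtered on TaxPayerLast4SS (a hypothetical predicate - the OP's query has no WHERE clause), a covering index could be sketched as:

```sql
-- Hypothetical: leading column taken from the WHERE clause,
-- remaining query columns as included columns so the index covers the query.
CREATE NONCLUSTERED INDEX IX_TAXTBL_Last4SS_Covering
ON TAXTBL (TaxPayerLast4SS)
INCLUDE (TaxPayerID, TaxPayerName, TaxPayerDOB);
```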

3. ORDER BY will cause a sort

If you can cache the result, that's fine - no sort will be needed (it's cached already sorted).

Otherwise, a sort is CPU bound (and I/O bound if it doesn't fit in memory). To speed it up, are you using a fast collation? The performance difference between the slowest and fastest collations can be as much as 3x. For example, SQL_EBCDIC280_CP1_CS_AS, SQL_Latin1_General_CP1251_CS_AS and SQL_Latin1_General_CP1_CI_AS are among the fastest. However, it's hard to make recommendations without knowing the collation characteristics you need.
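If changing the column's stored collation isn't an option, one way to try a faster collation just for the sort is a COLLATE clause in the ORDER BY (a sketch - the OP's query has no ORDER BY, and which collation is acceptable depends on your requirements):

```sql
-- Sketch: force a specific collation for the sort only,
-- without changing the column's stored collation.
SELECT TaxPayerID, TaxPayerName
FROM TAXTBL
ORDER BY TaxPayerName COLLATE SQL_Latin1_General_CP1_CI_AS;
```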

4. Network
'network packet size' for the connection that runs the SELECT should be set to the maximum possible value - 32,767 bytes - if the result set (number of rows) will be big. This can be set on the client side, e.g., in the connection string if you use .NET and SqlConnection. It minimizes CPU overhead when sending data from SQL Server and improves performance on both sides - client and server. If the network was the bottleneck, this can boost performance by tens of percent.
Use the shared memory endpoint if the client is on the same machine as SQL Server; otherwise use TCP/IP for the best performance.
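With a .NET SqlConnection, the packet size is set via the Packet Size keyword in the connection string (server and database names below are placeholders):

```
Data Source=myServer;Initial Catalog=TaxDB;Integrated Security=True;Packet Size=32767
```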

5. General things

As you said, using the READ UNCOMMITTED isolation level will improve performance.
...

You probably can't make changes beyond rewriting the query, etc., but just in case: adding more memory if it isn't sufficient now, or using the SQL Server 2014 in-memory features :-), would surely help.
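The isolation-level suggestion above can be sketched like this (note the trade-off: dirty reads are possible, so only do this if seeing uncommitted or soon-to-be-rolled-back rows is acceptable):

```sql
-- Dirty reads possible: the SELECT may see rows that are later rolled back.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT TaxPayerID, TaxPayerName, TaxPayerLast4SS, TaxPayerDOB
FROM TAXTBL;

-- Equivalent per-table form:
-- SELECT ... FROM TAXTBL WITH (NOLOCK);
```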

There are way too many things that could be tuned, and it's hard to point out the key ones when the question isn't very specific.

Hope this helps a bit


#3

@jason_clark: I wonder if your reply was intended for a different thread? I can't see that the O/P has edited their post, so either I'm mis-reading something here or being a mega prat (it happens!)

Might be the network time to transfer all that data, or the CPU time on the client to format it for display / convert it into Excel format, whatever.

If you can connect directly to the server (e.g. via remote desktop), run the query there, and output it to a temporary file on a local C: drive, checking the time for that will tell you the difference between SQL time and network / display time.
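One way to run that test from the server itself is the sqlcmd utility (a sketch - server name, database name and output path are placeholders):

```
sqlcmd -S localhost -d TaxDB -E -Q "SELECT TaxPayerID, TaxPayerName, TaxPayerLast4SS, TaxPayerDOB FROM TAXTBL" -o C:\temp\taxtbl_out.txt
```

Here -E uses Windows authentication, -Q runs the query and exits, and -o writes the result set to the local file instead of the console.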

18 million rows is still a lot of data; however, you only have 4 columns and they look quite "skinny". If any of the columns are actually wide (is TaxPayerName 30-ish characters? or is it perhaps 8,000 characters? or more?!!) then that would take longer to process etc.

Can't see that that matters. You are selecting all records, with no WHERE clause and no ORDER BY, so SQL will just traverse the whole table and grab every record. It IS possible that, absent a clustered index, the table is massively fragmented, which might increase the effort needed to get the data, but I can't see that being a significant part of 30 minutes (it does point to bad database design and housekeeping maintenance though).


#4

Might be worth trying this, and seeing how long it takes:

SELECT COUNT(*)
FROM
(
    SELECT TaxPayerID, TaxPayerName, TaxPayerLast4SS, TaxPayerDOB
    FROM TAXTBL
) AS X

as SQL will have to fetch all the rows from the table but won't have to transfer them across the network or format them on the client. If that only takes a few seconds then the problem is either the network or client-side formatting.


#5

Thanks Kristen


#6

What do you actually want to do with that 18 million row result?