Dear Experts,
I have a table related to call center call data with around 400 million rows and growing. It has data like phone number, IVR menu codes, call start, call end time and a unique call id for each call.
There is another process in the application which executes a query that performs count of the calls by phone number for different periods(Month, Quarter and Year) end date is current date.
And the most bad part of the query is the filter on phone number using a leading wild card. I explained the performance issues of this, but they don't want to change it, as that is how the parameter values are being searched from the application they said.
The table has necessary indexes.
CPU is always between 80 to 90% when this query runs.
I am thinking of implementing a staging table which would hold the call counts for the required periods for every phone number and then the application can query the staging table by just passing the phone number.
I would really appreciate your advice on this or any other method which can help improve the process.
That's a good idea, and you may be able to simplify keeping it updated with an indexed view:
There are a number of limitations with those (for instance, need to use COUNT_BIG) but if it works it's probably the best solution. If your 400 million call records group down to less than 10 million distinct phone numbers, then you'll save a lot of I/O. Plus you won't have the additional call detail, just phone and summary data.
Exactly what searches are being done on phone#? Most likely you should store the number in separate parts (zip code(3), prefix(3) and line number(4)). Naturally you can still display them as a single value. Then you could search -- and index -- any part of the number you needed to.
@ScottPletcher Zip, Prefix, line number all of them are stored as single value in the column. @Ifor's solution worked with great improvement in the performance.
Just for information purposes... 10 digit phone numbers that follow the NANPA (North American Numbering Plan) have 3 parts like you say but they not called what you called them...
Digits
1 to 3 NPA (a.k.a. Area Code) and the N and P and A are codes for what the digits can be.
4 to 5 NXX (a.k.a. Exchange or Prefix) and, again, those letters are code for what the digits can be.
6 to 10 The line number.
I don't know how it is now but Mexico was slightly different and I don't know what they do now.
U.S. territories also follow the NANPA.
Other countries are quite different but there IS an ISO document for what they all have (including North America and its territories) and that is the 1 to 3 digit Country Code.
As a bit of a sidebar, I cut my teeth on SQL Server working with CDRs (Call Detail Records) and a whole bunch more having to do with telephony and worked with a fantastic source of city data for phone numbers across the world coming from the company that I won't name here because I don't want folks to think I'm spamming. I also have nothing else to do with that company and get nothing for mentioning their name.