Running Total very slow

So I have a script that is part of a nightly SSIS job. The query takes about an hour to run. The DBA wants me to rewrite it, but I haven't had any luck. I have eight companies' data (eight tables) that I query and then UNION ALL together. The culprit in all of this is a column that my A/R department wants: all GL ledger entries for the current year, then a running total of those, and then the MAX value out of that running total list. Here is the script for this particular column:

SELECT [Customer No_], MAX(RunningTotal) AS HighCredit
FROM (
    SELECT DISTINCT a.[customer no_], a.[Posting Date], a.[Entry No_], a.Amount,
           SUM(b.Amount) AS RunningTotal
    FROM [myGLtable] AS a
    INNER JOIN [myGLtable] AS b
        ON a.[Entry No_] >= b.[Entry No_]
       AND a.[customer no_] = b.[customer no_]
    WHERE YEAR(a.[posting date]) = YEAR(GETDATE())
    GROUP BY a.[customer no_], a.[Posting Date], a.[Entry No_], a.Amount
) AS X
GROUP BY [Customer No_]

I have tried other ways to write this. I can't get the cursor approach to work, and we're on SQL Server 2008, so I can't use the newer SUM() OVER (ORDER BY ...) syntax. Not sure if there is a better way to be doing this. If anyone has any clue how I can optimize what I have, or do something with temp tables, I'd appreciate it; I am somewhat new to the issues with running totals and how long they take to run.
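For reference, here is what the windowed-aggregate version would look like on SQL Server 2012 or later; it replaces the self-join entirely. This is a sketch only (it will not run on 2008, and ordering by [Entry No_] is an assumption about the intended running-total order):

```sql
-- SQL Server 2012+ only: windowed running total per customer,
-- ordered by entry number (assumed to be the intended order).
SELECT [Customer No_], MAX(RunningTotal) AS HighCredit
FROM (
    SELECT [customer no_] AS [Customer No_],
           SUM(Amount) OVER (PARTITION BY [customer no_]
                             ORDER BY [Entry No_]
                             ROWS UNBOUNDED PRECEDING) AS RunningTotal
    FROM [myGLtable]
    WHERE [posting date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
      AND [posting date] <  DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)
) AS X
GROUP BY [Customer No_]
```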

Are your tables properly indexed? You may want to look at the indexes on the columns in the WHERE clause and the join.

Try a CTE:

;WITH Cte_runningTotal AS
(
    SELECT DISTINCT a.[customer no_], a.[Posting Date], a.[Entry No_], a.Amount,
           SUM(b.Amount) AS RunningTotal
    FROM [myGLtable] AS a
    INNER JOIN [myGLtable] AS b
        ON a.[Entry No_] >= b.[Entry No_]
       AND a.[customer no_] = b.[customer no_]
    WHERE YEAR(a.[posting date]) = YEAR(GETDATE())
    GROUP BY a.[customer no_], a.[Posting Date], a.[Entry No_], a.Amount
)
SELECT [Customer No_], MAX(RunningTotal) AS HighCredit
FROM Cte_runningTotal
GROUP BY [Customer No_]

Can you please tell us how much time your query takes, and share the index details?

Change your WHERE clause so you don't have a function wrapped around the posting date column. After you do that, if you have an index on the posting date column, it should speed up the query.

SELECT [Customer No_],
       MAX(RunningTotal) AS HighCredit
FROM (
    SELECT a.[customer no_],
           a.[Posting Date],
           a.[Entry No_],
           a.Amount,
           SUM(b.Amount) AS RunningTotal
    FROM [myGLtable] AS a
    INNER JOIN [myGLtable] AS b
        ON a.[Entry No_] >= b.[Entry No_]
       AND a.[customer no_] = b.[customer no_]
    WHERE a.[posting date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
      AND a.[posting date] <  DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)
    GROUP BY a.[customer no_],
             a.[Posting Date],
             a.[Entry No_],
             a.Amount
) AS X
GROUP BY [Customer No_]

So if I take out the WHERE clause, how will I get the running total for the current year? I don't need the running total for all entries dating back 5 or 6 years; that would give me an incorrect value.

"all GL Ledger entries for current year, then running total of these, and then finding the MAX value out of this....

This is an accounting system that we have no in-house personnel working on; we bring in consultants every now and then to look into things. There is no indexing that I can see on this table.

I tried the CTE script and it runs several seconds slower. I have eight individual companies that I query from, with views set up where the query resides. The smaller companies have fewer records and typically run in 20 seconds; the largest company, which returns the most records, takes 33-34 minutes. Then I have to UNION ALL the eight companies' data together into one big data extract.

I guess one thought would be to have several extracts instead of one and then just dump them all into the same table? Not the exact fix I would want, as I think indexing and a rewrite of the query might be the answer. I tried the cursor method and it runs, but the value is incorrect; it seemed to run in about a third less time than the others.

I am not suggesting that you take out the WHERE clause. I am suggesting that you replace the current WHERE clause, which is

WHERE YEAR(a.[posting date])=YEAR(GETDATE())

with what I posted earlier which is

WHERE   a.[posting date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
    AND a.[posting date] <  DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)

Logically the two are the same: both filter for rows whose posting date is in the current year. The rewritten version, however, leaves the column bare, so an index on [posting date] can actually be used.
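To make the date math concrete, here is what the two boundary expressions evaluate to (the literal dates below assume GETDATE() falls in 2015, matching the sample data later in this thread):

```sql
-- 0 is the date serial for 1900-01-01, so DATEDIFF(YEAR, 0, GETDATE())
-- counts the year boundaries since 1900; adding that many years back to
-- 1900-01-01 lands on January 1 of the current year.
SELECT DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)      -- 2015-01-01 (start of current year)
SELECT DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)  -- 2016-01-01 (start of next year)
```

The half-open range `>= start AND < next-start` therefore covers exactly the current year, including any time-of-day component.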

Thanks, James. I tried this and it made no difference. Yes, indexes need to be created, but that might be an issue since we don't have direct control over this system, per one of my earlier replies.

Are there any indexes on the table at all? Is there a primary key? What is the primary key if there is one?

If there are no indexes and if you are not able to create one, I don't see anything that you can do to speed up the query.

I am inferring that the query works correctly, even though it is slow. Usually when people say "running total", they mean a cumulative total starting from some point in time; in your case, the beginning of the year. However, your query is calculating the sum of the amount for each posting date and picking the largest one for a given customer and entry number, which does not seem to fit the usual definition of a running total as far as I know.

Regardless, if your query is logically producing the correct results, that is a non-issue.

IF OBJECT_ID('tempdb.dbo.#gl_entries') IS NOT NULL
    DROP TABLE #gl_entries

CREATE TABLE #gl_entries (
    [Customer No_] varchar(10) NULL,
    [Posting Date] date NULL,
    [Entry No_] int NULL,
    Amount money,
    UNIQUE CLUSTERED ( [Customer No_], [Posting Date], [Entry No_] )
)

INSERT INTO #gl_entries ( [Customer No_], [Posting Date], [Entry No_], Amount )
SELECT gl.[customer no_], gl.[Posting Date], gl.[Entry No_], gl.Amount
FROM [myGLtable] gl
WHERE gl.[posting date] >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0)
  AND gl.[posting date] <  DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()) + 1, 0)

SELECT [Customer No_], MAX(RunningTotal) AS HighCredit
FROM (
    SELECT [Customer No_], [Posting Date], [Entry No_],
           (SELECT SUM(Amount)
            FROM #gl_entries gl2
            WHERE gl2.[Customer No_] = gl.[Customer No_]
              AND gl2.[Posting Date] <= gl.[Posting Date]
              AND (gl2.[Posting Date] < gl.[Posting Date] OR gl2.[Entry No_] <= gl.[Entry No_])
           ) AS RunningTotal
    FROM #gl_entries gl
) AS derived_table
GROUP BY [Customer No_]

Scott,

This runs quicker than anything I have tried. I am not sure I am getting the correct results, however. I will check into this and reply in a minute or so. Thanks for this, though, as it seems it could work.

Yes, and I just noticed that the original query didn't compare the [posting date], just the [Entry No_], but I compared both. The [posting date] check definitely needs to be removed from the subquery logic. And of course verify "<=" vs ">=". Hopefully that will correct the total!
...
(SELECT SUM(Amount)
 FROM #gl_entries gl2
 WHERE gl2.[Customer No_] = gl.[Customer No_]
   AND gl2.[Entry No_] <= gl.[Entry No_]
) AS RunningTotal
...

I fixed the <= to >= and I'm getting close. Not sure why, but here is a customer that I filtered on; I wanted to show the first 5 entries from the query you provided versus what I had originally.

From the query you provided:

Customer No_  Posting Date  Entry No_  RunningTotal
271           1/20/2015     143843     -$75,320.13
271           1/20/2015     143844     -$88,680.13
271           1/20/2015     143845     -$101,450.13
271           1/21/2015     144012     -$124,782.63
271           1/21/2015     144013     -$109,111.73

From my original query:

customer no_  Posting Date  Entry No_  Amount        RunningTotal
271           1/20/2015     143843     $13,360.00    $88,680.13
271           1/20/2015     143844     $12,770.00    $101,450.13
271           1/20/2015     143845     $23,332.50    $124,782.63
271           1/21/2015     144012     -$15,670.90   $109,111.73
271           1/21/2015     144013     -$15,670.90   $93,440.83

If you notice, the bottom results show the actual invoice amount or payment amount (payments are negative), just like your checking account. Anyhow, two things: it seems like in Scott's script the running total is lagging one entry behind, if that makes sense, and also the running total is showing negative for each entry when it shouldn't. Not sure if your posting-date WHERE clause is causing the first issue or not. More testing, but close.

Ah, it looks like you don't want the current entry in the running total, so change it to ">" rather than ">=".

Thanks for all your help. I guess the last thing is why the running total is not summing correctly. I verified the insert into the temp table, and the entries look correct: there are several negative entries and several positive entries as far as amounts go. But looking at the running total, they all show negative, and I can't figure it out. The amount of $124,782.63 (positive, which means the customer has a balance) is the verified correct 'high balance'. The way this is summing in the subquery, something is going on. Hmmm.

So I am trying to work through this. I verified that the records in the temp table are the same as what I queried from the regular table; the problem is how they are being added up to get the running-total MAX. I have messed with the join, added a primary key to the temp table, and changed around the index. Nothing; I am still having issues. It's just very strange to me that even when I use the same script that I know works (but takes way too long, which is why I initially posted), it runs so much faster because of the temp table and insert... but it is adding incorrectly. Is this inherent in the way this is done?

No, there isn't, and that's what's so frustrating here. It should work. If you don't need the date, you can drop it out of the table. I used a unique key in the table; I just don't make it a "primary key" unless I absolutely have to, because then you can't have a NULL value in any of the key columns.

DROP TABLE #gl_entries
CREATE TABLE #gl_entries (
    [Customer No_] varchar(10) NULL,
    [Entry No_] int NULL,
    Amount money,
    UNIQUE CLUSTERED ( [Customer No_], [Entry No_] )
)

The speed is because there's an index that perfectly matches the total. If you could post some actual rows and expected totals, that would help a lot. Edit: wait, I see a few above; when I can, I'll test with those, although another customer's data wouldn't hurt :-).

@osupratt1,

I'll admit that I've not done a deep dive on all of the responses so far, but I have noticed that they all use some form of "triangular join", which is really half of a full CROSS JOIN (Cartesian product/square join), or at least it seems that way from my quick scan. Those can get pretty slow pretty quickly.

I have a method that will do a running total (or grouped running total or both) on a million rows in seconds but I need more information.

  1. What is the actual order of the rows that you want the running total to be calculated by? For example, do you want the running total based on Posting Date using Entry No as the tie breaker for entries that occur on the same date?
  2. It would appear from your original post that you're looking for the "high water mark" for each Customer No for the given year. Is that correct? In other words, you want the running total to work like a checkbook, but per customer, correct? And the first entry for each customer would be that customer's first balance, correct?
  3. As a point of curiosity, how many total rows of data are we talking about? And how many unique Customer No's are we talking about?
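For context, the fast method being hinted at here is usually the ordered-UPDATE (so-called "quirky update") technique. A minimal sketch follows, assuming the #gl_entries temp table from earlier with an added RunningTotal column; note that it relies on undocumented behavior (rows being updated in clustered-index order, forced serial execution), so treat it as illustrative rather than guaranteed:

```sql
-- Assumes: ALTER TABLE #gl_entries ADD RunningTotal money;
-- and a clustered index on ([Customer No_], [Entry No_]).
DECLARE @RunningTotal money = 0,
        @PrevCustomer varchar(10) = '';

UPDATE gl
SET @RunningTotal = RunningTotal =
        CASE WHEN gl.[Customer No_] = @PrevCustomer
             THEN @RunningTotal + gl.Amount  -- continue this customer's total
             ELSE gl.Amount                  -- new customer: restart the total
        END,
    @PrevCustomer = gl.[Customer No_]
FROM #gl_entries gl WITH (TABLOCKX, INDEX(1))
OPTION (MAXDOP 1);

-- High water mark per customer:
SELECT [Customer No_], MAX(RunningTotal) AS HighCredit
FROM #gl_entries
GROUP BY [Customer No_];
```

This does a single pass over the table instead of a triangular join, which is why it scales roughly linearly with row count.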

For best performance, refer to this URL:

http://sqlperformance.com/2012/07/t-sql-queries/running-totals