How can I optimize this procedure?

So I have this stored procedure (in Microsoft SQL Server) which is kinda slow.

CREATE PROCEDURE [APInvoice].[GetDuplicateInvoicesList] 
(
  @VendorID NVARCHAR(15),
  @InvoiceNumber NVARCHAR(50),
  @IncidentNumber NVARCHAR(50)
)
AS
BEGIN
  DECLARE @_InvoiceID INT
  DECLARE @_InvoiceNumber NVARCHAR(50)
  DECLARE @_InvoiceAmount DECIMAL(13,2)
  DECLARE @_InvoiceDate DATE
  DECLARE @_InitiationDateTime DATE
  DECLARE @_IncidentStatus NVARCHAR(50)
  DECLARE @_VendorID NVARCHAR(15)
  DECLARE @_InvoiceNumberCount INT

  DECLARE @LOCAL_TABLEVARIABLE TABLE
  (
      InvoiceNumber NVARCHAR(50),
    InitiationDateTime DATE,
      VendorID NVARCHAR(15),
      rn int --row number
  )
  DECLARE @LOCAL_TABLEVARIABLE_2 TABLE
  (
      InvoiceID INT,
      InvoiceNumber NVARCHAR(50),
      InvoiceAmount DECIMAL(13,2),
      InvoiceDate DATE,
      InitiationDateTime DATE,
      IncidentStatus NVARCHAR(50),
      VendorID NVARCHAR(15),
      TheOldestInvoiceIndex INT,
      ProcessName NVARCHAR(100)
  )
  DECLARE @InvoiceNumberCheckedArray TABLE
  (
      InvoiceNumber NVARCHAR(50)
  )
  DECLARE @StepTable TABLE
  (
      IncidentID INT,
      IncidentNumber2 INT,
      StepName NVARCHAR(50),
      ProcessName2 NVARCHAR(100)
  )

  -- Get a list of all current statuses (with stepnames) for all incidents
  INSERT INTO @StepTable
  SELECT iin.id as IncidentID, st.IncidentNumber as IncidentNumber2, st.StepName as StepName, st.ProcessName as ProcessName2
  FROM APInvoice.vwAPInvoiceActiveUltimusTasks st
  INNER JOIN APInvoice.InvoiceIncidentData iin ON iin.ProcessName = st.ProcessName AND iin.IncidentNumber = st.IncidentNumber;

  -- Get a list of all InvoiceNumbers that are open for given VendorID
  INSERT INTO @LOCAL_TABLEVARIABLE 
      SELECT * FROM (
          SELECT
              InvoiceNumber
              ,InitiationDateTime
              ,VendorID
              ,ROW_NUMBER() OVER(PARTITION BY InvoiceNumber ORDER BY InitiationDateTime DESC) AS rn 
          FROM APInvoice.InvoiceIncidentData
          WHERE VendorID = @VendorID
              AND IncidentNumber <> @IncidentNumber
              AND (IncidentStatus = 'Approved' OR (IncidentStatus = 'Pending'))
      ) result 
    WHERE rn = 1

  DECLARE LocalCursor CURSOR FOR    
  -- Find all duplicates +  max 10 non-duplicate invoices
  SELECT InvoiceNumber FROM @LOCAL_TABLEVARIABLE ORDER BY InitiationDateTime DESC

  OPEN LocalCursor

  FETCH NEXT FROM LocalCursor INTO @_InvoiceNumber 

  WHILE @@FETCH_STATUS = 0
  BEGIN
      BEGIN
          SET @_InvoiceNumberCount = (SELECT COUNT(*) FROM @LOCAL_TABLEVARIABLE_2 WHERE InvoiceNumber = @_InvoiceNumber)
          IF @_InvoiceNumberCount = 0
          BEGIN
              INSERT INTO @LOCAL_TABLEVARIABLE_2 
              SELECT
              id AS InvoiceID,
              InvoiceNumber,
              InvoiceAmount,
              InvoiceDate,
              InitiationDateTime,
              IncidentStatus,
              VendorID,
              ROW_NUMBER() OVER(ORDER BY InvoiceNumber ASC) AS TheOldestInvoiceIndex, -- the first is the oldest and original, others are duplicates
              REPLACE(ProcessName, 'Invoice_Approval_', '') AS ProcessName
              FROM APInvoice.InvoiceIncidentData
              WHERE VendorID = @VendorID
                  AND InvoiceNumber = @InvoiceNumber
                  AND IncidentNumber <> @IncidentNumber
                  AND (IncidentStatus = 'Approved' OR (IncidentStatus = 'Pending'))
              ORDER BY InitiationDateTime ASC
              INSERT INTO @InvoiceNumberCheckedArray
              SELECT @_InvoiceNumber AS InvoiceNumber
          END
      END
      FETCH NEXT FROM LocalCursor INTO @_InvoiceNumber
  END
  CLOSE LocalCursor
  DEALLOCATE LocalCursor

  SELECT lt.InvoiceID, lt.InvoiceNumber, lt.InvoiceAmount, lt.InvoiceDate, lt.IncidentStatus, lt.TheOldestInvoiceIndex, lt.ProcessName, s.StepName 
  FROM @LOCAL_TABLEVARIABLE_2 lt 
  INNER JOIN @StepTable s ON s.IncidentID = lt.InvoiceID 
  ORDER BY InvoiceNumber DESC;
END

I would really appreciate it if someone showed me a way to optimize this procedure.

Hi @z3onn and welcome.
I think I'm right in saying that the most likely cause of SP slowness lies in the use of cursors.
Not having the database and not knowing the logic of the program, I'm not able to give you a more precise indication.
The only advice I can give you is to review the architecture of the project trying to establish whether it is possible to avoid resorting to cursors which require sequential processing.
Good job and sorry if I wasn't able to be more helpful.

1 Like

Hello @z3onn
I addition to what gdl said about cursors, you can see the execution plan for the stored procedure by hitting ctrl-M and running the stored procedure. That will tell you where the bottleneck is.

1 Like

You'll want to test this, but you can replace everything from DECLARE LocalCursor to DEALLOCATE LocalCursor, inclusive, with the following:

INSERT INTO @LOCAL_TABLEVARIABLE_2 
SELECT
a.id AS InvoiceID,
a.InvoiceNumber,
a.InvoiceAmount,
a.InvoiceDate,
a.InitiationDateTime,
a.IncidentStatus,
a.VendorID,
ROW_NUMBER() OVER(ORDER BY a.InvoiceNumber ASC) AS TheOldestInvoiceIndex, -- the first is the oldest and original, others are duplicates
REPLACE(a.ProcessName, 'Invoice_Approval_', '') AS ProcessName
FROM APInvoice.InvoiceIncidentData a
INNER JOIN @LOCAL_TABLEVARIABLE z ON a.InvoiceNumber=z.InvoiceNumber
WHERE a.VendorID = @VendorID
    AND a.IncidentNumber <> @IncidentNumber
    AND a.IncidentStatus IN('Approved','Pending')
--ORDER BY a.InitiationDateTime ASC  -- this doesn't make sense without a TOP directive

The INSERT INTO @InvoiceNumberCheckedArray is unnecessary, as you don't reference that table later in the procedure. Most of the variables you DECLARE at the top are also unused.

Additional suggestions:

  • Use temp tables (#table_name) instead of table variables, unless you'll never have more than 1000 rows in any of these results. Anything more than that will benefit from column statistics, which table variables do not have.

  • Put PRIMARY KEY or UNIQUE constraint on InvoiceNumber and/or InvoiceID columns in the temp tables, to prevent duplicates and help the optimizer with cardinality estimates.

  • I don't think you need the ORDER BY statements that are not in the ROW_NUMBER() expressions, except for the last SELECT. Using an ORDER BY in an INSERT statement is unhelpful, as table variables are unordered heaps anyway. The ORDER BY expressions are also inconsistent between the INSERTs and cursor definition. Don't waste CPU and disk I/O on a sort when it won't matter later.

  • The criteria for INSERTing into @LOCAL_TABLEVARIABLE and @LOCAL_TABLEVARIABLE2 are identical, and can very likely be combined into 1 INSERT into 1 table only.

  • Lastly, I started on this but had to give up, but I'm quite certain this can be written without any table variables or temp tables at all. In the worst case the initial INSERT...SELECT into @LOCAL_TABLEVARIABLE can be rewritten as a Common Table Expression (CTE) to identify the InvoiceNumbers, and can then be JOINed to InvoiceIncidentData and vwAPInvoiceActiveUltimusTasks in a single SELECT query.

1 Like

@robert_volk @apfanz @gdl Thank you all for your help :smile:

hi

hope this helps

instead of cursor

you can try SET Based Solution
writing a T-SQL Statement

with indexes

SELECT 
    InvoiceNumber 
	, count(*) 
FROM 
   APInvoice.vwAPInvoiceActiveUltimusTasks st
          INNER JOIN 
   APInvoice.InvoiceIncidentData iin 
                 ON iin.ProcessName = st.ProcessName AND iin.IncidentNumber = st.IncidentNumber
WHERE VendorID = @VendorID
              AND IncidentNumber <> @IncidentNumber
			  AND InvoiceNumber = @InvoiceNumber	
              AND IncidentStatus IN  ('Approved','Pending')
GROUP BY 
     InvoiceNumber
HAVING 
    count(*) > 1