Quickest way to Pageinate Data?

We're using rownumber() to pageinate our data but it takes 6 mins just to do 100 rows. This is way too slow as we have 650000 rows.

Just for your info, we're passing this data from SQL into Elasticsearch every night. Because the data is so large we cannot hold it in memory. This is why we need to split it up into chunks of say 100000 rows

Can you please post the code (SQL) you are currently using to do the pagination

WITH PAF_CTE AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY ReferenceId) AS RowNum
FROM .vw_PAF
)
SELECT *
FROM PAF_CTE
WHERE RowNum BETWEEN (@PageNumber - 1) * @PageSize + 1
AND @PageNumber * @PageSize;

any reason your not using OFFSET/FETCH?

Also, is the ReferenceId column indexed?

so is the pagination using data from Elastic Search or SQL? if from SQL why are you dumping it into elastic search? can you please clarify

Offset/fetch is used in 2012 but our servers are on 2008. We are testing to see how long it takes to do 10000 rows now

Elasticsearch is much faster at returning searches and better at searching in general

??????

The stored procedure calls a view with a load of these commands

SELECT
'AUTOC_' + CAST(AutocompleteId AS VARCHAR(10)) AS ReferenceID,,....

therefore it's not indexed

You could index the view. The query in the view is not SARGable

Alternatively, you use a persisted, computed column on the table and index that

ok, thanks...i'll try that out and let you know the outcome

How is(are) the underlying data table(s) clustered? You should use the clustering order to avoid sorting, then you can insert just the "break" key values -- key for 100,000; key for 200,000; etc. -- into a table to control the batch loading.

The reason why it's taking 6 minutes is because you're using a view and the entire view is materialized by the CTE. Any chance of you posting the code for the view? I ask because way back when SQL 2000 was considered to be the cutting edge, I did a neat little trick that allowed even the logical last page of a million row table to be found and presented in about 425 ms. It would likely be 2-4 times faster on today's machines. I need to see the code for the view to figure out if it could actually work for you or not.

why are you passing all of your data every night. we have same situation and we do not do the whole batch every night, we push only rows that changed or new rows (deltas) on weekdays. then maybe once a week on slowest day like Sunday 12:01 am we push the whole thing.
You would have n+1 clustered Elastic Search databases,

  1. bring ES1 offline, load data, bring it back online
  2. bring ES2 offline, load data, bring it back online

This you could do only for Sunday full load. During the week only do deltas. smaller footprint

We do that too, but only where the Source data has a Create/Update date column. Is there a way to do this when there is no suitable Change Date available?

Thanks for all your responses.

What we noticed is that it was taking 6 mins just to number the rows, so each call to the SP would take 6 mins numbering + everything else ( selects from other views)

We cut this down by creating a global temp table using the referenceID and the rownumber() in the SP. The first time the SP is hit it creates this global table. Sequential 'paginated' hits would use the global temp table. We don't need to order the data so we didn't use ordering in the code which cut the time from 6 mins to 3 mins. We had to ensure this global table was deleted at the end of the process of uploading all data or in rollback.

We also used clustered index.

The reason why we're uploading everyday is because of the complexity of the database and the app that checks for updates. We use checksums

  1. Add DateTime column :joy:
  2. Checksum
  3. Retire
1 Like

The parent table has around 10 child tables. They are not referenced by keys. I think this is why they have checksums in use. Perhaps a trigger from the child table to the parent table?

Are you saying there is no FK --> PK reference. why not? and can you not add these?

OK but ... of course the 3rd party DB is junk, so usually "no chance" :slight_smile:

I worry about false positives :frowning:

"I'm organised he sighed ... and died" !!