Usage of ROWS_PER_BATCH in SQL OPENROWSET

Hi,

I am using SQL OPENROWSET to read a file and load the data into a table. The file contains about 150 million records. When I try to load the data, the transaction log bloats because everything runs as one single transaction, and the load fails with "The transaction log for database 'DB' is full due to 'ACTIVE_TRANSACTION'". The log grows to about 110 GB. I have set my database recovery model to Bulk-Logged, and the table I am loading into does not have any indexes.

I am trying to find out whether ROWS_PER_BATCH in OPENROWSET will help if I use it in my query. Will SQL Server really use the value provided in ROWS_PER_BATCH to split the load into smaller transactions, or will it still run as one single transaction?

Can anyone help me out in this regard?

Below is the sample query I am using:
BEGIN TRY
    DELETE FROM emp;

    INSERT INTO emp (EmpID, EmpFirstName, EmpLastName, EmpLocation)
    SELECT EmpID, EmpFirstName, EmpLastName, EmpLocation
    FROM OPENROWSET(
        BULK 'D:\Test\Emp.txt',
        FORMATFILE = 'd:\Test\Employee.XML',
        FIRSTROW = 1,
        MAXERRORS = 10,
        ROWS_PER_BATCH = 5
    ) AS a;
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE();
END CATCH
GO

Thanks,
Aravind S

Is this a one-time bulk ingest or a continuous process?

Yes, this is a one-time bulk ingest; we will receive this huge file only once.

ROWS_PER_BATCH does not cause SQL Server to commit data at that point - it is only a hint about the approximate number of rows in the file, used to help SQL Server allocate memory and plan the bulk operation.
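For what it's worth, here is a minimal sketch of how ROWS_PER_BATCH is normally supplied - as an estimate of the total rows in the file (150 million in your case), not as a commit size:

-- ROWS_PER_BATCH is only an optimizer hint; it does not split the load
-- into smaller transactions.
INSERT INTO emp (EmpID, EmpFirstName, EmpLastName, EmpLocation)
SELECT EmpID, EmpFirstName, EmpLastName, EmpLocation
FROM OPENROWSET(
    BULK 'D:\Test\Emp.txt',
    FORMATFILE = 'd:\Test\Employee.XML',
    FIRSTROW = 1,
    MAXERRORS = 10,
    ROWS_PER_BATCH = 150000000   -- approximate total rows in the file
) AS a;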

If you use SSIS with the OLEDB Destination and the table/fast load option - you can set the batch size and commit size. This will tell SQL Server to commit the data once you have reached that number of rows - and that will allow the transaction log to be reused immediately (if in simple recovery model) or immediately after a transaction log backup if in full/bulk recovery models.

You could do the same thing using BULK INSERT - but that requires creating a file to be exported from the source system.

What jeffw8713 said - use SSIS; I believe it provides batch and commit size settings. Also, have you thought of using a staging database with the Simple recovery model instead? A quick sketch follows below.
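A minimal sketch of that staging-database idea (StagingDB is just a placeholder name):

-- Hypothetical staging database kept in SIMPLE recovery, so log space is
-- reused after each committed batch instead of piling up in one huge transaction.
CREATE DATABASE StagingDB;
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;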

Thank you yosiasz and jeffw8713 for your suggestions on SSIS. Unfortunately I can't change the code to use SSIS, at least at this point in time, as the code has already moved up to higher environments. I understand your thoughts about SSIS and I will give it a try. Also, is there any way I can keep the same technique but load the data in batches?

Yes, the database was also set to the Simple recovery model once when we loaded this file, but it still failed. I believe OPENROWSET uses only a single transaction - is that true?

It is one file, so unless you use some other scripting language such as Python to read n lines at a time, or split on some other delimiter (e.g. 10 customers at a time based on certain rules), I can't imagine T-SQL, and OPENROWSET in particular, being your solution. SSIS packages have a nice batching mechanism, but a huge file is a huge file and you might need to reconsider your architecture - for example, can the files be broken up into 100 customers each, then looped over and ingested? Fully vet it in pre-prod or lower environments and then, if all is green, push to the upper environments.

You can use BULK INSERT and set the BATCHSIZE parameter. See: https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017
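For example, a minimal sketch assuming the same file and format file from the original post - with BATCHSIZE, each batch of rows is committed as its own transaction, so the log space can be reused between batches (immediately under the Simple recovery model, or after log backups otherwise):

-- Commits every 50,000 rows as a separate transaction.
BULK INSERT emp
FROM 'D:\Test\Emp.txt'
WITH (
    FORMATFILE = 'd:\Test\Employee.XML',
    FIRSTROW = 1,
    MAXERRORS = 10,
    BATCHSIZE = 50000
);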

I do not understand why you cannot utilize SSIS (SQL Server Data Tools) for a one-time data load. You stated this is a one-time operation, but later said it cannot be changed because the code has moved to higher environments?

If the goal is to load the data - and the current process isn't working - then you need to change the code and redeploy it to all environments anyway.

Hi Jeff - Yes, it is a one-time operation, but this load is part of a generic process, so I am afraid I can't switch to a different solution just for this one file. I completely agree with your point about going with SSIS, and I am going to give it a try. In the meantime we have a slightly modified version of this solution: load the data into a temp table, and then move the data from temp to the main table in batches of 50k. If that fails, the other option would be to load the data directly into the main table using SQL BULK INSERT.
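For reference, a rough sketch of that staging-to-main move in 50k batches, using the column names from the sample query above (emp_staging is a placeholder for the staging/temp table, and the OUTPUT ... INTO pattern assumes emp has no triggers or foreign keys):

-- Drain the staging table 50,000 rows at a time; each iteration commits on its
-- own, so the transaction log can be reused between batches.
DECLARE @batch INT = 50000;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch)
    FROM emp_staging
    OUTPUT deleted.EmpID, deleted.EmpFirstName, deleted.EmpLastName, deleted.EmpLocation
    INTO emp (EmpID, EmpFirstName, EmpLastName, EmpLocation);

    IF @@ROWCOUNT = 0
        BREAK;   -- staging table is empty
END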