I'm getting "The transaction log for database 'xxx' is full due to 'ACTIVE_TRANSACTION'" when I'm running a process to insert 309 million records into a table. We're using minimal-logging methods, and the INSERT statement only has a couple of CASE statements and one function.
I'm able to run the query when I break it down into service date years (2014, 2015, etc.) within SSIS, but I'm hoping to find a way to have it run for all years (2014 - present).
Here's what my insert looks like (this is 309 million records):
select bunch of fields,
some case statements,
one function (the function is just another query - no updates or inserts)
from MedClaims c
JOIN ClaimHeader h
ON C.ID = H.ID
where c.Date >= '01/01/2014'
Is 309 million records just too many, even with minimal-logging techniques? When I break it out to run in chunks, it takes about 20-30 hours to complete.
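For reference, the year-by-year chunking looks roughly like this in T-SQL. This is a sketch only; the table/column names are placeholders standing in for my real ones:

```sql
-- Hypothetical sketch: insert one service-date year per batch so each
-- transaction commits separately and the log can be reused between years.
DECLARE @yr int = 2014;
WHILE @yr <= YEAR(GETDATE())
BEGIN
    INSERT INTO dbo.TargetClaims WITH (TABLOCK)
    SELECT c.*   -- plus the CASE expressions and the function call
    FROM dbo.MedClaims AS c
    JOIN dbo.ClaimHeader AS h
        ON c.ID = h.ID
    WHERE c.Date >= DATEFROMPARTS(@yr, 1, 1)
      AND c.Date <  DATEFROMPARTS(@yr + 1, 1, 1);

    CHECKPOINT;  -- under SIMPLE recovery, lets log space be reused
    SET @yr += 1;
END;
```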
How did you verify that you're using "minimal logging"?
Keep in mind that most often you'll need a "WITH ( TABLOCK )" hint to allow minimal logging. "SELECT ... INTO dbo.new_table" also does minimal logging but you don't want to do that for the actual inserts, only to create the table structure, because of the internal system table locks it holds.
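To illustrate, on SIMPLE or BULK_LOGGED recovery an insert shaped like this can qualify for minimal logging (table and column names here are placeholders, not the poster's actual schema):

```sql
-- Minimal-logging pattern: TABLOCK hint on the target table.
-- Easiest case is a heap or an empty clustered-index table;
-- recovery model must be SIMPLE or BULK_LOGGED.
INSERT INTO dbo.TargetClaims WITH (TABLOCK)
    (ID, ServiceDate, PaidAmount)
SELECT c.ID, c.Date, c.Paid
FROM dbo.MedClaims AS c
JOIN dbo.ClaimHeader AS h
    ON c.ID = h.ID
WHERE c.Date >= '20140101';
```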
There's no cross-server or cross-database traffic. The recovery model is set to SIMPLE, as far as I know.
The stored procedure is actually on Server B, while the inserts are all done within Server A. There's no traffic crossing over to any other database or server, though.
Oops, sorry. I meant the stored procedure is on Database B, but the INSERT statement is all performed on Database A. So there's no data going from one DB to another. We don't think that would eliminate minimal logging though, right?
And why is this huge dataset being transferred to another database? Performance reasons? Separation of concerns?
Have you looked at the option of creating a proper data lake or data store specific to this BI endeavor?
Dimensions and facts, etc.
Data isn't being transferred from database to database, just tables. I recreated the stored procedure in Database_A just to try out if things will run better tonight.
We're a pretty small analytics shop but we are investing in tools to help get us to building out a proper data lake.
Before:
use database_B
GO
select bunch of fields,
some case statements,
one function (the function is just another query - no updates or inserts)
from database_A.dbo.MedClaims c
JOIN database_A.dbo.ClaimHeader h
ON c.ID = h.ID
where c.Date >= '01/01/2014'
Now Running with this tonight:
use database_A
GO
select bunch of fields,
some case statements,
one function (the function is just another query - no updates or inserts)
from database_A.dbo.MedClaims c
JOIN database_A.dbo.ClaimHeader h
ON c.ID = h.ID
where c.Date >= '01/01/2014'
Oh man, I forgot to ask the simple question (which is usually the best answer):
Do you have separate files for the .mdf and .ldf? Are they being saved to local disk? And what is the state of the drive(s) on which they are saved?
How big is the LDF transaction log file when it fails? It would be helpful to know what size of problem this is.
I wonder if it would be worth using the query to output to a BCP file (in native format), sorted by the clustered index of the target table, and then doing a bulk import on it?
Seems like a sledgehammer to crack a nut, though ...
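Roughly, the export/re-import idea would look like this. A sketch only; the server name, file path, and table/column names are all placeholders:

```sql
-- Export in native format, sorted to match the target's clustered index.
-- (bcp is run from a command prompt, not inside SSMS.)
-- bcp "SELECT ... FROM database_A.dbo.MedClaims ORDER BY ID" queryout C:\load\claims.dat -n -S ServerA -T

-- Then bulk-import, telling SQL Server the data is pre-sorted so it can
-- skip the sort, with TABLOCK to allow minimal logging.
BULK INSERT dbo.TargetClaims
FROM 'C:\load\claims.dat'
WITH (DATAFILETYPE = 'native',
      ORDER (ID),            -- must match the clustered index
      TABLOCK,
      BATCHSIZE = 1000000);  -- commit every million rows
```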
We need to see the actual query code, not just a very brief summary of what the code contains, as well as full table and index definitions. You wrote: "Here's what the INSERT looks like", but that code doesn't even include an INSERT!
The recovery model of the source db is irrelevant.
The destination db log file should be expanded to the size needed for the load before the load begins. This prevents dynamic expansion of the log, which can be a very slow operation.
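For example, pre-sizing the log might look like this. The logical file name comes from the sp_helpfile output mentioned below, and the sizes here are placeholders; pick something near the expected peak for the load:

```sql
-- Pre-grow the destination log before the load so no autogrow
-- (a slow, blocking operation) happens mid-load.
ALTER DATABASE destination_db
MODIFY FILE (NAME = destination_db_log,  -- logical name from sp_helpfile
             SIZE = 64GB,
             FILEGROWTH = 8GB);
```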
It would also be very helpful to have the result of this command:
EXEC [destination_db].sys.sp_helpfile
And, to make sure the log file is not too fragmented, the number of rows of output from this command (NOT the results themselves, just the total number of rows listed):
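The command itself got cut off above; it is presumably the standard VLF count check, where each output row represents one virtual log file and hundreds of rows suggest a fragmented log:

```sql
-- Each row of output = one virtual log file (VLF).
DBCC LOGINFO ('destination_db');

-- Or, on SQL Server 2016 SP2 and later, just the count:
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID('destination_db'));
```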
Good point. Also worth checking what the autogrowth increment is set to - it might be 1MB ... or 10%, which, on a large database, can be a huge single growth increment.
We fiddle with all our large DBs to pre-configure the log file either to a large size (up to 8GB) or to extend it in 8GB chunks if we need it larger than that (but not an exact multiple of 8GB - although I don't know if that is still an issue).
I see that you are using SSIS ... so a couple of notes on how to improve performance with that system:
On the OLEDB Source - make sure you are using a SQL command to extract the data. This can either be the query itself - or preferably a stored procedure. Using a table or view directly can be extremely slow...
On the OLEDB Destination - make sure you use the option 'Table or view - fast load', check the options 'Table lock' and 'Check constraints', and set the rows per batch to a reasonable size (200000 works well on my systems). Finally, make sure the maximum insert commit size is also set to a reasonable size (again, 200000 works well for my systems).
The maximum insert commit size will issue a commit at that size - so the transaction log can then be marked as reusable and not grow to fill the drive and run out of space.
It may be possible to use the SQL Server Destination instead of the OLEDB Destination - which may (repeat: may) perform better, but there are restrictions. You definitely have to set parameters using the advanced editor - and ensure you set a MaxInsertCommitSize (and other properties as needed) - and you have to ensure that the package is running on the destination node.