I have a very large input file that will become many tables (80 of them) with different formats but all with the same key. I use VS 2015 and C# with T-SQL for the backend.
For each item there is a 'start' record with the item ID, then many records for the different tables, denoted by the first column (like +A, +B, +FC, etc.), and finally a 'stop' record for that one item ID.
Currently a StreamReader reads the file, and a rather long IF statement with 80 'else' branches splits the records among the 80 tables, processing each according to what data is in its record type. Each of these functions/methods does an INSERT to load its table.
How does one build a rollback that starts at the 'start' record and ends at the 'stop' record, and allows all the data for that particular item ID to be rolled back if needed (e.g. because one of the records is bad)?
Issue an explicit
BEGIN TRANSACTION
statement prior to the first action for the item ID. Then, after the last action for that item ID, issue a:
COMMIT TRANSACTION
statement.
So:
BEGIN TRANSACTION
--code to insert to table1
--code to insert to table2
--code to insert to table3
--...
--code to insert to table80
COMMIT TRANSACTION
For safety, you can enclose that in a
BEGIN TRY / END TRY
block so that you can do a catch that will ROLLBACK the transaction if needed:
BEGIN TRY
    BEGIN TRANSACTION
    --code to insert to table1 ... table80
    COMMIT TRANSACTION
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0
        ROLLBACK TRANSACTION
    --log / report the error here
END CATCH
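From the C# side, the same pattern is available through ADO.NET's SqlTransaction. A minimal sketch, assuming System.Data.SqlClient; connString, itemId, value1, and the table/column names are placeholders:

using System.Data.SqlClient;

using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();
    SqlTransaction tx = conn.BeginTransaction();   // BEGIN TRANSACTION
    try
    {
        using (SqlCommand cmd = new SqlCommand(
            "INSERT INTO TableA (ItemId, Col1) VALUES (@id, @c1)", conn, tx))
        {
            cmd.Parameters.AddWithValue("@id", itemId);
            cmd.Parameters.AddWithValue("@c1", value1);
            cmd.ExecuteNonQuery();
        }
        // ... the inserts for the other 79 tables, all passing the same tx ...
        tx.Commit();       // COMMIT TRANSACTION: all-or-nothing for this item ID
    }
    catch (Exception)
    {
        tx.Rollback();     // one bad record backs out everything for this item
        throw;             // or log the error and move on to the next item
    }
}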
It may not be relevant, but you can also set named SAVE points along the way, so you could roll back just the current item ID, and not all the previous ones, if your circumstances permit it.
BEGIN TRANSACTION TransName01
SAVE TRANSACTION TransName02
...
IF @SomeError = 1
    ROLLBACK TRANSACTION TransName02
...
COMMIT TRANSACTION TransName01
We do this whenever we use a transaction in a procedure, because if we issue a bare "ROLLBACK" (just like that, without a save point's name) it will roll back everything, and the process that called us may not want, nor expect!, that to happen. So, in order to roll back JUST the things we changed locally within our procedure, we do:
BEGIN TRANSACTION TransName01
SAVE TRANSACTION TransName02
... do stuff ...
IF @MyErrorCode <> 0
BEGIN
    ROLLBACK TRANSACTION TransName02
    ... log / display an error message ...
END
COMMIT TRANSACTION TransName01 -- Always commit the outer wrapper
I've forgotten the details, but there is something funny about doing a ROLLBACK TRANSACTION TransName01 (i.e. rolling back the outer transaction rather than a named SAVE point): it rolls back everything and resets @@TRANCOUNT to 0, which upsets the transaction level passed back to the caller (the classic "transaction count after EXECUTE" mismatch, error 266). So we always finish with a COMMIT of the outer wrapper, but if we encounter an error we ROLLBACK to the inner SAVE point (which, for us, covers exactly the same scope as the outer wrapper).
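The save-point version is also reachable from C#, via SqlTransaction.Save and the Rollback(name) overload. A sketch under the same assumptions as above, where WriteAllTablesForItem and LogBadItem are hypothetical helpers; note that a severe error can doom the whole transaction, in which case rolling back to the save point will itself fail:

using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();
    SqlTransaction tx = conn.BeginTransaction();    // the outer wrapper
    foreach (var item in items)
    {
        tx.Save("ItemStart");                       // SAVE TRANSACTION ItemStart
        try
        {
            WriteAllTablesForItem(item, conn, tx);  // the 80 inserts for this item
        }
        catch (SqlException)
        {
            tx.Rollback("ItemStart");               // undo just this item's rows
            LogBadItem(item);                       // flag this item for later fixing
        }
    }
    tx.Commit();                                    // keep all the good items
}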
So, can you do this? My project has many functions/methods with the SQL writes in them.
Can it use a single myConnection.Open() at the start of each set of rows to be written to the many tables, and a single myConnection.Close() at the end?
That is, like this:
if (fields[0] == "START_US_records") { myConnection.Open(); } //open the connection ... do a BEGIN TRY & BEGIN TRANSACTION
else
if (fields[0] == "END_US_records") { myConnection.Close(); } //do an END TRY, ROLLBACK if there was an error, then close the connection
else
if (fields[0] == "A") { writetableA(fields); } //record goes to table A
else
if (fields[0] == "B") { writetableB(fields); } //record goes to table B
else
if (fields[0] == "C") { writetableC(fields); } //record goes to table C
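One wrinkle: the writetableX methods must attach the shared connection AND transaction to every SqlCommand, otherwise ADO.NET refuses to execute ("Execute requires the command to have a transaction..."). A sketch of writetableA under that assumption, with hypothetical column names:

void writetableA(string[] fields, SqlConnection conn, SqlTransaction tx)
{
    using (SqlCommand cmd = new SqlCommand(
        "INSERT INTO TableA (ItemId, ColA1, ColA2) VALUES (@id, @a1, @a2)",
        conn, tx))
    {
        cmd.Parameters.AddWithValue("@id", fields[1]);
        cmd.Parameters.AddWithValue("@a1", fields[2]);
        cmd.Parameters.AddWithValue("@a2", fields[3]);
        cmd.ExecuteNonQuery();   // enlisted in the item's transaction
    }
}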
I'm not familiar with your application language, but it looks OK in principle.
It will be very slow though ... if you are only inserting a couple of rows at a time it will be fine, but if you are bulk-loading lots of rows, the row-by-row, one-by-one approach will be 10x, maybe even 100x, slower than doing it "in bulk". At the very least you have 5 round trips to the server for each of your "MyConnection" steps, and replacing that with just one would help. Don't worry about it if the number of records being processed is small, though (except to worry that it won't scale well if the size of the jobs increases over time).
Thanks. The reason for the transactions being in blocks of records is each set of records belongs to one.. oil well (yes I'm in the oil and gas biz.) Thus if any records relating to one well are bad, or incorrect, in any way, we want to back out of that set of records and issue an error message so that well info can be fixed for processing later.
We have maybe 100k wells, 30k leases, and 80 tables for records relating to just that type of well (we have other databases for other info on other types of wells).
Yes it will take longer but I can't think of any other way to effectively process so much data.. on a monthly basis.
I will say, my job is interesting since I'm an old COBOL guy now in the new stuff.. keeps me off the streets!
Batch up the data in, say, a delimited file and bulk import the file (a C# SqlBulkCopy sketch of this idea follows below).
Prepare the data into a "table array" (e.g. a table-valued parameter) and send that to the server as a single "block".
Send multiple INSERT statements as a single "block", rather than as individual statements.
Combine all the input data into XML and send the XML block to a stored procedure. The stored procedure can easily "iterate" through the XML (or some other bulk format), group statements into individual batches so that they logically COMMIT as one, or all roll back, and then "apply" the batches to the database "in bulk".
SQL will operate far FAR more efficiently if you provide it with a SET of data to process, rather than working row-by-agonising-row. It matters not-a-lot when volumes are low, but if they increase a row-by-row approach will scale very badly.
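One convenient middle ground from C# is SqlBulkCopy: stage rows in a DataTable while parsing, then push each batch to the server in a single bulk operation. A minimal sketch, assuming System.Data and System.Data.SqlClient; connString and the table/column names are placeholders:

using System.Data;
using System.Data.SqlClient;

DataTable batch = new DataTable();
batch.Columns.Add("ItemId", typeof(string));
batch.Columns.Add("Col1", typeof(string));
// ... add one row per input record while parsing the file ...
batch.Rows.Add("WELL00001", "some value");

using (SqlConnection conn = new SqlConnection(connString))
{
    conn.Open();
    using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.TableA";
        bulk.WriteToServer(batch);   // one round trip for the whole batch
    }
}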