Inserting a 300 million records into table using batches

Angel · October 30, 2021, 9:13am

Morning Guys,

I need help with this query below.

This table contains 300 million rows.
The table have no primary key.
PatientID can have multiple TreatmentCodes but different dates.
ConsultantID is unique but, sometimes blank.
I want to insert the record into a table using batches of 25,000 per row.
How do I do this without repeating already inserted row.
I have used SSIS package and it works but, how do I handle this in SQL. Thanks in advance.

DECLARE @TestData TABLE
(PatientID varchar(255),
EventDate varchar(10),
TreatmentCode varchar(255),
ConsultantID varchar(255)
)

INSERT INTO @TestData (PatientID, EventDate, TreatmentCode, ConsultantID)
VALUES ('10XX1099000', '20200401', 'Xc001', '100001');

INSERT INTO @TestData (PatientID, EventDate, TreatmentCode, ConsultantID)
VALUES ('10XX1099000', '20200402', 'Xc001', '100004');
INSERT INTO @TestData (PatientID, EventDate, TreatmentCode, ConsultantID)
VALUES ('10XX1099000', '20200403', 'Xc002', 'null');

INSERT INTO @TestData (PatientID, EventDate, TreatmentCode, ConsultantID)
VALUES ('10XX1099001', '20200401', 'Xc006', '100001');

INSERT INTO @TestData (PatientID, EventDate, TreatmentCode, ConsultantID)
VALUES ('10XX1099001', '20200402', 'Xc006', 'null');
INSERT INTO @TestData (PatientID, EventDate, TreatmentCode, ConsultantID)
VALUES ('10XX1099001', '20200403', 'Xc002', 'null');

Select * from @TestData

JeffModen · October 30, 2021, 7:48pm

I'm confused by your post...

You say you have a 300 million row table. Is the table you speak of in Item #5 that 300 million row table? If so, where are you getting the 25,000 rows from to put into it? You need to explain a bit more at to what the source and destinations are.

As a bit of a sidebar, having a 300 million row heap is a bit of a problem unless it is truly only a history/log table.

Angel · October 31, 2021, 9:07am

Sorry Jeff,

I want to split the records inti batches of 25,000 or more if that makes sense.
Thanks.

JeffModen · October 31, 2021, 6:32pm

Yep. I know what you want but, especially in this case, I believe it's going to cost you a whole lot more than a single insert of 300 Million rows.

Shifting gears to what you want, you've still not provided enough information for anyone to accurately help,

RogierPronk · October 31, 2021, 7:05pm

In your case I would add an extra column, for example batchnumber. Then you can insert the records by TOP 25,000 and fill in the batchnumber. In your WHERE clausule you can add WHERE batchnumber IS NULL so you won't keep updating the same rows. After the inserts you can delete the extra column.

mike01 · November 1, 2021, 2:36pm

are you generating 300m insert statements? How are you getting the raw data? There are faster/better ways of doing this, but unclear on exact requirements. have you looking into bulk insert or bcp?