Bulk data loading

Hi,
I have a table which has close to 46 million rows (with data and index close to 150GB).
This table was created without a clustered index.

Now we need to add a clustered index on this table.

Last time we tried the normal "create constraint" command. We had a total database downtime of around 4 hours, and the index build did not complete in that time.

So now we are planning to create a new empty table with the PK, and have the 46 million rows inserted into this new table.

Is there any way we can speed up the insert process so that it completes within 3-4 hours?

We tried it on dev. The initial few million rows load fast, but then the process gets slower and slower.

Any suggestions would be helpful.
Thanks

I don't think pre-creating the index will help. You could try this (there's a rough command sketch after the steps):

Set database recovery model to "simple"
Backup the database
Create new table without clustered index
BCP out all the data from the old table via a query (queryout) rather than a straight table extract, ordering by the column you intend to make the primary key
BCP in the data into the new table
Create clustered index
Drop the old table
Rename the new table
Set database recovery model back to Full, if that is what you use
Backup database
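
Roughly, something like this, with placeholder names throughout (swap in your real database, table, key column, and paths):

In SSMS:

ALTER DATABASE MyDb SET RECOVERY SIMPLE;
BACKUP DATABASE MyDb TO DISK = 'Z:\backups\MyDb_before.bak';

From a command prompt:

bcp "SELECT * FROM MyDb.dbo.OldTable ORDER BY KeyCol" queryout Z:\export\OldTable.dat -n -S myserver -T
bcp MyDb.dbo.NewTable in Z:\export\OldTable.dat -n -S myserver -T -h "TABLOCK"

Back in SSMS:

ALTER TABLE dbo.NewTable ADD CONSTRAINT PK_NewTable PRIMARY KEY CLUSTERED (KeyCol);
DROP TABLE dbo.OldTable;
EXEC sp_rename 'dbo.NewTable', 'OldTable';
ALTER DATABASE MyDb SET RECOVERY FULL;
BACKUP DATABASE MyDb TO DISK = 'Z:\backups\MyDb_after.bak';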

Thanks!!

BCP out all the data from the old table via a query (queryout) rather than a straight table extract, ordering by the column you intend to make the primary key
BCP in the data into the new table
Create clustered index (we had an issue last time creating an index on this table with 46 million rows; wouldn't we see the same issue if we create the index after loading the data?)

Also, is there any facility in SQL Server that can move data from one table to another as fast as bcp or BULK INSERT? (Or can either of those two be used for that?)

And in BCP, can I change the batch size from 1,000 to something bigger, like 10,000?

1000 rows successfully bulk-copied to host-file. Total received: 366000
1000 rows successfully bulk-copied to host-file. Total received: 367000
1000 rows successfully bulk-copied to host-file. Total received: 368000
1000 rows successfully bulk-copied to host-file. Total received: 369000
1000 rows successfully bulk-copied to host-file. Total received: 370000
1000 rows successfully bulk-copied to host-file. Total received: 371000

You can use the Import/Export Wizard: right-click the database name in SSMS Object Explorer and select Tasks -> Import Data (or Export Data). It uses SSIS internally and is quite fast, comparable to BCP. (That is based on my casual observations; I have not measured the performance to see whether it is faster or slower.)

I tried that and it's very slow :sweat: It took close to 5-8 minutes to load 1 million rows.

BCP out was taking less than a minute for 1 million rows.

But I am facing issues with BCP IN.

I am getting the error below:

Starting copy...
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC SQL Server Driver]Unexpected EOF encountered in BCP data-file

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.): total 16

Commands used:
bcp table1 format nul -n -f Z:\GoldenMaster\Format_n.xml -S myserver -T

bcp table1 out Z:\GoldenMaster\table1.dat -f Z:\GoldenMaster\Format_n.xml -S myserver -T -L 100

bcp table2 in Z:\GoldenMaster\table1.dat -f Z:\GoldenMaster\Format_n.xml -c -t, -S myserver -T

Try this:
bcp table1 out Z:\GoldenMaster\table1.dat -S myserver -T -c -t, -r\r\n

bcp table2 in Z:\GoldenMaster\table1.dat -S myserver -T -c -t, -r\r\n

Hi,
Tried the above and got this error:

Error = [Microsoft][ODBC SQL Server Driver]Invalid character value for cast specification

I have a couple of XML columns in the table too. Would that cause any issues?

Thanks!

I changed the column delimiter to | and the row delimiter to \n, and it worked fine :slight_smile:

I shall try it with the 27 million rows I have on my dev server and get back in case of any errors. Thanks a lot!! :smile:

bcp table1 out Z:\GoldenMaster\table1.dat -S myserver -T -c -t "|" -r \n -L 100

bcp table2 in Z:\GoldenMaster\table1.dat -S myserver -T -c -t "|" -r \n

I'm working on some other suggestions, but I first want to state that there is absolutely, positively no reason to set the database to the SIMPLE recovery model. It breaks the log chain and is totally unnecessary for bulk logging. You only need to switch to the BULK_LOGGED recovery model, with the understanding that if a failure occurs while you're in BULK_LOGGED, you will not be able to recover to the point of failure (same as with SIMPLE). You can get close if you make sure you take a log backup before switching to BULK_LOGGED; see the sketch below.
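
If it helps, the switch itself is just this (MyDb and the paths are placeholders):

BACKUP LOG MyDb TO DISK = 'Z:\backups\MyDb_log_before.trn';
ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;

-- ... do the bulk operations here ...

ALTER DATABASE MyDb SET RECOVERY FULL;
BACKUP LOG MyDb TO DISK = 'Z:\backups\MyDb_log_after.trn';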

The BCP method is fine, but then you have to do a final true-up to catch any rows that were inserted, updated, or deleted in the meantime. Not fun, and it could take hours on top of it all, unless you write a temporary "it changed" trigger that captures row identifiers to a table while the operations run. Something like the sketch below.
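
As a rough sketch only (the table and column names are made up; adapt them to your real key):

CREATE TABLE dbo.ChangedRows (Id INT NOT NULL, ChangeType CHAR(1) NOT NULL);
GO
CREATE TRIGGER trg_OldTable_Capture ON dbo.OldTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- an UPDATE appears in both inserted and deleted, so it logs one row of each
    INSERT INTO dbo.ChangedRows (Id, ChangeType) SELECT Id, 'I' FROM inserted;
    INSERT INTO dbo.ChangedRows (Id, ChangeType) SELECT Id, 'D' FROM deleted;
END;

At true-up time, delete the captured 'D' identifiers from the new table and re-copy the 'I' identifiers from the old one.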

And, no... you can't change the batch size on BCP out, even with the batch parameter; it will be ignored. I'm not sure why they did it that way, but that's the way it is.
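
The -b switch is honored on BCP in, though, so for the import half you could try something like this (built from your earlier command):

bcp table2 in Z:\GoldenMaster\table1.dat -S myserver -T -c -t "|" -r \n -b 10000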

There are also some other caveats that folks might not be considering. If you create the new table without the clustered index, BCP into it, and then create the clustered index, the database will grow by roughly the size of the table (if there's not already enough space), because the HEAP is preserved until the clustered index build completes and is only dropped afterward. That leaves you with at least 150GB of unused space in the database, on top of the at-least-150GB freed when you drop the original table. While that's not necessarily a bad thing, it will seriously add to restore times unless you have instant file initialization turned on. Free space in the database won't, however, affect the duration or size of your backups, because the actual free space isn't stored in the backup; it's just reallocated during the restore.
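
If you want to see where you stand on space before you start, a quick check (run it in the database in question):

EXEC sp_spaceused;  -- database size and unallocated space

SELECT name,
       size / 128 AS size_mb,
       (size - FILEPROPERTY(name, 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files;  -- per-file size and free space, in MB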

And, make sure that you DO use the BULK LOGGED recovery model so that you don't blow out your log file and the related tape backups.

I'll be back. In the meantime, I need to see the full CREATE TABLE statement along with the CREATE INDEX statements for every NCI, and I need to know whether any FKs are pointing at this table. I also need to know whether you're running STANDARD EDITION or ENTERPRISE EDITION, as well as the names and order of the columns you want to use for the new CI.

Last but not least, I need to know the overall size of the database, how much free space you currently have in the database, how much free space you have on disk, the size of the 5 largest tables in the database, and how many tables you have. Depending on those numbers, this could be super simple or it could be painful.
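
To save you some digging, queries along these lines will pull most of that (replace dbo.YourTable with the real table name):

SELECT SERVERPROPERTY('Edition');  -- Standard vs Enterprise

SELECT name
FROM sys.foreign_keys
WHERE referenced_object_id = OBJECT_ID('dbo.YourTable');  -- FKs pointing at the table

SELECT COUNT(*) FROM sys.tables;  -- how many tables

SELECT TOP (5)
       OBJECT_NAME(ps.object_id) AS table_name,
       SUM(ps.reserved_page_count) / 128 AS reserved_mb
FROM sys.dm_db_partition_stats AS ps
WHERE OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id
ORDER BY reserved_mb DESC;  -- 5 largest tables by reserved space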