SQLTeam.com | Weblogs | Forums

SQL mdf and log files


#1

Hi,

I have 2 databases and am inserting data (millions of records) from one database to another to create ONE large database. All the fields, schema of the databases are identical.

I have limited space on my server, and note this database will not be appended after the creation. It will be a static database and users will be retrieving and reading the data ONLY.

Now, on the insertion the log files grew for the last table as it had (176 million records) - So I went to Tasks - Shrink - Files and selected log file (then optioned for Release Unused Space) - This brought the log file down (it had built to 80 GB).

Question1. Is the above practice OK since the database will not be appended in future and users will ONLY access the data as it is? Will this mess up my database?

Question 2. Before I append the data from one database to another - I copied all the data from the table (e.g., dbo.company to dbo.companycpy) and then did the append procedure to dbo.company - the reason I make a copy is as a fall safe in case I corrupt the table. Then when I complete the append I go back to the database and delete the table (dbo.companycpy) - this practice in return has resulted to the growth of the mdf file. How can I restore the used space without interfering with the data or database structure? I read people writing of index fragmentation which is really making me concerned.
Any suggestions how I can take this forward will be highly appreciated.

Thank you


#2
  1. Put the database in SIMPLE recovery mode.
  2. batch the inserts (say, 100,000 rows per commit) to keep log requirements low. Alternatively, do a BULK INSERT if possible with an ORDER clause so SQL can use minimal logging
  3. SQL 2012 or up? Consider building a COLUMNSTORE index. You'll get space savings and performance improvements.
  4. Why are you worried about the size of the mdf file?
  5. Consider partitioning your destination table and using partition switching to move data in/out. That's a metadata operation with no actual data movement.
  6. Index fragmentation is not directly related to mdf growth per se. however, a good practice is to disable/drop indexes before a large load then re-create/rebuild the indexes when finished.

#3

Q1) If the db is in simple recovery model: Yes. No.
Q2) You should create a separate db to hold the table copies/backups, with no indexes on those tables (except a clustered index if and only if the data going into that table is also already clustered by that column(s)). Then, once the new data load is confirmed good, you can simply truncate or drop the backup table.
Writing instead into the same db is always going to cause .mdf growth you can't get rid of, while also making the indexes inevitably much more fragmented.