Clustered Columnstore and Update Performance

James · March 28, 2017, 2:30pm

Demographics:

SQL 2016 SP1
Flavor: Enterprise
Hardware: 8 cores
Memory: 128 GB
Disk: All SSD's
Network: very fast

I know that I should post DDL and sample data but I am curious about other peoples experiences with clustered columnstore table structures. I don't think this is a DDL or indexing issue.

We have some fairly wide and tall tables that are used for populating cubes. 700+ Million rows. Almost all VARCHAR(255) columns. They have been traditional rowstore objects but we converted some to clustered columnstore. The read performance was stellar. It actually reflected some of the marketing info Microsoft puts out regarding performance gains.

Unfortunately, we had to make changes to some of the data and this happens weekly and monthly in some cases. Updating values in batches of 20 million rows ran significantly slower than traditional rowstore tables. The table is also partitioned and the updates were being run against a single partition (out of 30 or so) at any time.

In the end, we removed the clustered columnstore index, created specific indexes for the updated statements, ran the updates, removed the task specific indexes, then recreated the clustered columnstore.

Are the performance issues with updates a common issue with columnstore tables?

I have read anything I can get my hands on regarding this and have not found a satisfying explanation. One of the best resources I found was this Site

We are about to load 20 Billion rows using columnstore and the data will have updates daily. If modifications are so painful, this may not be the solution we hope it will be.

Any thoughts?

I appreciate your time and consideration.

Kristen · March 28, 2017, 4:42pm

I wonder if @TaraKizer, or someone else at Brent Ozar, would have the knowledge you need?

James · March 28, 2017, 5:12pm

I will check to be sure, thanks for the response! After LOTS more research it seems that updates are evil with clustered columnstore indexes. It reads all data into memory which results in spills to tempdb which means slow. It also issues update statements for rows that may not have changed resulting in excessive log writes - unlike traditional rowstore structures. The solution is to delete then insert - if I understand correctly. I can handle that if I understand that is what is required for this.

robert_volk · March 30, 2017, 11:13am

Take a look at Niko's series, he covers a lot of the performance quirks and how to work around them. Partition loading and switching is probably your best bet. Also reach out to him via his blog, or if you are on Twitter, try the #sqlhelp hashtag. There are a number of folks on there who can advise.

If it's at all possible to evaluate your varchar(255) columns and move them to other, smaller data types, you might see some improvement there.

James · March 30, 2017, 1:23pm

Thanks for the feedback!

I found his blog and have been consuming everything I can about the subject. It would be great if someone would write a book on just the columnstore aspects of DW - hint hint.

As for the suggestion on the VARCHAR(255) columns, I agree that more conservative data types are the way to go. Unfortunately, the table design is such that it pulls together data from MANY different systems that we have no control over. I am thankful the original table designs didn't throw everything into NVARCHAR(MAX) columns, or worse, TEXT - shudder.

As an aside, the literature on columnstore performance was close to accurate when it comes to performance boosts for reads. Processes that took 18 hours to run for one iteration out of 12 took 30 minutes for all 12. Rather impressive. The reading performance gains are worth finding ways to handle updates better on our side. Populating a clustered columnstore table (very wide with 150 ugly columns) on 700 million rows takes about two hours. We are fortunate to have a very fast network and I/O subsystem of all SSD's.

Thank you all for your feedback and time.