CosmosDB - DocumentDB - Bulk insert without saturating collection RU

I am evaluating Azure CosmosDB for an application that will require high read throughput and the ability to scale. 99% of the activity will be reads, but occasionally we will need to insert anywhere from a few documents up to a batch of several million.

I created a collection for testing and provisioned it at 2500 RU/s. However, I ran into throttling when inserting even a total of 120 small (500-byte) documents (I get a "Request rate is large" error).

How can I use DocumentDB in any useful way if, whenever I want to insert some documents, the inserts consume all my RUs and block everyone from reading?

Yes, I can increase the provisioned RUs, but if I only need 2500 for reads, I don't want to pay for 10,000 just to handle an occasional bulk insert.

Reads should be as fast as possible, ideally in the "single-digit millisecond" range that Microsoft advertises. Inserts do not need to be as fast as possible, but faster is better.

I tried the bulk-import stored procedure I have seen suggested, but could not get it to insert everything reliably. I also tried writing my own bulk-insert method using multiple threads, as suggested in an answer here, but that produces very slow results, frequently errors on at least some of the documents, and on average seems to hit the RU limit at a rate lower than I expected.

I feel like I'm missing something. Do I need to provision RUs solely for bulk writes? Is there any built-in functionality to limit the RU consumption of inserts? How can I insert hundreds of thousands of documents in a reasonable amount of time without making the collection unusable?

2 answers

The key to faster inserts is distributing the load across multiple physical partitions. In your case, based on the total amount of data in the collection, you will have a minimum of (total data size / 10 GB) physical partitions, since each physical partition holds up to 10 GB. Your total RUs are divided evenly among these partitions.

Depending on your data model, if you can partition your data well, you can increase speed by writing to multiple partitions in parallel.
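A minimal sketch of that idea, grouping documents by partition key value and writing each group from its own worker thread. `client.upsert` and the `partition_key` field name are hypothetical stand-ins for whatever your SDK and data model actually use:

```python
# Sketch: parallel bulk insert, bucketing documents by partition key so that
# concurrent workers write to different logical partitions.
# `client.upsert(doc)` is a placeholder for your SDK's insert call.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def bulk_insert(client, docs, partition_key="tenantId", workers=8):
    # Bucket documents by their partition key value.
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[partition_key]].append(doc)

    # Each worker drains one bucket, so writes spread across partitions.
    def insert_bucket(bucket):
        for doc in bucket:
            client.upsert(doc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(insert_bucket, buckets.values()))
```

With a well-chosen partition key this lets each physical partition's RU share be consumed in parallel rather than bottlenecking on one partition.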

Since you mentioned that you occasionally have to write a batch of several million documents, I would advise increasing the provisioned RUs for that period and then scaling back down to the level your read load requires.
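One way to make that scale-up/scale-down pattern robust is a small context manager that always restores the baseline throughput, even if the batch fails partway. `set_throughput` here is a placeholder for whatever offer-replacement call your SDK exposes, not a real API:

```python
# Sketch: temporarily raise provisioned throughput for a bulk load.
# `set_throughput(ru)` is a hypothetical callable wrapping your SDK's
# throughput/offer update; burst_ru and baseline_ru are your chosen levels.
from contextlib import contextmanager

@contextmanager
def scaled_throughput(set_throughput, burst_ru, baseline_ru):
    set_throughput(burst_ru)
    try:
        yield
    finally:
        # Always scale back down, even if the bulk insert raises.
        set_throughput(baseline_ru)
```

Usage would look like `with scaled_throughput(update_offer, 10000, 2500): run_bulk_insert()`, so you only pay for the higher tier while the batch is running.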

Writes via stored procedures, while saving on the network calls you would otherwise make, may not yield much benefit, because a stored procedure executes against a single partition only. It can therefore consume only the RUs allocated to that one partition.

https://docs.microsoft.com/en-us/azure/cosmos-db/partition-data#designing-for-partitioning contains some useful guidance on choosing a partition key.


If you cannot reduce the RU cost of your inserts, you can go the other way and slow them down until your overall read performance is no longer affected. Have a look at the Cosmos DB performance benchmarking sample (which inserts documents): it exposes many parameters that can be tuned to improve throughput, but they can obviously also be used to cap your RU/s consumption at a chosen level.
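The throttling idea can be sketched as a token bucket that caps inserts at a fixed RU/s budget, leaving the rest of the provisioned throughput for readers. The RU charge per write is an assumption here; in practice your SDK reports the actual charge on each response:

```python
# Sketch: token-bucket limiter that caps insert throughput at `ru_per_sec`,
# so inserts never consume more than that slice of the provisioned RUs.
import time

class RuBudget:
    def __init__(self, ru_per_sec):
        self.rate = ru_per_sec
        self.tokens = ru_per_sec   # start with one second's worth of budget
        self.last = time.monotonic()

    def spend(self, charge):
        # Refill tokens for elapsed time, then block until this request's
        # RU charge is affordable.
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= charge:
                self.tokens -= charge
                return
            time.sleep((charge - self.tokens) / self.rate)
```

Before each insert you would call, say, `budget.spend(6.0)` (a ~1 KB write costs on the order of 5-6 RU, but use the charge your SDK actually reports). Inserts then pace themselves instead of starving reads.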

KranthiKiran's answer pretty much sums up everything else I can think of.


Source: https://habr.com/ru/post/1270777/
