Encode the partition key in a document's id?

I set the partition key of one of my Cosmos DB databases to /partition.

For example: we have a Chat document containing a list of subscribers, and we have ChatMessage documents, which contain the text, a link to the author, and some other properties. Both kinds of document have a partition property, which contains the type "chat" and the chat identifier.

Chat example:

    {
      "id": "955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
      "name": "SO questions",
      "isChat": true,
      "partition": "chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
      "subscribers": [ ... ]
    }

Then we have Message docs:

    {
      "id": "4d1c7b8c-bf89-47e0-83e1-a8cf0d71ce5a",
      "authorId": "some guid",
      "isMessage": true,
      "partition": "chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c",
      "text": "What should I do?"
    }

Now it's very convenient to return all messages for a specific chat: I just need to query the chat_955f3eca-d28d-4f83-976a-f5ff26d0cf2c partition for all documents where isMessage = true. So far, so good...
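A minimal sketch of that single-partition query, assuming the Python azure-cosmos SDK, where `container` behaves like a `ContainerProxy`. The helper names `chat_partition` and `messages_for_chat` are hypothetical and only illustrate the idea:

```python
from typing import Any, Iterable


def chat_partition(chat_id: str) -> str:
    """Build the partition value used in the question: "chat_" + chat id."""
    return f"chat_{chat_id}"


def messages_for_chat(container: Any, chat_id: str) -> Iterable[dict]:
    """Return all message documents stored in one chat's partition.

    `container` is assumed to expose query_items() the way the
    azure-cosmos ContainerProxy does. Passing partition_key scopes the
    query to a single partition, so no cross-partition fan-out is needed.
    """
    return container.query_items(
        query="SELECT * FROM c WHERE c.isMessage = true",
        partition_key=chat_partition(chat_id),
    )
```

Because the partition value is derived from the chat id alone, callers only ever need to know the chat id to list its messages.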

But if I now want to look up a specific message by id, I usually know only the identifier and not the partition, so I have to run a slow cross-partition query. That led me to wonder whether I should add the partition key to the message id, so that I could split the id when querying the db and get a faster lookup. I noticed that a document's _rid property looks like a combination of the db id, the collection id, and then the document id. I mean something like this (simplified):

    Chat.Id           = "abc"
    Chat.Partition    = "chat_abc"       // [type]_[chatId]
    Message.Id        = "chat_abc|123"   // [Chat.Partition]|[Message.Id]
    Message.Partition = "chat_abc"       // [Chat.Partition]

Suppose I now want to get a Message document by id: I just split the id on the | character, then request the document using the first part of the id as the partition key and the full identifier as the id.
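The split described above can be sketched in a few lines of Python. The `|` separator comes from the simplified example; the names `MessageKey`, `make_message_id`, and `parse_message_id` are hypothetical:

```python
from typing import NamedTuple

SEPARATOR = "|"  # separator from the simplified example above


class MessageKey(NamedTuple):
    partition_key: str  # first part of the composite id
    document_id: str    # the full composite id, as stored in the document


def make_message_id(partition_key: str, local_id: str) -> str:
    """Compose "[partition]|[local id]"; reject partitions containing the separator."""
    if SEPARATOR in partition_key:
        raise ValueError("partition key must not contain the separator")
    return f"{partition_key}{SEPARATOR}{local_id}"


def parse_message_id(message_id: str) -> MessageKey:
    """Recover the partition key from a composite id, splitting on the first separator."""
    partition_key, sep, local_id = message_id.partition(SEPARATOR)
    if not sep or not partition_key or not local_id:
        raise ValueError(f"not a composite message id: {message_id!r}")
    return MessageKey(partition_key, message_id)
```

Splitting on the first separator only means the local part may itself contain `|`, while the partition part may not, which is why `make_message_id` rejects such partition keys.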

Does this make sense? Are there better ways to do it? Should I just always pass the document's partition key along with it, not only the id? Should I just use the _rid property?

Any experience is greatly appreciated!

UPDATE

I found the following answer here:

Some applications encode the partition key as part of the identifier. For example, the partition key would be the customer id, and id = "customer_id.order_id", so you can extract the partition key from the id value.

I also emailed the Cosmos team to ask whether this is a recommended pattern, and I will post the response if I receive one.

2 answers

Yes, your suggestion of retrieving the partition key from the id (through a convention such as a prefix/separator) makes sense. This is common among applications that have a single key and are being migrated to Cosmos DB from another storage system.

If you are building an application from scratch, you should consider plumbing a composite key (partition key + item key ("id")) through your API/application.


First, if you know that your data (and index) size will stay within 10 GB and the RU/s limit is sufficient, then a fixed, non-partitioned collection would not have this problem at all. The OP has probably made a deliberate decision that partitioning is needed, but this is worth stating for the sake of generality. If possible, KISS ;)

If partitioning is a must, then AFAIK you cannot avoid a cross-partition query and its overhead if you do not know the partition key.

IMHO the OP's suggestion of merging a duplicate of the partition key into the id field is a rather ugly solution because:
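For reference, this is roughly what that fallback looks like with the Python azure-cosmos SDK when only the id is known. It is a sketch under assumptions: `container` is expected to behave like a `ContainerProxy`, and `find_message_by_id` is a hypothetical helper name:

```python
from typing import Any, Optional


def find_message_by_id(container: Any, message_id: str) -> Optional[dict]:
    """Look up a document by id alone, fanning out across all partitions.

    This is the slow path: without a partition key the query cannot be
    routed to a single partition. Since an id is only unique within a
    partition, the first match is returned (or None if there is none).
    """
    items = container.query_items(
        query="SELECT * FROM c WHERE c.id = @id",
        parameters=[{"name": "@id", "value": message_id}],
        enable_cross_partition_query=True,
    )
    return next(iter(items), None)
```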

  • The name id implies a unique key; the partition key is not part of that key, nor is it needed for its uniqueness. Anyone who consumes this key upstream pays the forced cost of a longer key and is blocked from using a simpler Guid type, etc.
  • It will be a mess if your partition key changes in the future.
  • The internal structure of the merged id will not be intuitive without documentation: its parts are not named, and even if the format looks obvious, new developers cannot reliably understand what is going on without finding external documentation.
  • Your data model does not need the duplication at the semantic level; your application wants it for convenience. Such hacks therefore belong in your application code, not in the data model. Leaks like this should be avoided where possible.
  • Duplicating data within the document needlessly increases document size, bandwidth, etc. (this may or may not be noticeable, depending on scale and usage). Duplication within a document is sometimes necessary, but IMHO not in this case.

A better design would ensure that the partition key is always present in the logical context and can be passed along to lookups. If you don't have it available, then perhaps you need to refactor the application code (rather than the data design) to explicitly pass chatId along with id where necessary, WITHOUT merging them into some opaque string format.

Also, I don't see a good way to use _rid for this; if I remember correctly, it does not contain any internal reference to the partition or the partition key.

Disclaimer: I do not have access to, or a deep understanding of, Cosmos DB's internal design or the _rid logic for partitioned collections. I may have misunderstood how it works.
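One way to keep the two values together without merging them into a string is a small typed reference that the application passes around. The sketch below assumes the Python azure-cosmos SDK's `read_item` call and the "chat_" + chatId partition convention from the question; `MessageRef` and `read_message` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MessageRef:
    """Names both parts explicitly instead of hiding them in one string."""
    chat_id: str
    message_id: str

    @property
    def partition_key(self) -> str:
        # Partition convention from the question: "chat_" + chat id.
        return f"chat_{self.chat_id}"


def read_message(container: Any, ref: MessageRef) -> dict:
    """Point-read one message; no cross-partition query is involved.

    `container` is assumed to expose read_item() the way the
    azure-cosmos ContainerProxy does.
    """
    return container.read_item(item=ref.message_id, partition_key=ref.partition_key)
```

The stored id stays a plain Guid, and the partition naming rule lives in one place in application code, so it can change without rewriting document ids.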


Source: https://habr.com/ru/post/1262792/

