Designing record keys for a document-oriented database - best practices

Question

Our team has begun developing an application backed by Couchbase; for all of us, this is our first experience with a NoSQL database.

We started defining our entities and adopted the practice of using the type prefixes proposed by the Couchbase manual:

 Entity "A": key: a#123
 Entity "B": key: b#123

But we realized that we were unsure which strategy to use for building complex document keys. We use counters a lot, and they need their own documents, so our keys get complicated:

 Daily counter "x" for entity "A": key: cntrx#a#123-20140117

We looked at different approaches, but we are still green on this subject and would like to ask some advice.

Are hierarchical keys good? Can anyone share their recommendations on defining non-trivial keys?

+4

nosql database-design couchbase

Mada Jan 17 '14 at 8:47

2 answers

I have a few things to offer regarding your question.

Generally

NoSQL is exactly what it sounds like, and it requires very different thinking from what goes into designing a good SQL database. For example, a NoSQL database is basically a large hash map. So although it can be useful to put some thought into your keys (for example, keeping them small), remember that a key is just a means of accessing your documents. Unless there is a particular advantage to keys looking a certain way, they do not need to mean anything at all - usually a lookup by primary key is all that is required. For example, how often do your users need to ask for "b#123" directly when navigating your application? The only place I can think of where a meaningful key is beneficial is a username or some other piece of data the user would already know.

Compound keys

While the Couchbase manual may suggest that composite keys are a good idea (and they can be very good for simple database structures), in general the key size should be as small as possible. Keys are limited to a maximum of 250 bytes. All keys must be kept in RAM, so the more data you put in your keys, the less memory is available for the rest of your data. Instead, I would suggest adding a type field to your documents and then using a view to pull out objects of a particular type (or to index objects by type). This will ultimately give you more flexibility in the future.
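As a rough illustration of the two points above (key size is measured in bytes, and the type can live in the document rather than in the key), here is a minimal sketch. It uses a plain Python dict as a stand-in for a bucket and a linear scan as a stand-in for a view; `key_bytes`, `docs_of_type` and the sample documents are hypothetical names and data, not part of any Couchbase API.

```python
def key_bytes(key: str) -> int:
    """Size of the key as stored, i.e. its UTF-8 encoded length in bytes."""
    return len(key.encode("utf-8"))


# A plain dict stands in for the bucket. The keys carry no type prefix;
# the type is recorded inside each document instead.
bucket = {
    "123": {"type": "a", "name": "first"},
    "124": {"type": "b", "name": "second"},
    "125": {"type": "a", "name": "third"},
}


def docs_of_type(store: dict, doc_type: str) -> list:
    """Emulate a view/index on the 'type' field by scanning the store."""
    return sorted(k for k, doc in store.items() if doc.get("type") == doc_type)


if __name__ == "__main__":
    print(key_bytes("cntrx#a#123-20140117"))
    print(docs_of_type(bucket, "a"))
```

In a real deployment the scan would be replaced by a view or index maintained by the database; the sketch only shows the shape of the data.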

Counters

Your explanation of the counters is rather vague, so I assume you use them as auto-incrementing keys. I would suggest rethinking that approach and moving away from counters. I use unique identifiers for all keys in my database. When I use a composite key, it is because the key itself is meaningful (for example, for versioned documents I use a composite key of the document identifier + the date the document was saved, to make sure it is unique). Even if you have several million (or even billion) objects, you can use 12 bytes of a GUID to practically guarantee the uniqueness of document identifiers. This avoids a very bad bottleneck in your application whenever you need to save new entries.
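One possible reading of "12 bytes of a GUID" is to take the first 12 bytes of a random UUID and encode them compactly. The sketch below assumes that reading; `short_id` is a hypothetical helper name, and the URL-safe base64 encoding is my own choice, not something the answer specifies.

```python
import base64
import uuid


def short_id(num_bytes: int = 12) -> str:
    """Return a compact random identifier built from part of a UUID4.

    Note: a few bits of a version-4 UUID are fixed (version and variant),
    so 12 bytes carry slightly fewer than 96 random bits.
    """
    raw = uuid.uuid4().bytes[:num_bytes]
    # URL-safe base64 keeps the key printable; 12 bytes encode to 16 characters.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


if __name__ == "__main__":
    print(short_id())
```

Because each client can generate such an identifier locally, no shared counter document has to be read and incremented before a new entry is saved.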

+1

theMayer Jan 18 '14 at 4:37

m03geek · Accepted Answer · 2014-01-17T15:43:41+0000

In our project, we used hierarchical keys as follows. The first part of the key is something like a table name in an RDBMS:

users - represents the "table"

Then each user has their own identifier, for example:

users:1 - "represents a single user"

We used ':' because I think it looks better than other delimiters. You can use any delimiter you like.

If you want to use sequential IDs, such as the id in the previous example, you need to get them from a dedicated key, therefore:

users:counter - a key that contains the "last user id" (it acts as an auto-increment)

If you need to store some kind of "subsection" for a user account, you can store it under:

users:<user id>:subsection

More complex example

users:1:avatars:1:url - with this key we get the URL of avatar 1 of user 1. If the user wants to store many avatars, they will be under users:1:avatars:X:url, where X is taken from the value of users:1:avatars:counter.

We used this strategy for all documents, whether they store a single value, JSON, or even binary data.
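The scheme described above can be sketched roughly as follows. A plain dict stands in for the bucket, and `make_key` and `next_id` are hypothetical helpers; in a real database the counter would have to be incremented atomically on the server, which this in-memory version does not attempt.

```python
SEP = ":"  # any delimiter works; ':' is the one used in this answer


def make_key(*parts) -> str:
    """Join key segments into a hierarchical key such as 'users:1:avatars:1:url'."""
    return SEP.join(str(p) for p in parts)


def next_id(store: dict, *prefix) -> int:
    """Increment and return the '<prefix>:counter' value (not atomic, sketch only)."""
    counter_key = make_key(*prefix, "counter")
    store[counter_key] = store.get(counter_key, 0) + 1
    return store[counter_key]


store = {}

# Create a user under 'users:<id>'.
user_id = next_id(store, "users")
store[make_key("users", user_id)] = {"name": "example"}

# Store an avatar URL under 'users:<id>:avatars:<n>:url'.
avatar_id = next_id(store, "users", user_id, "avatars")
store[make_key("users", user_id, "avatars", avatar_id, "url")] = "http://example.com/a.png"

if __name__ == "__main__":
    for key in sorted(store):
        print(key, "->", store[key])
```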

So, just for your example, I will choose:

a:123-20140117:cntrx - this means that we have (speaking in RDBMS terms) a table named "a", and in table "a" we have a record with id (or something else) "123-20140117" that has a cntrx field.
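A key of this shape can be built from its parts instead of being concatenated by hand. This is only a sketch: `daily_counter_key` and `bump` are hypothetical names, the counter name is passed in as a parameter, and a dict again stands in for the bucket.

```python
from datetime import date


def daily_counter_key(entity: str, entity_id: int, day: date, counter: str) -> str:
    """Build '<entity>:<id>-<YYYYMMDD>:<counter>'."""
    return f"{entity}:{entity_id}-{day.strftime('%Y%m%d')}:{counter}"


def bump(store: dict, key: str, by: int = 1) -> int:
    """Increment the counter stored under key (not atomic, sketch only)."""
    store[key] = store.get(key, 0) + by
    return store[key]


if __name__ == "__main__":
    counters = {}
    key = daily_counter_key("a", 123, date(2014, 1, 17), "cntrx")
    bump(counters, key)
    bump(counters, key)
    print(key, counters[key])
```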

UPD: About key size: it doesn't really matter much. Yes, keys are limited in size, but there are many ways to shorten them. One is to use hashes, but I think that is a bad idea because the hashed keys will be long and consume more memory. In our project, we used "short" keys for a memcached bucket. We keep an enumeration (which can also be stored in Couchbase) that maps each human-friendly key name to its shortened form.

Example: we have a set of records: a list of users with more than 30 photos. So, we have a key-value pair:

usersByPhotosCount - k:ubpc:{0}

and for 30 photos the key will be k:ubpc:30.

But it is better to apply such optimizations only in production. In development, it is better to have readable keys in the application and the database (i.e. you can create two sets of key-value pairs, readable ones for development and shortened, obfuscated ones for production, and load the appropriate set depending on your environment).
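A minimal sketch of such an environment-dependent mapping is shown below. Only the `usersByPhotosCount` name and the `k:ubpc:{0}` template come from the example above; the readable development template, the `KEY_TEMPLATES` table and the `key_for` helper are assumptions made for illustration.

```python
# Friendly key name -> key template, one table per environment.
KEY_TEMPLATES = {
    # Assumed readable form for development.
    "development": {"usersByPhotosCount": "usersByPhotosCount:{0}"},
    # Short form from the example above, for production.
    "production": {"usersByPhotosCount": "k:ubpc:{0}"},
}


def key_for(name: str, *args, env: str = "development") -> str:
    """Resolve a friendly key name to the concrete key for the given environment."""
    return KEY_TEMPLATES[env][name].format(*args)


if __name__ == "__main__":
    print(key_for("usersByPhotosCount", 30))
    print(key_for("usersByPhotosCount", 30, env="production"))
```

Application code only ever refers to the friendly name, so switching between readable and short keys is a matter of which table is loaded.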
