Modeling multilingual data on MongoDB

I am trying to model my objects in MongoDB and am not sure how to proceed. I am building a product catalog with these characteristics:

  • Infrequent changes to the product catalog. A bulk operation may be run weekly or every two weeks.
  • Product information is available in several languages (English, Spanish, French), and a new language can be added at any time.

Here's what I'm trying to do: I need to model my product catalog in order to capture multilingual functionality. Suppose I have:

product : { _id: xxx, sku: "23456", name: "Name", description: "Product details", tags: ["x1", "x2"], ... } 

Of course, the name, description, tags and possibly the images will change according to the language. So how do I model it?

  • I can have a separate collection for each language, for example: enProducts, esProducts, etc.
  • I can embed a JSON sub-document per language inside the product, for example:

     product :{ id: xxx, en: { name: "Name", description: "product details.." }, es: { name: "Name", description: "product details.." }, ... } 


Or is there another solution? Need help from MongoDB modeling experts here :)

+13
7 answers

Both solutions are standard approaches to this; the first is also the standard in the RDBMS world (file-based translations are another method, but not an option here).

As to which is best here, I lean toward the second, given your use case.

Some of the reasons are:

  • A single document load returns all translations and product data, with no JOINs
  • It makes for a single contiguous read from disk
  • It allows atomic updates to one product, including adding new languages and other changes

But it has some downsides:

  • Updates can (and probably will) create fragmentation, which can be remedied to some extent (not fully) with powerOf2Sizes allocation
  • All your operations will now go to a single part of your hard disk, which can create a bottleneck; but since in your scenario you rarely update, if at all, this should not be a problem.

As a side note: I judge that fragmentation may not be much of a problem for you. The reason is that you only really bulk import products, probably from a CSV, so your documents will probably not regularly grow beyond the power-of-2 allocation they received on insert. Therefore, this point may be moot.

So, overall, if planned correctly the second option is a good one, but there are some points to take into account:

  • Could the multiple descriptions / fields push a document past the 16 MB limit?
  • How do you manually pad the document to use space efficiently and prevent fragmentation?

These are your biggest problems if you go with the second option.

Given that you can fit all of Shakespeare's works into 4 MB with room to spare, I am really not sure you would ever reach the 16 MB limit; if you did, it would take a significant amount of text, and probably images stored in binary form inside the document.

Returning to the first option, your biggest problem will be the duplication of certain data, such as the price (France and Spain both use the euro), unless you use two kinds of documents: one holding the shared data and another holding the translation (this would actually make 4 documents, but only two queries).
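
A minimal sketch of that split, assuming in-memory arrays stand in for the two collections. The names productsShared, productTexts and loadProduct, the currency field and the sample values are all hypothetical, not taken from the question:

```javascript
// Hypothetical stand-ins for two collections: shared data and per-language text.
const productsShared = [{ sku: "23456", price: 33, currency: "EUR" }];
const productTexts = [
  { sku: "23456", lang: "en", name: "Fork" },
  { sku: "23456", lang: "es", name: "Tenedor" },
  { sku: "23456", lang: "fr", name: "Fourchette" },
];

// Two lookups per product: one for the shared data, one for the translation.
function loadProduct(sku, lang) {
  const shared = productsShared.find((p) => p.sku === sku);
  if (!shared) return null;
  const text = productTexts.find((t) => t.sku === sku && t.lang === lang);
  // Shared fields first, then the language-specific fields on top.
  return { ...shared, ...(text || {}) };
}

console.log(loadProduct("23456", "es"));
```

With three languages this gives one shared document plus three translation documents, and the price lives in exactly one place.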

Considering that this catalog will only ever be updated in bulk, duplicated data will not matter much (though, for future reference, I would be wary in case the catalog expands), so:

  • You can make it one document per translation, and not worry about updating prices atomically across all regions.
  • You get a single disk read without the fragmentation
  • You do not need to manually pad your documents.

So both options are viable, but I lean toward the second.

+7

Another option is to simply store a separate value per language for each field. This might also make the schema easier to maintain:

 product : { _id:xxx, sku: { und: "23456" }, name: { en: "Fork", de: "Gabel" }, description: { en: "A metal thingy with four spikes", de: "Eine Dinge aus metal der vier spitze hat" } } 

und is short for "undefined", i.e. the value applies to all languages, and it can be used as a fallback; or you can always use "en" as the fallback if you prefer.

The above example shows how Drupal CMS manages languages (albeit translated from SQL to Mongo).
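
The fallback lookup described above could be sketched like this. The helper name localizedValue and the fallback order (requested language, then a default language, then "und") are assumptions for illustration, not something Drupal or MongoDB provides:

```javascript
// Hypothetical helper: pick a field value for a language, falling back to
// a default language and then to "und" (language-neutral).
function localizedValue(field, lang, fallbackLang = "en") {
  if (field == null || typeof field !== "object") return undefined;
  for (const code of [lang, fallbackLang, "und"]) {
    if (Object.prototype.hasOwnProperty.call(field, code)) return field[code];
  }
  return undefined;
}

const product = {
  sku: { und: "23456" },
  name: { en: "Fork", de: "Gabel" },
  description: { en: "A metal thingy with four spikes" },
};

console.log(localizedValue(product.name, "de"));        // "Gabel"
console.log(localizedValue(product.description, "de")); // falls back to the "en" text
console.log(localizedValue(product.sku, "de"));         // "23456"
```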

+2

Another option is to store your primary data in one language only and have a separate collection for translating text resources, in which you map any text resource from your primary language to the other target languages (regardless of whether the text resource comes from the primary data store or is just a system message rendered by your system).

I.e. do not make any language-specific provisions in the schema or the model at all.

The drawback I see is that you have to take care of removing data from the translation collection when a product is removed from the main store. Once you can guarantee that the same resource is not used elsewhere this is trivial, but it does need to be programmed :)

+1

For a static list of languages, I would go with @Zagorulkin Dmitry's solution, as it is easy to query.

For a dynamic list of languages, I would prefer a layout that does not require schema changes and keeps data management simple.

The downside is that the query is less trivial.

  { "product": { "id": "xxx", "languageDependentData": [ { "language": "en", "name": "Name", "description": "product details.." }, { "language": "es", "name": "Name", "description": "product details.." } ] } } 
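
To illustrate why reading this layout takes a little more work, here is a minimal client-side sketch. The helper name pickLanguage and the fallback to "en" are assumptions, and the sample names are made up:

```javascript
// Hypothetical helper: find the entry for `lang` in languageDependentData,
// or fall back to `fallbackLang` when that language is missing.
function pickLanguage(product, lang, fallbackLang = "en") {
  const entries = product.languageDependentData || [];
  return (
    entries.find((e) => e.language === lang) ||
    entries.find((e) => e.language === fallbackLang) ||
    null
  );
}

const doc = {
  product: {
    id: "xxx",
    languageDependentData: [
      { language: "en", name: "Fork", description: "product details.." },
      { language: "es", name: "Tenedor", description: "product details.." },
    ],
  },
};

console.log(pickLanguage(doc.product, "es").name); // "Tenedor"
console.log(pickLanguage(doc.product, "fr").name); // "Fork" (falls back to "en")
```

On the server side, a dot-notation filter such as { "product.languageDependentData.language": "es" } matches documents that contain a Spanish entry, but the matching array element still has to be selected, either with a projection or in application code as above.
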
+1

This approach would be the best:

 product :{ id: xxx, en: { name: "Name", description: "product details.." }, es: { name: "Name", description: "product details.." }, ... } 

simply because you only need to look up a single product once a language has been chosen.

0

How about this approach:

 product: { id: 1, name: 'Original Name', description: 'Original Description', price: 33, date: '2019-03-13', translations: { es: { name: 'Nombre Original', description: 'Descripción Original', } } } 

If the user selects a language other than the default one, and the object has a translations key for that language, you only need to merge it over the original fields; any key that has no translation keeps its original value.

Another advantage is that if you need to remove the translation feature, or add / remove a language, you only need to change or delete the translations key instead of refactoring the whole schema.
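
The merge step described above could be sketched as follows. The function name localize is hypothetical, and the sketch assumes a shallow merge is enough because the translated fields are top-level keys:

```javascript
// Hypothetical helper: overlay the translations for `lang` on the original fields.
function localize(product, lang) {
  // Separate the translations map from the language-independent fields.
  const { translations = {}, ...base } = product;
  const overrides = translations[lang] || {};
  // Keys present in the translation win; all others keep the original value.
  return { ...base, ...overrides };
}

const product = {
  id: 1,
  name: "Original Name",
  description: "Original Description",
  price: 33,
  date: "2019-03-13",
  translations: {
    es: { name: "Nombre Original", description: "Descripción Original" },
  },
};

console.log(localize(product, "es").name); // "Nombre Original"
console.log(localize(product, "fr").name); // "Original Name" (no translation available)
```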

0

I use the following pattern for a key and its values, with an index on key:

  { "id":"ObjectId", "key":"error1", "values":[{ "lang":"en", "value":"Error Message 1" }, { "lang":"fa", "value":"متن خطای شماره 1" }] } 

and use this code in C#:

 object = coleccion.find({"key": "error1"}); 

See this link: Model One-to-Many Relationships with Embedded Documents

0

Source: https://habr.com/ru/post/969687/

