The links in both the question and the original answer are dead, but there is a way to define a schema for this, and modern releases support it.
The recommended way is to include a "language" property in the document, or in its embedded documents, adjacent to the property used for the text index. "Adjacent" here means "at the same level", not necessarily immediately next to the indexed property.
A general example would look like this:
    {
      "description": "Texto largo en español",
      "language": "spanish",
      "translation": [
        { "description": "Large text in Spanish", "language": "english" },
        { "description": "Grand texte en espagnol", "language": "french" }
      ]
    },
    {
      "description": "The quick brown fox",
      "translation": [
        { "description": "Le renard brun rapide", "language": "french" }
      ]
    }
Then, assuming we are using the default text index language of "english", we can simply create the index:
    db.collection.createIndex({ "description": "text", "translation.description": "text" })
MongoDB will then use the "language" property found in either the "root" document or the embedded documents in the array, and where it is omitted, it will use the default defined for the index. For example, the second document here has no "language" property in the "root", so "english" is assumed, since it is the index default.
The indexed items do not need to be in any particular order, which is also shown by the "english" entry inside the "translation" array of embedded documents in the first sample document. The rules for embedded items are slightly different: we must include the "language" property in the embedded documents, or the language actually used will be that of the "root" document. In this example, any embedded document in the array without a "language" property would be treated as "spanish", since that is what the "root" defines.
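The fallback rules above can be sketched in plain JavaScript. This is only an illustration of the rule as described here, not MongoDB's actual implementation. The helper name `resolveLanguages` and the embedded entry without a "language" property are made up for the example.

```javascript
// Hypothetical helper: mirrors the fallback rule described above.
// Illustration only; MongoDB applies this internally when indexing.
function resolveLanguages(doc, indexDefault = "english", arrayField = "translation") {
  // Root level: the document's own "language", else the index default.
  const root = doc.language ?? indexDefault;
  // Embedded level: the entry's own "language", else whatever the root resolved to.
  const embedded = (doc[arrayField] ?? []).map((sub) => sub.language ?? root);
  return { root, embedded };
}

const first = {
  description: "Texto largo en español",
  language: "spanish",
  translation: [
    { description: "Large text in Spanish", language: "english" },
    // Hypothetical entry with no "language": it inherits "spanish" from the root.
    { description: "Otro texto en español" }
  ]
};

const second = {
  description: "The quick brown fox",
  translation: [{ description: "Le renard brun rapide", language: "french" }]
};

console.log(resolveLanguages(first));  // { root: 'spanish', embedded: [ 'english', 'spanish' ] }
console.log(resolveLanguages(second)); // { root: 'english', embedded: [ 'french' ] }
```
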
Of course, searching considers all the paths present in the index, here the "description" properties and the embedded "translation.description" properties. The appropriate "search language" is still always the one specified with the $language option of the $text operator, since "stop words" and "stemming" are still evaluated in relation to it, falling back to the default index language set when the index was created.
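As a rough sketch of what such a query filter looks like, the snippet below builds the filter object in plain JavaScript so it can be run without a server. The helper name `textFilter` and the search terms are assumptions for illustration. Only `$text`, `$search` and `$language` are actual MongoDB operators and options.

```javascript
// Hypothetical helper that builds a $text filter object.
function textFilter(search, language) {
  const text = { $search: search };
  // Only set $language when one is given; without it the search
  // uses the default language of the index.
  if (language) text.$language = language;
  return { $text: text };
}

const spanishFilter = textFilter("texto", "spanish");
const defaultFilter = textFilter("fox");

console.log(JSON.stringify(spanishFilter)); // {"$text":{"$search":"texto","$language":"spanish"}}
console.log(JSON.stringify(defaultFilter)); // {"$text":{"$search":"fox"}}

// In mongosh the filter would be passed to find, for example:
//   db.collection.find(spanishFilter)
```
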
The embedded format also gives you an easy place to retrieve the "translation" between two languages when you hold content specific to each of them, so its practicality is "doubled up" in this case.
The current documentation for this is in "Create a Text Index for a Collection in Multiple Languages", a section within the wider "Specify a Language for Text Index" page, which links to all the other details, including how to specify a different default language for the index.
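For completeness, here is a minimal sketch of specifying a different default language. `default_language` is the index option for this. The choice of "spanish" and the variable names are just for illustration, and the objects are built in plain JavaScript so the snippet runs without a server.

```javascript
// Same index keys as in the createIndex call earlier in this answer.
const indexKeys = { description: "text", "translation.description": "text" };

// default_language sets the index-wide default used when a document
// has no "language" property; "spanish" is an illustrative choice.
const indexOptions = { default_language: "spanish" };

console.log(indexKeys, indexOptions);

// In mongosh this would be applied as:
//   db.collection.createIndex(indexKeys, indexOptions)
```
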