Suppose the documents in our collection look like this:
{ "phoneme" : "JH OY1 NZ" }
{ "phoneme" : "foobar" }
In MongoDB 3.4+, we can use the $split operator to split the field value into an array of substrings.
To split a string into an array of characters, we need to apply the $substrCP expression at each index of the string using the $map operator.
The array of index values, all integers from 0 to the length of the string minus one, can be generated using the $range and $strLenCP operators.
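To make the index-based approach concrete, here is a minimal plain-JavaScript sketch of what the combination of $range, $strLenCP, $map and $substrCP computes. The helper names (strLenCP, range, substrCP, toCharArray) are hypothetical stand-ins written for illustration only; they are not part of any MongoDB driver and only approximate the operators' behavior for ordinary strings.

```javascript
// Plain-JavaScript analogue of the operators; helper names are hypothetical.

// Like { $strLenCP: s }: the number of Unicode code points in s.
function strLenCP(s) {
  return Array.from(s).length;
}

// Like { $range: [start, end] }: integers from start up to, but excluding, end.
function range(start, end) {
  const out = [];
  for (let i = start; i < end; i++) out.push(i);
  return out;
}

// Like { $substrCP: [s, index, count] }: count code points starting at index.
function substrCP(s, index, count) {
  return Array.from(s).slice(index, index + count).join("");
}

// Like the $map expression: take a one-character substring at every index.
function toCharArray(s) {
  return range(0, strLenCP(s)).map((i) => substrCP(s, i, 1));
}

console.log(toCharArray("foobar")); // [ 'f', 'o', 'o', 'b', 'a', 'r' ]
```

Because the range excludes its upper bound, the last index visited is the string length minus one, which is why no explicit subtraction is needed.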
We use the $addFields pipeline stage to add new fields to the source documents, but for this to be permanent, we can either create a view or overwrite our collection using the $out pipeline stage.
[
  {
    "$addFields": {
      "arrayOfPhonemeChar": {
        "$map": {
          "input": { "$range": [ 0, { "$strLenCP": "$phoneme" } ] },
          "in": { "$substrCP": [ "$phoneme", "$$this", 1 ] }
        }
      },
      "phonemeSubstrArray": { "$split": [ "$phoneme", " " ] }
    }
  }
]
This produces documents that look like this:
{ "phoneme" : "JH OY1 NZ", "arrayOfPhonemeChar" : ["J", "H", " ", "O", "Y", "1", " ", "N", "Z"], "phonemeSubstrArray" : ["JH", "OY1", "NZ"] }
{ "phoneme" : "foobar", "arrayOfPhonemeChar" : ["f", "o", "o", "b", "a", "r"], "phonemeSubstrArray" : ["foobar"] }
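As a rough sketch of the two ways to make the new fields permanent, the pipeline can be extended with an $out stage or used as the definition of a view. The collection name "phonemes" and the view name "phonemesWithArrays" are assumptions chosen for illustration; substitute your own names. The shell calls are left as comments because they need a live database connection.

```javascript
// The $addFields stage from the pipeline above, kept in a variable for reuse.
const addArraysStage = {
  $addFields: {
    arrayOfPhonemeChar: {
      $map: {
        input: { $range: [0, { $strLenCP: "$phoneme" }] },
        in: { $substrCP: ["$phoneme", "$$this", 1] }
      }
    },
    phonemeSubstrArray: { $split: ["$phoneme", " "] }
  }
};

// Option 1: overwrite the collection. $out must be the last stage.
// "phonemes" is a hypothetical collection name.
const overwritePipeline = [addArraysStage, { $out: "phonemes" }];

// Option 2: leave the collection untouched and expose the fields through a view.
const viewPipeline = [addArraysStage];

// In the mongo shell, where `db` is defined, these would be run as:
//   db.phonemes.aggregate(overwritePipeline)
//   db.createView("phonemesWithArrays", "phonemes", viewPipeline)
```

The view keeps the stored documents unchanged and recomputes the arrays on every read, whereas $out replaces the collection contents once, so the choice depends on whether the source data keeps changing.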