Question
I need to check whether each word of a string is spelled correctly by looking it up in a MongoDB collection. Requirements:
- Perform as few database queries as possible.
- The first word of each sentence must be capitalized, but that word may be stored in upper or lower case in the dictionary. Therefore every word needs a case-sensitive match, except the first word of each sentence, which should be matched case-insensitively.
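The requirements need the text split into words, with the first word of each sentence flagged. Below is a minimal sketch of that step in JavaScript. tokenize is a hypothetical helper, not part of the original code. It assumes sentences end with ., ? or ! followed by whitespace, so abbreviations such as "e.g." would be split incorrectly. It also relies on regex lookbehind, which needs a reasonably modern JavaScript engine.

```javascript
// Hypothetical helper: split text into words and flag the first word of each sentence.
// Assumption: a sentence ends with '.', '?' or '!' followed by whitespace.
function tokenize(text) {
  const tokens = [];
  // Split after sentence punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.?!])\s+/);
  for (const sentence of sentences) {
    const words = sentence
      .replace(/[^a-zA-Z0-9äöüÄÖÜß\s]/g, '') // drop punctuation, keep the question's character set
      .split(/\s+/)
      .filter(Boolean); // ignore empty strings caused by leading/trailing spaces
    words.forEach((word, index) => {
      // Only the first word of a sentence may later be matched case-insensitively.
      tokens.push({ word, sentenceStart: index === 0 });
    });
  }
  return tokens;
}
```

Each token carries the flag that later decides whether the lowercase form of the word is also acceptable.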
Example string
This is a simple example. Example. This is another example.
Dictionary Structure
Suppose the dictionary collection looks like this:
{ word: 'this' }, { word: 'is' }, { word: 'a' }, { word: 'example' }, { word: 'Name' }
In my case, the dictionary contains 100,000 words. Each word is stored with its proper case: names are capitalized, verbs are lowercase, and so on.
Expected Result
The words simple and another should be recognized as misspelled because they do not exist in the database.
In this case, the array of existing words should be ['This', 'is', 'a', 'example']. 'This' is capitalized because it is the first word of a sentence; in the database it is stored in lowercase as 'this'.
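The matching rule behind the expected result can be stated as a small predicate. Here is a sketch that uses an in-memory Set as a stand-in for the MongoDB collection, which is an assumption made only for illustration. isKnown is a hypothetical name. A word counts as correct if it is stored exactly as written. For the first word of a sentence only, it also counts if its lowercase form is stored.

```javascript
// In-memory stand-in for the dictionary collection from the question.
const dictionary = new Set(['this', 'is', 'a', 'example', 'Name']);

// Hypothetical predicate: exact match always counts; the lowercase form
// counts only for the first word of a sentence.
function isKnown(word, sentenceStart, dict) {
  if (dict.has(word)) return true;
  return sentenceStart && dict.has(word.toLowerCase());
}

// The example string, already split into [word, sentenceStart] pairs.
const tokens = [
  ['This', true], ['is', false], ['a', false], ['simple', false], ['example', false],
  ['Example', true],
  ['This', true], ['is', false], ['another', false], ['example', false],
];

const missing = tokens
  .filter(([word, first]) => !isKnown(word, first, dictionary))
  .map(([word]) => word);

console.log(missing); // [ 'simple', 'another' ]
```

Under this rule a lowercase 'name' in the middle of a sentence is rejected, while 'Name' is accepted anywhere.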
My attempt so far (updated)
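// Split into sentences after . ? ! when the next word starts with a capital letter
// (the original lookahead "(?= [AZ])" only matched a space followed by 'A' or 'Z')
const sentences = string.replace(/([.?!])\s*(?=[A-ZÄÖÜ])/g, '$1|').split('|');
let search = [], words = [], existing, missing;

sentences.forEach(sentence => {
  // Remove punctuation and split on runs of whitespace, dropping empty strings
  const w = sentence.trim().replace(/[^a-zA-Z0-9äöüÄÖÜß ]/g, '').split(/\s+/).filter(Boolean);
  w.forEach((word, index) => {
    // First word of a sentence: case-insensitive; all other words: case-sensitive
    const regex = new RegExp('^' + word + '$', index === 0 ? 'i' : '');
    search.push(regex);
    words.push(word);
  });
});

// One query for all words; existing holds the spelling stored in the database
existing = Dictionary.find({ word: { $in: search } }).map(obj => obj.word);
missing = _.difference(words, existing);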
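An alternative to the regex array is to send exact strings in the single $in query. According to the MongoDB $regex documentation, case-insensitive regular expressions generally cannot use indexes effectively, whereas plain string values can use an ordinary index on word. This sketch assumes such an index exists. buildLookupTerms is a hypothetical helper that takes tokens of the form { word, sentenceStart }. For sentence-initial words it adds both the written form and the lowercase form. A Set removes repeated words, so the query stays as small as possible.

```javascript
// Hypothetical helper: turn tokens into one list of exact strings for a single $in query.
function buildLookupTerms(tokens) {
  const terms = new Set(); // deduplicates repeated words
  for (const { word, sentenceStart } of tokens) {
    terms.add(word); // exact spelling, e.g. a capitalized name
    if (sentenceStart) {
      terms.add(word.toLowerCase()); // sentence-initial word stored in lowercase
    }
  }
  return [...terms];
}

// Usage in Meteor (not executed here), still one query for the whole text:
// const found = Dictionary.find({ word: { $in: buildLookupTerms(tokens) } })
//   .map(doc => doc.word);
```

A sentence-initial 'This' therefore produces both 'This' and 'this', and whichever form is stored will be returned.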
Problem
- Case-insensitive matches do not work as expected: /^Example$/i finds the entry, but existing then contains the original lowercase example from the database, so Example (as written in the text) ends up in the missing array. The case-insensitive search itself works as expected; the problem is the mismatch between the result arrays, and I do not know how to solve it.
- Can this code be optimized? It currently uses two forEach loops and _.difference ...
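The mismatch arises because _.difference compares the text spelling with the stored spelling. One way around it is to apply the same case rule when comparing, rather than relying on exact equality. Here is a sketch under the assumption that found is the array of word values returned by the single query. classify is a hypothetical name. It makes one pass over the tokens with a Set lookup per word, so it replaces the difference step without another loop over the results.

```javascript
// Hypothetical replacement for _.difference(words, existing).
// tokens: [{ word, sentenceStart }], found: words returned by MongoDB.
function classify(tokens, found) {
  const foundSet = new Set(found);
  const existing = [];
  const missing = [];
  for (const { word, sentenceStart } of tokens) {
    // Same rule as the query: exact match, or lowercase match for a sentence start.
    const known =
      foundSet.has(word) ||
      (sentenceStart && foundSet.has(word.toLowerCase()));
    // Keep the spelling from the text, so 'Example' is reported rather than 'example'.
    (known ? existing : missing).push(word);
  }
  return { existing, missing };
}
```

If duplicates are unwanted, [...new Set(result.existing)] gives each word once, in order of first appearance.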