Preventing stemming of proper nouns in PostgreSQL?

In its enthusiasm for stemming tokens into lexemes, the PostgreSQL full text search engine also reduces proper nouns. For instance:

essais=> select to_tsquery('english', 'bortzmeyer');
 to_tsquery
------------
 'bortzmey'

essais=> select to_tsquery('english', 'balling');
 to_tsquery
------------
 'ball'
(1 row)

At least for the first one, I'm sure it is not in the English dictionary! What is the best way to avoid this side effect?

+2
2 answers

The job of a stemming algorithm is not to reduce every word to its correct stem; the goal is to reduce words that look alike to a common form. The result is usually not meant to be shown to the user: even if "balling" and "ball" both produced "kjebnkkekaa", the algorithm would still be correct, because it treats "balling" and "ball" as referring to the same thing.
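A minimal sketch of why the truncated stem is usually harmless: as long as the document and the query go through the same configuration, both sides are reduced to the same lexeme and still match. The sample sentence is made up for illustration; the stems are the ones shown in the question.

```sql
-- Both the document text and the query are stemmed by the 'english'
-- configuration, so the proper noun becomes 'bortzmey' on both sides.
SELECT to_tsvector('english', 'a post written by Bortzmeyer')
       @@ to_tsquery('english', 'bortzmeyer') AS name_matches;

-- Same idea for an ordinary word: 'balling' and 'ball' share the stem 'ball'.
SELECT to_tsvector('english', 'balling')
       @@ to_tsquery('english', 'ball') AS word_matches;
```

Both comparisons should come back true, since matching is done on the stored lexemes and not on the original spelling.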

Also be aware that stemming algorithms are far from perfect; for more information, look up the Porter stemming algorithm.

+4

This is due to the Snowball stemmer, as described here. Basically, you will want to turn off the Snowball stemmer and use only Ispell or one of the other dictionaries, but this will also reduce the effectiveness of stemming for words that are not in the dictionaries.
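One way this could look in practice, as a sketch and not a recommendation: copy the built-in configuration and swap the Snowball dictionary for the built-in simple dictionary, which only lowercases tokens. The configuration name english_nostem is hypothetical. An Ispell dictionary could be listed instead, but that assumes the dictionary files are installed on the server, so it is not shown here.

```sql
-- Check which dictionary currently handles the token
-- (with the stock 'english' configuration this should be english_stem).
SELECT alias, token, dictionary, lexemes
FROM ts_debug('english', 'bortzmeyer');

-- Hypothetical configuration name; starts as a copy of the built-in one.
CREATE TEXT SEARCH CONFIGURATION english_nostem (COPY = pg_catalog.english);

-- Replace the Snowball stemmer with the non-stemming 'simple' dictionary
-- for every token type that was mapped to english_stem.
ALTER TEXT SEARCH CONFIGURATION english_nostem
    ALTER MAPPING REPLACE english_stem WITH simple;

-- The proper noun is now kept whole, but 'balling' is no longer
-- reduced to 'ball' either.
SELECT to_tsquery('english_nostem', 'bortzmeyer');
SELECT to_tsquery('english_nostem', 'balling');
```

The trade-off mentioned above shows up directly: with this configuration nothing is stemmed, so a query for "ball" would no longer find documents that only contain "balling". Columns indexed with the old configuration would also need to be re-indexed with the new one.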

+2

Source: https://habr.com/ru/post/1439481/

