I'm creating a custom ispell dictionary configuration for PostgreSQL 8.4, and I'm having trouble getting words with apostrophes in them to parse properly. The ispell dictionaries included with PostgreSQL come with .affix files that contain the SFX SF rule, which specifies the extended form of a word.
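A setup along the following lines would yield a configuration like the one queried below. This is only a sketch under stated assumptions: the `DictFile` and `AffFile` values (`english`) are placeholder file names for `.dict` and `.affix` files in the `tsearch_data` directory, and the mapping for `asciiword` is inferred from the `ts_debug` output rather than taken from the actual setup.

```sql
-- Ispell dictionary; "english" is an assumed base name for the
-- english.dict / english.affix files under $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE  = ispell,
    DictFile  = english,
    AffFile   = english,
    StopWords = english
);

-- Configuration of the same name, starting from the built-in English one.
CREATE TEXT SEARCH CONFIGURATION english_ispell (COPY = pg_catalog.english);

-- Try the ispell dictionary first and fall back to the Snowball stemmer.
ALTER TEXT SEARCH CONFIGURATION english_ispell
    ALTER MAPPING FOR asciiword
    WITH english_ispell, english_stem;
```

Note that the dictionaries only see tokens after the parser has already split the input, so the affix rules never get a chance to act on a token that the parser has broken apart.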
Here is an example, assuming I have dictionary/SM in my dictionary:
SELECT to_tsvector('english_ispell', 'dictionary''s dictionaries');
Expected Result:
'dictionary':1,2
Actual Result:
's':2, 'dictionary':1,3
Am I doing something wrong? Here is the output of ts_debug, showing how the input is parsed.
SELECT * FROM ts_debug('english_ispell', 'dictionary''s dictionaries');
   alias   |   description   |    token     |         dictionaries          |   dictionary   |   lexemes
-----------+-----------------+--------------+-------------------------------+----------------+--------------
 asciiword | Word, all ASCII | dictionary   | {english_ispell,english_stem} | english_ispell | {dictionary}
 blank     | Space symbols   | '            | {}                            |                |
 asciiword | Word, all ASCII | s            | {english_ispell,english_stem} | english_ispell | {s}
 blank     | Space symbols   |              | {}                            |                |
 asciiword | Word, all ASCII | dictionaries | {english_ispell,english_stem} | english_ispell | {dictionary}
How can I get PostgreSQL to parse the ' as part of a single word instead of splitting on it as a "Space symbol"?