PostgreSQL: how to make full text search ignore specific tokens?

Is there a magic function or operator to ignore some tokens?

select to_tsvector('the quick. brown fox') @@ 'brown' -- returns true

select to_tsvector('the quick,brown fox') @@ 'brown' -- returns true

select to_tsvector('the quick.brown fox') @@ 'brown' -- returns false, should return true

select to_tsvector('the quick/brown fox') @@ 'brown' -- returns false, should return true
1 answer

I'm afraid you are probably stuck. If you run your terms through ts_debug, you will see that "quick.brown" is parsed as a host name and "quick/brown" is parsed as a file system path. The parser is really not that smart.

My only suggestion is to pre-process your texts and convert these characters into spaces before calling to_tsvector. You can easily create a function in plpgsql to do this.
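A minimal sketch of such a pre-processing function, assuming that only "." and "/" need to be turned into spaces. The name split_punctuation is hypothetical (it is not a built-in), and the character list is an assumption you would adjust to your data:

```sql
-- Hypothetical helper, not part of PostgreSQL: replaces '.' and '/' with
-- spaces so the default parser sees separate words instead of a host name
-- or a file path.
CREATE OR REPLACE FUNCTION split_punctuation(doc text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE
AS $$
BEGIN
    -- translate() maps each character of './' to the character at the
    -- same position in the third argument (two spaces here).
    RETURN translate(doc, './', '  ');
END;
$$;

-- Usage: pre-process the text before building the tsvector.
SELECT to_tsvector('english', split_punctuation('the quick.brown fox'))
       @@ to_tsquery('english', 'brown');  -- returns true
```

Note that this also splits real host names, file paths and decimal numbers such as 3.14, so it only fits if you never need to search for those as single tokens.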

nicg=# select ts_debug('the quick.brown fox');
                              ts_debug
---------------------------------------------------------------------
 (asciiword,"Word, all ASCII",the,{english_stem},english_stem,{})
 (blank,"Space symbols"," ",{},,)
 (host,Host,quick.brown,{simple},simple,{quick.brown})
 (blank,"Space symbols"," ",{},,)
 (asciiword,"Word, all ASCII",fox,{english_stem},english_stem,{fox})
(5 rows)


Source: https://habr.com/ru/post/1708874/

