Match a phrase ending in a prefix with full-text search

I am looking for a way to emulate something like SELECT * FROM table WHERE attr LIKE '%text%' using tsvector in PostgreSQL.

I created the tsvector attribute without using a dictionary. Now a query like ...

 SELECT title FROM table WHERE title_tsv @@ to_tsquery('ph:*'); 

... will return all titles such as "Physics", "PHP", etc. But how can I create a query that returns all records where the title starts with "Zend Fram" (which should return, for example, "Zend Framework")?

Of course, I could use something like:

 SELECT title FROM table WHERE title_tsv @@ to_tsquery('zend') AND title_tsv @@ to_tsquery('fram:*'); 

However, this seems a bit awkward.

So the question is: is there a way to formulate the above query using something like:

 SELECT title FROM table WHERE title_tsv @@ to_tsquery('zend fram:*'); 
4 answers
 SELECT title FROM table WHERE title_tsv @@ to_tsquery('zend') and title_tsv @@ to_tsquery('fram:*') 

is equivalent to:

 SELECT title FROM table WHERE title_tsv @@ to_tsquery('zend & fram:*') 

but that, of course, also finds "Zend has no framework".

You could, of course, add a regex match against the title on top of the tsquery match, but you would have to use EXPLAIN ANALYZE to make sure it is executed after the tsquery match and not before it.
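A minimal sketch of both points on throwaway data. The temporary table demo_titles and its rows are made up for illustration, and the 'simple' configuration is assumed as a stand-in for a dictionary-free setup. The \m in the pattern is PostgreSQL's word-start anchor. Prefixing the last query with EXPLAIN ANALYZE shows where the planner applies the regex filter.

```sql
-- Hypothetical sample data, not from the original question.
CREATE TEMP TABLE demo_titles (title text);
INSERT INTO demo_titles VALUES
  ('Zend Framework'),
  ('Zend has no framework'),
  ('PHP');

-- AND-ed terms: matches both Zend titles, whether or not the words are adjacent.
SELECT title
FROM   demo_titles
WHERE  to_tsvector('simple', title) @@ to_tsquery('simple', 'zend & fram:*');

-- Same tsquery plus a case-insensitive regex recheck: keeps only titles in which
-- the word "zend" is followed, after whitespace, by text starting with "fram".
SELECT title
FROM   demo_titles
WHERE  to_tsvector('simple', title) @@ to_tsquery('simple', 'zend & fram:*')
AND    title ~* '\mzend\s+fram';
```

On the sample rows, the second query should return only "Zend Framework".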


There is a way to do this in Postgres using trigrams and GIN / GiST indexes. There is a simple example, with some rough edges, in this article by Kristo Kaiv: Substring Search.
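As a rough sketch of the trigram idea (not necessarily what the linked article does): on PostgreSQL 9.1 or later, the pg_trgm extension lets a GIN index serve LIKE and ILIKE patterns, including ones with a leading wildcard. The table trgm_demo and the index name are assumptions made up for this example.

```sql
-- Requires the pg_trgm contrib extension and the privilege to install it.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table, for illustration only.
CREATE TEMP TABLE trgm_demo (title text);
INSERT INTO trgm_demo VALUES ('Zend Framework'), ('Zend has no framework');

-- Trigram GIN index; it can support LIKE / ILIKE, including '%...%' patterns.
CREATE INDEX trgm_demo_title_idx ON trgm_demo USING GIN (title gin_trgm_ops);

-- Case-insensitive substring match, close to the LIKE '%text%' from the question.
SELECT title FROM trgm_demo WHERE title ILIKE '%zend fram%';
```

On a table this small the planner will most likely pick a sequential scan. The index only pays off with a realistic number of rows.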


Postgres 9.6 introduces full-text phrase search capabilities. So now this works:

 SELECT title FROM tbl WHERE title_tsv @@ to_tsquery('zend <-> fram:*'); 

<-> is the FOLLOWED BY operator.

It finds "foo Zend framework bar" or "Zend frames", but not "foo Zend has no frame".

Quote from the release note for Postgres 9.6:

A phrase-search query can be specified in tsquery input using the new operators <-> and <N>. The former means that the lexemes before and after it must appear adjacent to each other in that order. The latter means they must be exactly N lexemes apart.
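To see both operators side by side, here is a small sketch that needs Postgres 9.6 or later. The inline VALUES rows are made-up sample titles, and the 'simple' configuration is an assumption chosen so that no stop words are dropped and word positions stay easy to count.

```sql
-- Compare <-> (adjacent) with <3> (exactly three positions apart).
SELECT t.title,
       to_tsvector('simple', t.title) @@ to_tsquery('simple', 'zend <-> fram:*') AS adjacent,
       to_tsvector('simple', t.title) @@ to_tsquery('simple', 'zend <3> fram:*') AS three_apart
FROM  (VALUES ('foo Zend framework bar'),
              ('Zend frames'),
              ('foo Zend has no frame')) AS t(title);
```

The first two titles should match only the <-> form. In the third, "zend" sits at position 2 and "frame" at position 5, so only the <3> form should match.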

For best performance, support the query with a GIN index:

 CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (title_tsv); 

Or do not store title_tsv in the table at all (it bloats the table and complicates writes). You can use an expression index instead:

 CREATE INDEX tbl_title_tsv_idx ON tbl USING GIN (to_tsvector('english', title)); 

You need to specify the text search configuration (often language-specific) to make the expression immutable. And adapt the query accordingly:

 ... WHERE to_tsvector('english', title) @@ to_tsquery('english', 'zend <-> fram:*'); 

This is not a pretty solution, but it should do the job:

 psql=# SELECT regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g');
    regexp_replace
 ---------------------
  'zend':* & 'fram':*
 (1 row)

It can be used as:

 psql=# SELECT title FROM table WHERE title_tsv @@ to_tsquery(regexp_replace(cast(plainto_tsquery('Zend Fram') as text), E'(\'\\w+\')', E'\\1:*', 'g')); 

How it works:

  • casts the plain tsquery to a string: cast(plainto_tsquery('Zend Fram') as text)
  • uses a regex to append the prefix matcher :* to each search term: regexp_replace(..., E'(\'\\w+\')', E'\\1:*', 'g')
  • converts it back to a non-plain tsquery: to_tsquery(...)
  • and uses it in the search expression: SELECT title FROM table WHERE title_tsv @@ ...
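The steps above can be wrapped in a small helper so the regex does not have to be repeated in every query. This is only a sketch: the function name prefix_tsquery is hypothetical, the 'simple' configuration in the usage line is an assumption, and the plain (non-E) string literals rely on standard_conforming_strings being on, which is the default since Postgres 9.1.

```sql
-- Hypothetical helper: turns free text into a tsquery where every term is a prefix match.
CREATE OR REPLACE FUNCTION prefix_tsquery(regconfig, text)
RETURNS tsquery
LANGUAGE sql
STABLE
AS $$
  SELECT to_tsquery(
           $1,
           regexp_replace(
             plainto_tsquery($1, $2)::text,  -- e.g. 'zend' & 'fram'
             '(''\w+'')',                    -- each quoted lexeme made of word characters
             '\1:*',                         -- the same lexeme with :* appended
             'g'));
$$;

-- Usage: builds the same tsquery as the psql example above.
SELECT prefix_tsquery('simple', 'Zend Fram');
```

Note that the terms are still combined with &, so this makes every word a prefix match but does not require the words to be adjacent. Lexemes containing characters outside \w are not rewritten by this pattern.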

Source: https://habr.com/ru/post/889225/

