Any reason not to use PostgreSQL's built-in full-text search on Heroku?

I am going to deploy a Rails application on Heroku that requires full-text search. So far I have been running it on a VPS using MySQL with Sphinx.

However, if I want to use Sphinx or Solr on Heroku, I will have to pay for the add-on.

I notice that PostgreSQL (the database used on Heroku) has a built-in full-text search function.

Is there a reason I could not use Postgres full-text search? Is it slower than Sphinx, or are there some other major limitations?

+49
postgresql full-text-search heroku solr sphinx
Jun 04 2018-12-12T00:
5 answers

Edit, 2016 - Why not both?

If you are interested in Postgres vs. Lucene, why not both? Check out the ZomboDB extension for Postgres, which integrates Elasticsearch as a first-class index type. It is still a fairly early project, but it looks very promising to me.

(Technically not available on Heroku, but still worth a look.)




Disclosure: I am a co-founder of the Websolr and Bonsai Heroku add-ons, so my perspective is a bit biased towards Lucene.

My read on Postgres full-text search is that it is pretty solid for straightforward use cases, but there are a number of reasons why Lucene (and therefore Solr and ElasticSearch) is superior in both performance and functionality.

For starters, jpountz provides a truly excellent technical answer to the question "Why is Solr so much faster than Postgres?" It is worth a couple of reads to really digest.

I also commented on a recent RailsCast episode comparing the relative advantages and disadvantages of Postgres full-text search versus Solr. Let me recap that here:

Pragmatic advantages of Postgres

  • Reuse an existing service that you are already running, instead of setting up and maintaining (or paying for) something else.
  • Far superior to the notoriously slow SQL LIKE operator.
  • Less hassle keeping data in sync, since it is all in the same database: no application-level integration with an external data service API.
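To make the LIKE comparison concrete, here is a toy sketch in plain Python (not Postgres internals). A `LIKE '%term%'` predicate has to inspect every row, whereas an inverted index, which is what a Postgres GIN index is, maps each term straight to the rows containing it. The `DOCS` data and all function names are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical documents keyed by row id.
DOCS: Dict[int, str] = {
    1: "postgres full text search",
    2: "sphinx search on a vps",
    3: "heroku postgres add-on",
}

def scan_like(docs: Dict[int, str], needle: str) -> List[int]:
    """Emulate LIKE '%needle%': every document is inspected on every query."""
    return sorted(doc_id for doc_id, body in docs.items() if needle in body)

def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each whitespace-separated term to the set of ids containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, body in docs.items():
        for term in body.split():
            index[term].add(doc_id)
    return index

def lookup(index: Dict[str, Set[int]], term: str) -> List[int]:
    """Answer a single-term query with one dictionary lookup."""
    return sorted(index.get(term, set()))

INDEX = build_inverted_index(DOCS)
```

Both `scan_like(DOCS, "postgres")` and `lookup(INDEX, "postgres")` find the same rows; the difference is that the scan's cost grows with the table, while the lookup's cost is paid once at index-build time.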

Benefits for Solr (or ElasticSearch)

Off the top of my head, in no particular order...

  • Scale your indexing and search load separately from your regular database load.
  • More flexible term analysis for things like accent normalization, linguistic stemming, N-grams, markup removal... Other useful features such as spell checking and "rich content" (for example, PDF and Word) extraction...
  • Solr/Lucene can already do just about everything on the Postgres full-text search TODO list.
  • Significantly better and faster term relevancy ranking, efficiently customizable at search time.
  • Probably faster search performance for common terms or complicated queries.
  • Probably more efficient indexing performance than Postgres.
  • Better tolerance for changes to your data model, by decoupling indexing from your primary data store.
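As a rough idea of what "term analysis" means in the list above, here is a minimal pure-Python sketch of two of the steps mentioned, normalization and N-grams. It is only an illustration of the concept, not Lucene's or Postgres's actual analyzer; the function names are hypothetical.

```python
import unicodedata
from typing import List

def fold_accents(text: str) -> str:
    """Strip combining marks so accented and unaccented spellings match."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text: str) -> List[str]:
    """Fold accents, lowercase, and split on whitespace into terms."""
    return fold_accents(text).lower().split()

def char_ngrams(term: str, n: int = 3) -> List[str]:
    """Return the character n-grams of a term, useful for partial matching."""
    if len(term) < n:
        return [term]
    return [term[i:i + n] for i in range(len(term) - n + 1)]
```

For example, `analyze("Crème Brûlée")` yields terms that also match a query typed without accents, and `char_ngrams("sphinx")` breaks a term into overlapping three-letter pieces.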

Clearly, I think a dedicated search engine based on Lucene is the better option here. Basically, you can think of Lucene as the de facto open source repository of search expertise.

But if your only other option is the LIKE operator, then Postgres full-text search is a definite win.

+55
Jun 04 2018-12-12T00:

Since I just went through the effort of comparing Elasticsearch (1.9) against Postgres FTS, I figured I should share my results, since they are somewhat more current than the ones @gustavodiazjaimes cites.

My main concern with postgres was that it did not have faceting built in, but that is trivial to build yourself. Here is my example (in django):

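    from django.db.models import Count

    # Full-text match, then count the matching rows per book to build the facet.
    results = YourModel.objects.filter(vector_search=query)
    facets = (
        results
        .values('book')
        .annotate(total=Count('book'))
        .order_by('book')
    )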
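For readers not on Django: the aggregation above amounts to a group-and-count over the matched rows. A minimal pure-Python sketch of the same result shape follows. The `hits` data and the `facet_counts` helper are invented for illustration and are not part of the original benchmark.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical search hits; in the Django version these come from the database.
hits: List[Dict[str, object]] = [
    {"id": 1, "book": "A"},
    {"id": 2, "book": "B"},
    {"id": 3, "book": "A"},
    {"id": 4, "book": "C"},
    {"id": 5, "book": "A"},
]

def facet_counts(rows: List[Dict[str, object]], field: str) -> List[Dict[str, object]]:
    """Count rows per distinct value of `field`, ordered by that value."""
    counts = Counter(row[field] for row in rows)
    return [{field: value, "total": total} for value, total in sorted(counts.items())]

facets = facet_counts(hits, "book")
```

Here `facets` is a list of `{"book": ..., "total": ...}` dictionaries, one per distinct book, which is the same kind of structure the `values(...).annotate(...)` query returns.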

I am using postgres 9.6 and elastic-search 1.9 (via haystack on django). Here is a comparison between elasticsearch and postgres across 16 different types of queries.

      es_times  pg_times  es_times_faceted  pg_times_faceted
  0   0.065972  0.000543          0.015538          0.037876
  1   0.000292  0.000233          0.005865          0.007130
  2   0.000257  0.000229          0.005203          0.002168
  3   0.000247  0.000161          0.003052          0.001299
  4   0.000276  0.000150          0.002647          0.001167
  5   0.000245  0.000151          0.005098          0.001512
  6   0.000251  0.000155          0.005317          0.002550
  7   0.000331  0.000163          0.005635          0.002202
  8   0.000268  0.000168          0.006469          0.002408
  9   0.000290  0.000236          0.006167          0.002398
  10  0.000364  0.000224          0.005755          0.001846
  11  0.000264  0.000182          0.005153          0.001667
  12  0.000287  0.000153          0.010218          0.001769
  13  0.000264  0.000231          0.005309          0.001586
  14  0.000257  0.000195          0.004813          0.001562
  15  0.000248  0.000174          0.032146          0.002246

                    count      mean       std       min       25%       50%       75%       max
  es_times           16.0  0.004382  0.016424  0.000245  0.000255  0.000266  0.000291  0.065972
  pg_times           16.0  0.000209  0.000095  0.000150  0.000160  0.000178  0.000229  0.000543
  es_times_faceted   16.0  0.007774  0.007150  0.002647  0.005139  0.005476  0.006242  0.032146
  pg_times_faceted   16.0  0.004462  0.009015  0.001167  0.001580  0.002007  0.002400  0.037876
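One caveat when reading the summary: the means are pulled up by query 0, which is far slower than the rest in three of the four columns, so the medians are arguably the fairer comparison. The short sketch below recomputes them from the per-query numbers above using Python's `statistics` module. The variable names are mine; the numbers are copied from the table.

```python
from statistics import mean, median

# Per-query timings copied from the table above (queries 0 through 15).
timings = {
    "es_times": [0.065972, 0.000292, 0.000257, 0.000247, 0.000276, 0.000245,
                 0.000251, 0.000331, 0.000268, 0.000290, 0.000364, 0.000264,
                 0.000287, 0.000264, 0.000257, 0.000248],
    "pg_times": [0.000543, 0.000233, 0.000229, 0.000161, 0.000150, 0.000151,
                 0.000155, 0.000163, 0.000168, 0.000236, 0.000224, 0.000182,
                 0.000153, 0.000231, 0.000195, 0.000174],
    "es_times_faceted": [0.015538, 0.005865, 0.005203, 0.003052, 0.002647,
                         0.005098, 0.005317, 0.005635, 0.006469, 0.006167,
                         0.005755, 0.005153, 0.010218, 0.005309, 0.004813,
                         0.032146],
    "pg_times_faceted": [0.037876, 0.007130, 0.002168, 0.001299, 0.001167,
                         0.001512, 0.002550, 0.002202, 0.002408, 0.002398,
                         0.001846, 0.001667, 0.001769, 0.001586, 0.001562,
                         0.002246],
}

# Median and mean per column; the median is robust to the slow first query.
medians = {name: median(values) for name, values in timings.items()}
means = {name: mean(values) for name, values in timings.items()}

for name in timings:
    print(f"{name:18s} median={medians[name]:.6f} mean={means[name]:.6f}")
```

The medians reproduce the 50% column of the summary table. Comparing medians rather than means, the two engines are much closer on plain queries than the es_times mean suggests.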

To get postgres to these speeds for faceted searches, I had to use a GIN index on the field with a SearchVectorField, which is django specific, but I am sure other frameworks have a similar vector type.

One other consideration is that pg 9.6 now supports phrase matching, which is huge.
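Outside Django, the same two ideas can be expressed directly in SQL. The sketch below only builds the statements as strings so it can be read without a database. It uses an expression index on `to_tsvector(...)` rather than a stored vector column like SearchVectorField, which is a related but different approach from the one benchmarked above. The helper names and the `%s` placeholder style (as used by psycopg2) are assumptions for illustration; `to_tsvector`, `phraseto_tsquery` and `USING GIN` are standard Postgres, with `phraseto_tsquery` available from 9.6.

```python
def gin_index_ddl(table: str, column: str, config: str = "english") -> str:
    """Build DDL for a GIN expression index over a text column.

    Identifiers are interpolated directly, so they must come from trusted
    code, never from user input.
    """
    return (
        f"CREATE INDEX {table}_{column}_fts_idx ON {table} "
        f"USING GIN (to_tsvector('{config}', {column}));"
    )

def phrase_query_sql(table: str, column: str, config: str = "english") -> str:
    """Build a parameterized phrase-match query (Postgres 9.6 or later).

    The search phrase itself is left as a %s placeholder for the driver.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE to_tsvector('{config}', {column}) "
        f"@@ phraseto_tsquery('{config}', %s);"
    )

# Hypothetical table and column names.
print(gin_index_ddl("articles", "body"))
print(phrase_query_sql("articles", "body"))
```

The query repeats the exact `to_tsvector('english', body)` expression used in the index, which is what allows the planner to use the expression index.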

My takeaway is that postgres will be preferable in most cases, as it offers:

  • a simpler stack
  • no search backend API wrapper dependencies to contend with (thinking-sphinx, django-sphinx, haystack, etc.). These can be a drag, since they may not support the features your search backend does (e.g. haystack faceting/aggregates).
  • similar performance and features (for my needs)
+17
04 Oct '16 at 14:50

I found this amazing comparison and want to share it:

Full-text search in PostgreSQL

                    Index creation time   Index storage   Query speed
  LIKE predicate    none                  none            more than 90 seconds
  PostgreSQL / GIN  40 min                532 MB          20 ms
  Sphinx Search     6 min                 533 MB          8 ms
  Apache Lucene     9 min                 1071 MB         80 ms
  Inverted Index    high                  101 MB          40 ms

+16
May 20 '13 at 4:06

Postgres full-text search has amazing capabilities in the areas of stemming, ranking/boosting, synonym handling and fuzzy search, among others, but no support for faceted search.

So, if Postgres is already in your stack and you don't need faceting, try it first and take advantage of how easy it is to keep the indexes in sync and the stack lean before looking at Lucene-based solutions, at least if your whole application is not built around search.

+2
Jul 13 '15 at 4:54

The PostgreSQL FTS feature is mature and fairly quick at lookups. It is definitely worth a look.

0
Jun 04 '12 at 3:50