I am working on a site with minimal traffic at the moment. It is built using Ruby on Rails and runs on the Heroku cloud platform.
As part of the site, I have a large number of pages that should be searchable, each of which contains only a small amount of information. Think of a table of articles where only the title of each article needs to be indexed, but there are about 8 million articles.
Postgres Search: When I first started working on this, I used Postgres full-text search, but apparently it is not optimized well enough to handle that many indexed items, and performance was dog slow. Some searches tied up the database connection and took more than 30 seconds.
Websolr: Then I moved on to what was at the time Heroku's one and only cloud search add-on, Websolr from OneMoreCloud. Unfortunately, they charge by the number of indexed documents, which is terrible for a site like mine that has little traffic but a lot of items to index. On top of that, performance was possibly even worse than the Postgres search, which was free. Where a Postgres search would time out and take the site down, Websolr would return an empty or partial result set, leading visitors to think that the item they were looking for was not in the database.
IndexTank: Heroku has now added another cloud search provider, IndexTank, which is still in beta. Even though the beta is free, I am reluctant to try it, because on their non-Heroku service, which is not free, the highest plan covers only 2 million documents and still costs $500 per month.
Google Site Search: The option I'm currently considering is switching to Google Site Search. The Google brand gives me confidence that I will not run into the performance problems I had in the past. In addition, their pricing is extremely reasonable and is based on traffic. On the other hand, it is not a truly integrated search, because it does not connect to the database but only crawls the site, so I can't scope a search to return only, say, articles in the "Technical articles" category. Even customizing the look of the search results seems like a pain: I would have to parse the search results as XML and then build my own results page from them, and if I wanted to customize the metadata shown for each result, I would have to use the parsed results to look up all the matching rows in my database.
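To make that extra work concrete, here is a rough sketch of the parse-then-look-up step. It is only an illustration under assumptions: the sample XML below is made up (I am assuming results arrive as R elements with a U child holding the URL), the /articles/:id URL pattern is a guess at my own routes, and article_ids_from_results is a hypothetical helper name.

```ruby
require "rexml/document"

# Made-up sample response; the element names (GSP, RES, R, U, T) are an
# assumption about the XML layout, not copied from a real response.
SAMPLE_XML = <<~XML
  <GSP>
    <RES>
      <R N="1"><U>http://example.com/articles/42</U><T>First title</T></R>
      <R N="2"><U>http://example.com/articles/7</U><T>Second title</T></R>
      <R N="3"><U>http://example.com/about</U><T>About</T></R>
    </RES>
  </GSP>
XML

# Pull article IDs out of the result URLs, keeping the ranking order.
# URLs that do not look like /articles/<id> are skipped.
def article_ids_from_results(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//R/U").map do |url_node|
    match = url_node.text.to_s.match(%r{/articles/(\d+)\z})
    match && match[1].to_i
  end.compact
end

ids = article_ids_from_results(SAMPLE_XML)
```

With the IDs in hand, the page would still need a second round trip to the database (in Rails, something like Article.where(id: ids) on a hypothetical Article model) plus re-sorting the rows back into the ranking order, which is exactly the overhead I would like to avoid.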
Are there any good cloud or third-party search providers that the Stack Overflow community would recommend?