SQL Server 2008 Full Text Search (FTS) Compared to Lucene.NET

I know that in the past there were questions about SQL 2005 compared to Lucene.NET, but since 2008 they came out and they made a lot of changes to it, and wondered if anyone could give me the pros / cons (or a link to article).

+41
sql-server sql-server-2008 full-text-search lucene
Jan 31 '09 at 18:02
source share
5 answers

In 2006, I built a medium-sized knowledge base (possibly 2 GB of indexed text) on top of SQL Server 2005 FTS and now migrated it to 2008 iFTS. Both situations worked well for me, but the transition from 2005 to 2008 was a good improvement for me.

My situation was not the same as StackOverflow in the sense that I indexed data that was updated only at night, however I tried to join the search results from several CONTAINSTABLE statements back to each other and to relational tables.

In FTS 2005, this meant that each CONTAINSTABLE would have to do its own index search, return complete results, and then cause the DB engine to combine these results with relational tables (all this was transparent to me, but it happened and was expensive for inquiries). 2008 iFTS has improved this situation because integration with the database allows several CONTAINSTABLE results to become part of the query plan, which has made many queries more efficient.

I think that both 2005 and 2008 FTS engines, as well as Lucene.NET, have architectural compromises that will better or worse correspond to many circumstances of the project. I was just lucky that the update worked in my favor. I can fully understand why iFTS in 2008 will not work in the same configuration as in 2005, for a high-level type of use of OLTP, for example StackOverflow.com. However, I would not reduce the likelihood that the 2008 iFTS might be isolated from heavy loading of the insert transaction ... but it also sounds like it could be as much as switching to Lucene.NET. .. and the cool Lucene.NET factor is hard to ignore;)

In any case, for me the ease and effectiveness of iFTS SQL 2008 in most situations probably eliminates the Lucene "cool" factor (although it is easy to use, I never used it in a production system, m comments on this). I would be interested to know how much more efficient Lucene (turned out to be implemented now?) In StackOverflow or similar situations.

+16
Feb 01 '09 at 1:32
source share

SQL Server FTS will be easier to manage for small deployments. Because FTS is integrated with the database, the RDBMS automatically updates the index. The console here is that you do not have an obvious scaling solution that does not match the replication database. Therefore, if you do not need to scale, SQL Server FTS is probably "safer." Politically, most stores will be more comfortable with a clean SQL Server solution.

On the Lucene side, I would endorse SOLR over the Lucene line. In any solution, you need to work more on updating the index when the data changes, as well as mapping the data to the SOLR / Lucene index. The upside is that you can easily scale by adding additional indexes. You can run these indexes on very meager linux servers, which eliminates some of the license overhead. If you take the Lucene / SOLR route, I would set the goal of putting ALL the data you need directly into the index, instead of putting the pointers back into the database in the index. You can include data in an index that is not searchable, so, for example, you could pre-create the HTML or XML stored in the index and serve it as a search result. With this approach, your database can be disabled, but you can still serve search results in a disabled mode.

I have never seen a benchmark comparison of performance between SQL Server 2008 and Lucene, but would like to see it.

+18
Jan 31 '09 at 19:48
source share

This may help: http://blog.stackoverflow.com/2008/11/sql-2008-full-text-search-problems/

I didnโ€™t use SQL Server 2008 personally, although based on this blog post, it seems that full-text search is slower than in 2005.

+5
Jan 31 '09 at 18:07
source share

We use full-text search capabilities, but, in my opinion, it depends on the data itself and your needs.

we scale using web servers, and therefore I like lucene because I don't have so much load on the sql server.

to start from scratch and in order to have a full-text search, I would prefer a sql server solution, because I think itโ€™s very fast to get the results if you want lucene you need to implement more at startup (and also get some know-how).

+4
Jan 31 '09 at 18:11
source share

One consideration you need to keep in mind is what kind of search restrictions you have in addition to the full-text restriction. If you comply with restrictions that lucene cannot provide, you will almost certainly want to use FTS. One of the nice things about 2008 was that they improved the integration of FTS with standard SQL servers, so performance should be better with a mixed database and FT restrictions than in 2005.

0
Feb 02 '09 at 23:40
source share



All Articles