Using Lucene to query an RDBMS database

I have looked through the documentation for the Java version of Lucene, but so far I can't find anything on how it works at the highest level (yes, I know I should RTFM; I just can't see the forest for the trees).

I understand that Lucene uses search indexes to return results. As far as I know, it only returns “hits” from these indices. If I did not add the data item when creating the index, it will not be returned.

That's fine, so now I want to check the following assumption:

Q: Does this mean that any data that I want to display on the search page needs to be added to the Lucene index?

I.e. if I want to search for a Product by things like SKU, description, category name, etc., but I also want to display the Customer it belongs to in the search results, do I:

  • Make sure the Lucene index contains the denormalized Customer name.
  • Use the Lucene hits to somehow query the database for the actual Product records, and use a JOIN to get the Customer name.

I assume it is option 1, since I assume there is no way to "join" Lucene query results to an RDBMS, but I wanted to check that my general usage assumptions are correct.

+4
3 answers

Based on BrokenGlass's answer, I thought about this some more and propose the following, to check whether I am on the right track:

In principle, taking the second option, you can do the following:

  • Put only the data you want to search on into the Lucene index, plus some key value (for example, the table's PK in your database).
  • Query Lucene for a list of hits.
  • Using the data access layer of your choice, build a query for your database that contains the predicate IN (value [, value]), with the keys from the hits as the values.
  • Get the results of this query from your database (it may well include JOINs to other tables).
  • Put these results into a dictionary, using the result set's PK as the key.
  • Iterate over the Lucene hits in order, pulling items out of the dictionary by PK, so that you build the list of results in the order in which Lucene returned the hits (i.e. sorted by relevance).
  • Display that "sorted" list of results to the user.
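
As a rough illustration of steps 3, 5 and 6, here is a minimal sketch in plain Java with no Lucene or JDBC dependency. The class and method names (HitOrdering, inPredicate, orderByHits) are hypothetical, and it assumes the hits have already been reduced to a list of numeric PKs and the database rows to column-name/value maps.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; none of these names come from Lucene or from the post.
class HitOrdering {

    // Step 3: build "IN (?, ?, ...)" with one JDBC-style placeholder per key,
    // so the keys can be bound as parameters instead of concatenated into SQL.
    static String inPredicate(int keyCount) {
        if (keyCount < 1) {
            throw new IllegalArgumentException("need at least one key");
        }
        return "IN (" + String.join(", ", Collections.nCopies(keyCount, "?")) + ")";
    }

    // Steps 5 and 6: index the database rows by PK, then walk the hit keys in
    // the order Lucene returned them and pull each row out of the map.
    // Assumption: every row holds a numeric value under pkColumn.
    static List<Map<String, Object>> orderByHits(List<Long> hitKeys,
                                                 List<Map<String, Object>> rows,
                                                 String pkColumn) {
        Map<Long, Map<String, Object>> byKey = new HashMap<>();
        for (Map<String, Object> row : rows) {
            byKey.put(((Number) row.get(pkColumn)).longValue(), row);
        }
        List<Map<String, Object>> ordered = new ArrayList<>();
        for (Long key : hitKeys) {
            Map<String, Object> row = byKey.get(key);
            if (row != null) { // skip hits whose row is missing from the database
                ordered.add(row);
            }
        }
        return ordered;
    }
}
```

Skipping hits with no matching row is a design choice here: the index and the database can drift apart between re-indexing runs, and dropping stale hits seemed safer than failing the whole search.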

Of course, steps 5 and 6 could probably be done better, but for the sake of explanation I have spelled the method out in detail. If Lucene's hits include some kind of "relevance" value, you could attach it to the result set and do a standard sort instead, but that is an exercise for the reader. :)
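
Lucene hits do carry a relevance score (ScoreDoc.score in the Java API), so the sort-by-relevance variant might look roughly like the sketch below. It assumes the scores have already been collected into a map from PK to score; ScoredRows, ScoredRow and sortByScore are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Lucene.
class ScoredRows {

    // A database row's key paired with the score of the hit that found it.
    static final class ScoredRow {
        final long key;
        final float score;

        ScoredRow(long key, float score) {
            this.key = key;
            this.score = score;
        }
    }

    // rowKeys: PKs in whatever order the database returned them.
    // scoreByKey: PK -> relevance score taken from the Lucene hits.
    static List<ScoredRow> sortByScore(List<Long> rowKeys, Map<Long, Float> scoreByKey) {
        List<ScoredRow> result = new ArrayList<>();
        for (Long key : rowKeys) {
            Float score = scoreByKey.get(key);
            if (score != null) { // ignore rows that were not among the hits
                result.add(new ScoredRow(key, score));
            }
        }
        // Highest score first; List.sort is stable, so ties keep database order.
        result.sort(Comparator.comparingDouble((ScoredRow r) -> r.score).reversed());
        return result;
    }
}
```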

Does that sound right?

0

Typically, the index contains only the fields you want to search on, not the ones you want to display. The index should be kept as lean as possible so that searching stays efficient.

To be able to display more data, add a field to your index that lets you retrieve the full document / data, i.e. a unique key for your Product (the product ID?).

+1

I was trying to solve the same problem, but I think that approach does too much work. I am thinking of the following as an alternative. Please correct me if I am wrong!

Your situation in the RDBMS is as follows: Product (many) <------> (many) Customer

Instead of using the Lucene index only to get the product keys and then querying the RDBMS with an IN query, I would suggest building the Lucene index from the Cartesian product of Product and Customer, i.e. one entry per customer/product pair.

Like:

  customer_1, product_1
  customer_1, product_2
  customer_2, product_2
  ...

Thus, when you search for a product in Lucene, it gives you both the customer and the product identifiers, and instead of joining them in the RDBMS you can simply look up those customers and products there for more information, if you need it. If you use caching, the extra cost of these lookups is also reduced.
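
To make the one-entry-per-pair layout concrete, here is a small sketch in plain Java that flattens a customer-to-products mapping into such entries. The names PairEntries and toEntries and the field names "customer" and "product" are assumptions for illustration; in a real setup each entry would become one Lucene document, together with whatever searchable product fields you need.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Lucene.
class PairEntries {

    // Produces one entry per (customer, product) pair, in input order.
    static List<Map<String, String>> toEntries(Map<String, List<String>> productsByCustomer) {
        List<Map<String, String>> entries = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : productsByCustomer.entrySet()) {
            for (String product : e.getValue()) {
                Map<String, String> entry = new LinkedHashMap<>();
                entry.put("customer", e.getKey()); // identifier to look up in the RDBMS later
                entry.put("product", product);     // identifier to look up in the RDBMS later
                entries.add(entry);
            }
        }
        return entries;
    }
}
```

Note that a product shared by several customers appears once per customer, so the index grows with the number of pairs rather than the number of products.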

+1

Source: https://habr.com/ru/post/1335858/

