How to find similar documents in Lucene?

I want to score similar documents in Lucene. Let me explain my scenario to you.

For example, let's say I have the following entries in my file on which I created the index.

  ID | First Name | Last Name | DOB
 1 | John | Doe | 03/18/1990
 1 | John | Twain | 03/18/1990
 3 | Joey | Johnson | 05/14/1978
 3 | Joey | Johnson | 05/14/1987
 4 | Joey | Johnson | 05/14/1987 

When I search for "John Doe", the search returns the entries in the following order:

  ID | First Name | Last Name | DOB
 1 | John | Doe | 03/18/1990
 3 | Joey | Johnson | 05/14/1978
 3 | Joey | Johnson | 05/14/1987
 4 | Joey | Johnson | 05/14/1987
 1 | John | Twain | 03/18/1990 
 2 | Daniel | Doe | 03/25/1989

As you can see, Lucene orders the entries by how well each one matches my search terms, not by how similar the entries are to one another. I want it to find the records that match the given terms, but display them grouped by their similarity to each other.

What I want

  ID | First Name | Last Name | DOB
 1 | John | Doe | 03/18/1990
 1 | John | Twain | 03/18/1990 
 3 | Joey | Johnson | 05/14/1978
 3 | Joey | Johnson | 05/14/1987
 4 | Joey | Johnson | 05/14/1987
 2 | Daniel | Doe | 03/25/1989

Here, the John Twain and John Doe entries are displayed together because they are similar to each other, and one of them is the best match for the user's query.

Does that make sense?

Search code:

    String sa = textbox1.Text; // Assume this value to be John Doe in this case.
    String[] searchfield = new string[] { "ID", "First Name", "Last Name", "DOB" };
    IndexReader reader = IndexReader.Open(dir, true);
    TopScoreDocCollector coll = TopScoreDocCollector.Create(50, true);
    indexSearcher.Search(QueryMaker(sa, searchfield), coll);
    ScoreDoc[] hits = coll.TopDocs().ScoreDocs;
    for (int i = 0; i < hits.Length; i++)
    {
        SearchResults result = new SearchResults();
        int docID = hits[i].Doc;
        Document d = indexSearcher.Doc(docID);
        result.fname = d.Get("First Name").ToString();
    }

What I tried:

I tried to use the MoreLikeThis class, but I am not sure whether I am using it correctly, or even whether it is the right approach. Also, how do I use the Like method with two or more doc IDs? And when I pass a doc ID, the results include that same document again as a duplicate, because I am searching the same reader.

Code:

    IndexSearcher mltsearcher = new IndexSearcher(reader);
    MoreLikeThis mlt = new MoreLikeThis(reader);
    int docid = hits[1].Doc;
    Query query = mlt.Like(docid);
    TopDocs similardocs = mltsearcher.Search(query, 10);

Please let me know if you have any questions.

I have only been learning Lucene for the last two weeks, so I don't know much yet.

Note: I am using Lucene.Net 3.0.3.

1 answer

Can you show the code for the QueryMaker() method?

I think you could create a new "name" field that combines the first name and last name, and use FuzzyQuery to search that field. FuzzyQuery scores documents by the Levenshtein (edit) distance between the query term and the indexed terms.
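A rough sketch of that idea against Lucene.Net 3.0.3 might look like the following. The "name" field, the AddPerson helper, the in-memory RAMDirectory, and the 0.5f minimum similarity are all assumptions made for illustration, not something taken from your code. Note that FuzzyQuery terms are not analyzed, so the query words are lowercased by hand to match what StandardAnalyzer puts in the index.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

class FuzzyNameSearch
{
    // Hypothetical helper: stores the ID and a combined "name" field.
    static void AddPerson(IndexWriter writer, string id, string first, string last)
    {
        var doc = new Document();
        doc.Add(new Field("ID", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Combined field (an assumption): first and last name analyzed together.
        doc.Add(new Field("name", first + " " + last, Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
    }

    static void Main()
    {
        var dir = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            AddPerson(writer, "1", "John", "Doe");
            AddPerson(writer, "1", "John", "Twain");
            AddPerson(writer, "3", "Joey", "Johnson");
            AddPerson(writer, "2", "Daniel", "Doe");
        }

        // One optional fuzzy clause per query word; documents matching
        // more words (or matching them more closely) score higher.
        var query = new BooleanQuery();
        foreach (string word in "John Doe".ToLowerInvariant().Split(' '))
        {
            query.Add(new FuzzyQuery(new Term("name", word), 0.5f), Occur.SHOULD);
        }

        using (var searcher = new IndexSearcher(dir, true))
        {
            TopDocs top = searcher.Search(query, 10);
            foreach (ScoreDoc hit in top.ScoreDocs)
            {
                Document d = searcher.Doc(hit.Doc);
                Console.WriteLine("{0} | {1} | {2:F3}", d.Get("ID"), d.Get("name"), hit.Score);
            }
        }
    }
}
```

Raising the minimum similarity makes the match stricter; lowering it lets in more distant names. This only affects how each document is scored against the query, so if you need similar records shown next to each other you would probably still have to group the hits yourself after the search.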


Source: https://habr.com/ru/post/1485861/

