I want to rank similar documents together in Lucene. Let me explain my scenario.
For example, suppose the file I built the index from contains the following entries:
ID | First Name | Last Name | DOB
1 | John | Doe | 03/18/1990
1 | John | Twain | 03/18/1990
2 | Daniel | Doe | 03/25/1989
3 | Joey | Johnson | 05/14/1978
3 | Joey | Johnson | 05/14/1987
4 | Joey | Johnson | 05/14/1987
When I search for "John Doe", the results come back in the following order:
ID | First Name | Last Name | DOB
1 | John | Doe | 03/18/1990
3 | Joey | Johnson | 05/14/1978
3 | Joey | Johnson | 05/14/1987
4 | Joey | Johnson | 05/14/1987
1 | John | Twain | 03/18/1990
2 | Daniel | Doe | 03/25/1989
As you can see, Lucene orders the entries by how well they match the query, not by how similar the entries are to each other. I want it to find the records that match the query, but display them grouped by their similarity to one another.
What I want:
ID | First Name | Last Name | DOB
1 | John | Doe | 03/18/1990
1 | John | Twain | 03/18/1990
3 | Joey | Johnson | 05/14/1978
3 | Joey | Johnson | 05/14/1987
4 | Joey | Johnson | 05/14/1987
2 | Daniel | Doe | 03/25/1989
Here, the John Twain and John Doe entries are displayed together because they are similar to each other, and one of them is the best match for the user's query.
Does that make sense?
Search code:
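To make the desired ordering concrete, here is a minimal sketch in plain C# with no Lucene calls. It assumes a similarity rule that is only an illustration, not something Lucene provides: two records count as similar when at least two of their four fields are identical. The Person type, SharedFields, and Regroup are hypothetical names made up for this example. The input list is in the order Lucene currently returns, and each hit is followed by the later hits that are similar to it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type standing in for the stored fields of one hit.
public sealed class Person
{
    public string Id, FirstName, LastName, Dob;

    public Person(string id, string firstName, string lastName, string dob)
    {
        Id = id; FirstName = firstName; LastName = lastName; Dob = dob;
    }

    public override string ToString()
    {
        return string.Join(" | ", Id, FirstName, LastName, Dob);
    }
}

public static class SimilarityGrouping
{
    // Assumed similarity measure: the number of fields with identical values.
    public static int SharedFields(Person a, Person b)
    {
        int n = 0;
        if (a.Id == b.Id) n++;
        if (a.FirstName == b.FirstName) n++;
        if (a.LastName == b.LastName) n++;
        if (a.Dob == b.Dob) n++;
        return n;
    }

    // Walks the hits in score order; each hit not yet emitted is followed
    // by every later, not yet emitted hit sharing at least minShared fields.
    public static List<Person> Regroup(IList<Person> hits, int minShared)
    {
        var result = new List<Person>();
        var used = new bool[hits.Count];
        for (int i = 0; i < hits.Count; i++)
        {
            if (used[i]) continue;
            used[i] = true;
            result.Add(hits[i]);
            for (int j = i + 1; j < hits.Count; j++)
            {
                if (!used[j] && SharedFields(hits[i], hits[j]) >= minShared)
                {
                    used[j] = true;
                    result.Add(hits[j]);
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        // The order Lucene currently returns for "John Doe".
        var hits = new List<Person>
        {
            new Person("1", "John", "Doe", "03/18/1990"),
            new Person("3", "Joey", "Johnson", "05/14/1978"),
            new Person("3", "Joey", "Johnson", "05/14/1987"),
            new Person("4", "Joey", "Johnson", "05/14/1987"),
            new Person("1", "John", "Twain", "03/18/1990"),
            new Person("2", "Daniel", "Doe", "03/25/1989"),
        };
        foreach (Person p in Regroup(hits, 2))
        {
            Console.WriteLine(p);
        }
    }
}
```

With a threshold of 2, John Twain moves up directly behind John Doe, while Daniel Doe (who shares only the last name) stays at the end, which is the order I am after. What I do not know is how to get this behaviour from Lucene itself.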
    string sa = textbox1.Text; // Assume this value is "John Doe" in this case.
    string[] searchfield = new string[] { "ID", "First Name", "Last Name", "DOB" };
    IndexReader reader = IndexReader.Open(dir, true); // open the index read-only
    TopScoreDocCollector coll = TopScoreDocCollector.Create(50, true); // collect the top 50 hits
    indexSearcher.Search(QueryMaker(sa, searchfield), coll);
    ScoreDoc[] hits = coll.TopDocs().ScoreDocs;
    for (int i = 0; i < hits.Length; i++)
    {
        SearchResults result = new SearchResults();
        int docID = hits[i].Doc;
        Document d = indexSearcher.Doc(docID);
        result.fname = d.Get("First Name"); // Get already returns a string
    }
What I tried:
I tried to use the MoreLikeThis class, but I am not sure whether I am using it correctly, or even whether it is the right approach. Also, how do I use the Like method for two or more doc IDs? And if I pass a doc ID, the results will include a duplicate of that document, because I am searching the same reader.
Code:
    IndexSearcher mltsearcher = new IndexSearcher(reader);
    MoreLikeThis mlt = new MoreLikeThis(reader);
    int docid = hits[1].Doc;
    Query query = mlt.Like(docid);
    TopDocs similardocs = mltsearcher.Search(query, 10);
Please let me know if you have any questions.
I've only been learning Lucene for the last two weeks, so I don't know much.
Note: I am using Lucene.Net 3.0.3.