As I understand the Solr scoring function, the following two queries should be equivalent.
Namely, score(q1, d) = score(q2, d) for each document d in the corpus.
Query 1: evolution OR selection OR germline OR dna OR rna OR mitochondria
Query 2: (evolution OR selection OR germline) OR (dna OR rna OR mitochondria)
The queries are obviously logically equivalent (they both return the same set of documents). In addition, both queries consist of the same 6 terms, and each term has the same boost in both queries. Therefore, each term should make the same contribution to the total score (same TF, same IDF, same boost).
Despite this, the two queries do not produce the same scores.
In general, a disjunction of terms (a OR b OR c OR d) does not score the same as a disjunction of queries ((a OR b) OR (c OR d)). What is the semantic difference between the two kinds of query? What causes them to score differently?
The reason I ask is that I am writing a custom request handler in which I build the second kind of query (a disjunction of queries), whereas I may need to build the first kind (a disjunction of terms). In other words, this is what I am doing:
    Query q1 = ... // disjunction of the terms evolution, selection, germline
    Query q2 = ... // disjunction of the terms dna, rna, mitochondria
    // Declared as BooleanQuery so that add() is available
    BooleanQuery disjunctionOfQueries = new BooleanQuery();
    disjunctionOfQueries.add(q1, BooleanClause.Occur.SHOULD);
    disjunctionOfQueries.add(q2, BooleanClause.Occur.SHOULD);
Although maybe I should do this instead:

    List<String> terms = ... // extract all 6 terms from q1 and q2
    List<TermQuery> termQueries = ... // create a new TermQuery from each term in terms
    // Declared as BooleanQuery so that add() is available
    BooleanQuery disjunctionOfTerms = new BooleanQuery();
    for (TermQuery t : termQueries) {
        disjunctionOfTerms.add(t, BooleanClause.Occur.SHOULD);
    }
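
One candidate for the difference in scores, which I have not confirmed against my setup, is the coordination factor (coord) that Lucene's classic TF-IDF similarity applies to every BooleanQuery: the summed clause scores are multiplied by (matching clauses / total clauses), and in a nested query this happens once per level. The sketch below is a toy model, not Lucene code. It assumes every matching term contributes exactly 1.0 and ignores queryNorm, and the names `CoordSketch`, `orOfTerms` and `orOfQueries` are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CoordSketch {

    // Toy score of a flat OR over `terms`: each matching term contributes 1.0
    // (an assumption standing in for tf * idf * boost), and the sum is then
    // scaled by coord = matching clauses / total clauses.
    public static double orOfTerms(List<String> terms, Set<String> docTerms) {
        int matched = 0;
        for (String t : terms) {
            if (docTerms.contains(t)) {
                matched++;
            }
        }
        return matched * ((double) matched / terms.size());
    }

    // Toy score of an OR over sub-queries: each group is scored as its own
    // flat OR (with its own coord), and coord is applied again at the top
    // level, counting matching sub-queries instead of matching terms.
    public static double orOfQueries(List<List<String>> groups, Set<String> docTerms) {
        double sum = 0.0;
        int matchedGroups = 0;
        for (List<String> group : groups) {
            double s = orOfTerms(group, docTerms);
            if (s > 0) {
                sum += s;
                matchedGroups++;
            }
        }
        return sum * ((double) matchedGroups / groups.size());
    }

    public static void main(String[] args) {
        List<String> g1 = Arrays.asList("evolution", "selection", "germline");
        List<String> g2 = Arrays.asList("dna", "rna", "mitochondria");
        List<String> all = Arrays.asList(
                "evolution", "selection", "germline", "dna", "rna", "mitochondria");

        // Hypothetical document containing two terms of the first group
        // and one term of the second.
        Set<String> doc = new HashSet<>(Arrays.asList("evolution", "selection", "dna"));

        // Flat: 3 * (3/6). Nested: (2 * 2/3 + 1 * 1/3) * (2/2).
        System.out.println("disjunction of terms:   " + orOfTerms(all, doc));
        System.out.println("disjunction of queries: " + orOfQueries(Arrays.asList(g1, g2), doc));
    }
}
```

In this model the flat query scores 1.5 and the nested one roughly 1.67 for the same document. The two structures agree only when the matches are spread evenly over the groups or fall entirely in one group, so if coord is in play in my Solr version, flattening the terms into a single BooleanQuery would be the way to get the first kind of score.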