I used Lucene in a previous project, so I'm somewhat familiar with the API. However, I never had to do anything “fancy” (where “fancy” means things like using filters, different analyzers, boosting, payloads, etc.).
I am about to start implementing the full-text search feature of XQuery:
http://www.w3.org/TR/xpath-full-text-10/
Its query capabilities are the most complex I have seen. From my experience with Lucene, I know it can be used to implement some of the features; however, I would like to go through all of them. For each feature, I just need a simple answer such as "Feature X is best implemented using a query filter", so that I start in the right direction for each one.
Note: I will implement my own query parser and build the queries manually using instances of the various Lucene classes.
3.3 Cardinality Selection
This allows you to say things like:
title ftcontains "usability" occurs at least 2 times
which means that the title field must contain "usability" at least twice. How can I do that?
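To make the required semantics concrete, here is a minimal plain-Java sketch of the cardinality test, independent of Lucene. The class name, the method names and the naive tokenizer (lowercase, split on anything that is not a letter or digit) are assumptions made for illustration only; a Lucene-based version would presumably rely on per-document term frequencies rather than re-tokenizing the text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class CardinalitySketch {
    // Hypothetical tokenizer: lowercases, then splits on non-alphanumeric runs.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Number of tokens in the field text that equal the given term.
    static long occurrences(String fieldText, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return tokenize(fieldText).stream().filter(needle::equals).count();
    }

    // Semantics of: field ftcontains "term" occurs at least n times
    static boolean occursAtLeast(String fieldText, String term, int n) {
        return occurrences(fieldText, term) >= n;
    }

    public static void main(String[] args) {
        System.out.println(occursAtLeast("Usability and more usability", "usability", 2)); // true
        System.out.println(occursAtLeast("Web Usability and Practice", "usability", 2));   // false
    }
}
```

The other cardinality forms in the specification ("at most", "exactly", "from ... to") would only change the comparison in occursAtLeast.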
3.4.4 Stemming Option
This lets you turn stemming on or off per query, for example:
title ftcontains "improve" with stemming
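which should also match other forms of the word, such as "improving". I know about PorterStemFilter, but it is applied as part of the analysis chain, and here stemming has to be switchable for each query.
How should this be handled? Do I need to index the text twice (once stemmed, once unstemmed)? Is there a better way?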
3.4.5 Case Option
This option takes one of four values: "case insensitive", "case sensitive", "lowercase", "uppercase".
How can case handling be selected per query in Lucene?
3.4.6 Diacritics Option
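Like the case option, but for diacritics: matching is either "diacritics insensitive" or "diacritics sensitive". How can I do that?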
3.4.7 Stop Word Option
This lets you specify the stop words as part of the query, for example:
abstract ftcontains "propagating of errors"
with stop words ("a", "the", "of")
Here the stop word "of" may be matched by any single word, so the phrase is effectively rewritten, i.e.:
"propagating of errors" -> "propagating * errors"
where * matches any single word. How can I do this in Lucene?
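The rewrite of "propagating of errors" into "propagating * errors" can be sketched in plain Java as follows. This is not Lucene code: the class, the method names and the whitespace tokenizer are assumptions for illustration, and a null slot in the pattern stands for the * placeholder.

```java
import java.util.Locale;
import java.util.Set;

class StopWordPhraseSketch {
    // Hypothetical tokenizer: lowercases and splits on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Turns the query phrase into a pattern; null means "any single token".
    static String[] toPattern(String phrase, Set<String> stopWords) {
        String[] pattern = tokenize(phrase);
        for (int i = 0; i < pattern.length; i++) {
            if (stopWords.contains(pattern[i])) {
                pattern[i] = null;
            }
        }
        return pattern;
    }

    // True if the pattern matches a run of consecutive tokens in the text.
    static boolean matches(String text, String[] pattern) {
        String[] tokens = tokenize(text);
        for (int start = 0; start + pattern.length <= tokens.length; start++) {
            boolean ok = true;
            for (int i = 0; i < pattern.length && ok; i++) {
                ok = pattern[i] == null || pattern[i].equals(tokens[start + i]);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] p = toPattern("propagating of errors", Set.of("a", "the", "of"));
        System.out.println(matches("propagating few errors", p)); // true
        System.out.println(matches("propagating errors", p));     // false
    }
}
```

The important point is that the placeholder still occupies exactly one position: "propagating errors" does not match, because nothing stands where the stop word was.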
3.5.3 Mild-Not Selection
XQuery Full Text has two kinds of "not": a unary not (ftnot), and a "mild not". With the latter, you can say:
body ftcontains "Mexico" not in "New Mexico"
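which matches occurrences of "Mexico", except those that are part of "New Mexico". A document that contains both "Mexico" on its own and "New Mexico" still matches. How can I do that?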
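The mild-not rule for the "Mexico" example can be written down as a small position-based check. The plain-Java sketch below is only meant to state the semantics; the names and the whitespace tokenizer are assumptions, and it does not use Lucene. Lucene's SpanNotQuery looks like a candidate for the same include/exclude logic, but I have not verified that it covers every case.

```java
import java.util.Arrays;
import java.util.Locale;

class MildNotSketch {
    // Hypothetical tokenizer: lowercases and splits on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // True if `term` occurs at least once outside every occurrence of `phrase`.
    static boolean mildNot(String text, String term, String phrase) {
        String[] tokens = tokenize(text);
        String[] excluded = tokenize(phrase);
        String wanted = term.toLowerCase(Locale.ROOT);

        // Mark every position covered by an occurrence of the excluded phrase.
        boolean[] covered = new boolean[tokens.length];
        for (int s = 0; s + excluded.length <= tokens.length; s++) {
            String[] window = Arrays.copyOfRange(tokens, s, s + excluded.length);
            if (Arrays.equals(window, excluded)) {
                Arrays.fill(covered, s, s + excluded.length, true);
            }
        }

        // Accept the first occurrence of the term at an uncovered position.
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(wanted) && !covered[i]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(mildNot("from New Mexico to Mexico", "Mexico", "New Mexico")); // true
        System.out.println(mildNot("I live in New Mexico", "Mexico", "New Mexico"));      // false
    }
}
```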
3.6.1 Ordered Selection
This requires the matches to occur in the same order as in the query, for example:
title ftcontains ("web site" ftand "usability") ordered
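Here "web site" has to occur before "usability" for the title to match; a title with "usability" first and "web site" later does not match. Is Lucene's SpanQuery the right tool for this? If so, how?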
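As a reference point for the "ordered" example, this is a minimal plain-Java sketch of the check, assuming a whitespace tokenizer and hypothetical names. It only states the expected result and is not a Lucene implementation. It treats "ordered" as "the second operand starts after the first operand starts", which is my reading of the specification.

```java
import java.util.Locale;

class OrderedSketch {
    // Hypothetical tokenizer: lowercases and splits on whitespace.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    // Index of the first occurrence of `phrase` starting at or after `from`, or -1.
    static int indexOfPhrase(String[] tokens, String[] phrase, int from) {
        for (int s = from; s + phrase.length <= tokens.length; s++) {
            boolean ok = true;
            for (int i = 0; i < phrase.length && ok; i++) {
                ok = tokens[s + i].equals(phrase[i]);
            }
            if (ok) {
                return s;
            }
        }
        return -1;
    }

    // True if `first` occurs and `second` starts somewhere after `first` starts.
    // Using the earliest occurrence of `first` is enough for an existence check.
    static boolean ordered(String text, String first, String second) {
        String[] tokens = tokenize(text);
        int p = indexOfPhrase(tokens, tokenize(first), 0);
        if (p < 0) {
            return false;
        }
        return indexOfPhrase(tokens, tokenize(second), p + 1) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(ordered("a web site with good usability", "web site", "usability")); // true
        System.out.println(ordered("usability of a web site", "web site", "usability"));        // false
    }
}
```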
3.6.4 Scope Selection
This lets you restrict matches to a given scope, for example:
abstract ftcontains "usability" ftand "web site" same sentence
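The general form is {same | different} {sentence | paragraph}. This requires knowing where sentences and paragraphs start and end. How can I do that?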
3.7 Ignore Option
Given the following XQuery:
let $x := <book>
<title>Web Usability and Practice</title>
<author>Montana <annotation> this author is
an expert in Web Usability</annotation> Marigold
</author>
<editor>Vera Tudor-Medina on Web <annotation> best
editor on Web Usability</annotation> Usability
</editor>
</book>
you can say:
book ftcontains "Web Usability" without content $x//annotation
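This ignores the content of the annotation elements when matching. "Web Usability" is then found in the title, and also in the editor, where the ignored annotation sits in the middle of the phrase. How can I do that?
Any help, even for only some of the features, is appreciated. Thanks!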