Tips for implementing XQuery full-text search using Lucene

I used Lucene in a previous project, so I'm somewhat familiar with the API. However, I never had to do anything “fancy” (where “fancy” means things like filters, different analyzers, boosting, payloads, etc.).

I am about to start implementing XQuery Full Text search:

http://www.w3.org/TR/xpath-full-text-10/

Its query capabilities are the most complex I have seen. From my experience with Lucene, I know it can be used to implement some of the features; however, I would like to go through all of them. For each feature, I only need a short answer such as "Feature X is best implemented using a query filter", so that I start in the right direction.

Note: I will implement my own query parser and build the queries manually from instances of the various Lucene query classes.

3.3 Cardinality Selection

This allows you to say things like:

title ftcontains "usability" occurs at least 2 times

which means that the title field must contain "usability" at least twice. How can I do that?
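One way to read this requirement is as a term-frequency threshold: count the occurrences of the token in the field and compare the count with the bound. The sketch below is plain Python rather than Lucene code; `tokenize` and `occurs_at_least` are hypothetical helper names, and the tokenizer (lowercased runs of letters and digits) is a deliberately crude assumption.

```python
import re

def tokenize(text):
    """Crude tokenizer (an assumption): lowercased runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def occurs_at_least(text, term, minimum):
    """True if `term` occurs at least `minimum` times in `text`."""
    return tokenize(text).count(term.lower()) >= minimum

title = "Usability testing: a usability primer"
print(occurs_at_least(title, "usability", 2))
print(occurs_at_least(title, "usability", 3))
```

In a Lucene index the same number is the per-document term frequency, which the postings expose (`PostingsEnum.freq()`); how best to wire that into a custom query is left open here.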

3.4.4 Stemming Option

This lets a query match on word stems rather than exact forms, for example:

title ftcontains "improve" with stemming

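This should also match, for example, "improving". Lucene has PorterStemFilter, but it is part of the analysis chain, so stemming is normally fixed when the field is indexed rather than chosen per query.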

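How can stemming be switched on per query? Would the text have to be indexed twice (stemmed and unstemmed)? Is there a better way?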

3.4.5 Case Option

There are four case options: "case insensitive", "case sensitive", "lowercase" and "uppercase".

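As far as I can tell, the case handling of a field is decided by the analyzer at index time.

How can case-sensitive and case-insensitive matching both be supported? The only approach I can think of is to index several variants of each field (original, lowercased, uppercased). Is there a better way?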

3.4.6 Diacritics Option

The same question applies to diacritics as to case. How can this be handled?

3.4.7 Stop Word Option

This option lets the query supply its own list of stop words, for example:

abstract ftcontains "propagating of errors"
with stop words ("a", "the", "of")

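This also matches, for example, "propagating few errors". In other words, each stop word in the query acts as a placeholder for any one word, i.e.: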

"propagating of errors" -> "propagating * errors"

where * stands for any single word. Is this possible in Lucene?
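The rewrite "propagating of errors" -> "propagating * errors" can be sketched as a phrase match in which some query positions accept any token. This is plain Python, not Lucene code; `tokenize` and `phrase_matches` are hypothetical names, and the tokenizer is a crude assumption.

```python
import re

def tokenize(text):
    """Crude tokenizer (an assumption): lowercased runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_matches(text, phrase, stop_words):
    """Phrase match where query tokens that are stop words match any one token."""
    doc = tokenize(text)
    # None marks a wildcard position in the query phrase.
    query = [None if t in stop_words else t for t in tokenize(phrase)]
    for start in range(len(doc) - len(query) + 1):
        window = doc[start:start + len(query)]
        if all(q is None or q == d for q, d in zip(query, window)):
            return True
    return False

stops = {"a", "the", "of"}
print(phrase_matches("propagating few errors", "propagating of errors", stops))
print(phrase_matches("propagating errors", "propagating of errors", stops))
```

Lucene's `PhraseQuery.Builder` accepts an explicit position for each term, so a gap can be left where the stop word was; whether that is enough depends on how the field itself was analyzed.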

3.5.3 Mild-Not Selection

XQuery Full Text has two kinds of "not": the unary not (ftnot) and the "mild not" (not in). For example:

body ftcontains "Mexico" not in "New Mexico"

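This matches "Mexico" except where the occurrence is part of "New Mexico"; a document that contains both "Mexico" on its own and "New Mexico" still matches. How can this be implemented?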
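The semantics of `"Mexico" not in "New Mexico"` can be sketched on token positions: find every span of the included phrase and drop those that lie inside a span of the excluded phrase. This is plain Python, not Lucene code; `phrase_spans` and `mild_not` are hypothetical names and the tokenizer is a crude assumption.

```python
import re

def tokenize(text):
    """Crude tokenizer (an assumption): lowercased runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_spans(doc, phrase):
    """(start, end) token spans, end exclusive, where `phrase` occurs in `doc`."""
    q = tokenize(phrase)
    return [(i, i + len(q)) for i in range(len(doc) - len(q) + 1)
            if doc[i:i + len(q)] == q]

def mild_not(text, include, exclude):
    """Spans of `include` that do not lie inside any span of `exclude`."""
    doc = tokenize(text)
    excluded = phrase_spans(doc, exclude)
    return [(s, e) for (s, e) in phrase_spans(doc, include)
            if not any(xs <= s and e <= xe for (xs, xe) in excluded)]

print(mild_not("I moved from New Mexico to Mexico City", "Mexico", "New Mexico"))
print(mild_not("I moved to New Mexico", "Mexico", "New Mexico"))
```

Lucene's `SpanNotQuery` is built for this kind of exclusion (it removes matches of one span query that overlap matches of another), so it looks like a natural candidate to evaluate.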

3.6.1 Ordered Selection

This requires the matches to appear in the order given in the query, for example:

title ftcontains ("web site" ftand "usability") ordered

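Here "web site" must occur before "usability"; text in which "usability" comes first does not match. Lucene has SpanQuery classes that can enforce order; is that the right tool here? How should it be used?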
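A simplified reading of `ordered` compares start positions: some match of the first operand must start before some match of the second. The sketch below is plain Python, not Lucene code; `phrase_spans` and `ordered_match` are hypothetical names, the tokenizer is a crude assumption, and only two operands are handled.

```python
import re

def tokenize(text):
    """Crude tokenizer (an assumption): lowercased runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_spans(doc, phrase):
    """(start, end) token spans, end exclusive, where `phrase` occurs in `doc`."""
    q = tokenize(phrase)
    return [(i, i + len(q)) for i in range(len(doc) - len(q) + 1)
            if doc[i:i + len(q)] == q]

def ordered_match(text, first, second):
    """True if some match of `first` starts before some match of `second`."""
    doc = tokenize(text)
    return any(s1 < s2
               for (s1, _) in phrase_spans(doc, first)
               for (s2, _) in phrase_spans(doc, second))

print(ordered_match("The web site has good usability", "web site", "usability"))
print(ordered_match("Usability of a web site", "web site", "usability"))
```

In Lucene, `SpanNearQuery` takes an `inOrder` flag together with a slop value, which covers the ordering part; choosing a slop large enough to span the whole field is an assumption that would need checking.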

3.6.4 Scope Selection

This constrains the matches to the same (or a different) sentence or paragraph, for example:

abstract ftcontains "usability" ftand "web site" same sentence

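The general form is {same | different} {sentence | paragraph}, so the index needs to know about sentence and paragraph boundaries. How can this be done?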
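The `same sentence` case can be sketched by splitting the text into sentences and requiring all operands to match within one of them. This is plain Python, not Lucene code; the function names are hypothetical, and both the sentence splitter (break after ., ! or ?) and the tokenizer are crude assumptions.

```python
import re

def sentences(text):
    """Crude sentence splitter (an assumption): break after ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(text):
    """Crude tokenizer (an assumption): lowercased runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase):
    """True if the tokens of `phrase` occur contiguously in `tokens`."""
    q = tokenize(phrase)
    return any(tokens[i:i + len(q)] == q
               for i in range(len(tokens) - len(q) + 1))

def same_sentence(text, phrases):
    """True if a single sentence of `text` contains every phrase."""
    return any(all(contains_phrase(tokenize(s), p) for p in phrases)
               for s in sentences(text))

print(same_sentence("The web site improves usability. Done.", ["usability", "web site"]))
print(same_sentence("Usability matters. Our web site is new.", ["usability", "web site"]))
```

In an index, the equivalent would be to record a sentence (or paragraph) number for each token at analysis time; how best to store that in Lucene is exactly the open question here.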

3.7 Ignore Option

Given the following XQuery:

let $x := <book>
  <title>Web Usability and Practice</title>
  <author>Montana <annotation> this author is
      an expert in Web Usability</annotation> Marigold
  </author>
  <editor>Vera Tudor-Medina on Web <annotation> best
      editor on Web Usability</annotation> Usability
  </editor>
</book>

and the query:

book ftcontains "Web Usability" without content $x//annotation

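The content of the annotation elements is ignored while matching. "Web Usability" is therefore found in title, and also in editor once the annotation is skipped, but not in author, where the phrase occurs only inside the ignored annotation. How can this be done?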
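The effect of `without content $x//annotation` on the book above can be sketched by collecting each element's text while skipping the ignored subtrees, then running an ordinary phrase match on what is left. This is plain Python with the standard `xml.etree.ElementTree` module, not Lucene code; `text_without` and `contains_phrase` are hypothetical names, and the tokenizer is a crude assumption.

```python
import re
import xml.etree.ElementTree as ET

BOOK = """<book>
  <title>Web Usability and Practice</title>
  <author>Montana <annotation> this author is
      an expert in Web Usability</annotation> Marigold
  </author>
  <editor>Vera Tudor-Medina on Web <annotation> best
      editor on Web Usability</annotation> Usability
  </editor>
</book>"""

def text_without(elem, ignored_tag):
    """Text of `elem`, skipping every subtree whose tag is `ignored_tag`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != ignored_tag:
            parts.append(text_without(child, ignored_tag))
        parts.append(child.tail or "")  # tail text belongs to the parent
    return " ".join(parts)

def tokenize(text):
    """Crude tokenizer (an assumption): lowercased runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True if the tokens of `phrase` occur contiguously in `text`."""
    doc, q = tokenize(text), tokenize(phrase)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

book = ET.fromstring(BOOK)
for child in book:
    print(child.tag, contains_phrase(text_without(child, "annotation"), "Web Usability"))
```

For an index this suggests that token positions have to be computed as if the ignored nodes were absent, which is hard to do when the set of ignored nodes is only known at query time.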


I know this is a lot, so any pointers are welcome. Thanks!


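Take a look at Lux on GitHub: https://github.com/msokolov/lux. It combines the Saxon XQuery processor with Lucene/Solr to provide indexed XQuery. It does not implement xqft, but it shows one way to connect Lucene indexes to XQuery evaluation. Lux indexes paths (element and attribute names) and node text (the text of each node), all stored in Lucene.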

On your specific points: for 3.3, SpanNearQuery may be worth a look.

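For 3.4, 3.5, 3.6 and 3.7: the match options (stemming, case, diacritics, etc.) are applied by the analyzer, so each variant generally needs its own field with its own analysis chain. The positional and structural features are harder, because Lucene works on flat, field-level token streams and has no built-in notion of sentences or nested nodes.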

That's just my 2 cents. Good luck!


Source: https://habr.com/ru/post/1728496/

