Regular expressions in MarkLogic XQuery

I am trying to use fn:matches with a regex in XQuery, but MarkLogic's XQuery implementation does not seem to allow hexadecimal character escapes. The following query fails with the error "Invalid regular expression":

    (: Find text containing non-ISO-Latin characters :)
    let $regex := '[^\x00-\xFF]'
    let $results := fn:collection('mydocs')//myns:myelem[fn:matches(., $regex)]
    let $count := fn:count($results)
    return
      <figures count="{$count}">
        { $results }
      </figures>

However, this version does not give an error:

    let $regex := '[^a-zA-Z0-9]'
    let $results := fn:collection('mydocs')//myns:myelem[fn:matches(., $regex)]
    let $count := fn:count($results)
    return
      <figures count="{$count}">
        { $results }
      </figures>

Is there a way to use hexadecimal notation, or an alternative that gives the same result, in MarkLogic's XQuery implementation?

+6
2 answers

XQuery can use numeric character references in strings, much in the same way that XML and HTML can:

decimal: "&#10;" hex: "&#x0a;" (or just "&#xa;")

However, you cannot represent some characters this way, e.g. those <= "&#x09;".

There is no regex type in XQuery (you just use a string as a regex), so you can use character references in your regexes:

 fn:matches("a", "[^&#x09;-&#xFF;]") (: => xs:boolean("false") :) 
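Applied to the query from the question, the same idea might look like the sketch below. It is only an illustration: the inline $samples strings and the <hit> wrapper are hypothetical stand-ins for fn:collection('mydocs')//myns:myelem, so that the snippet runs without any documents loaded, and the range starts at &#x09; simply because that is the lower bound used above.

```xquery
xquery version "1.0";

(: Sketch only: $samples and <hit> are made up for this example and
   replace the asker's collection and myns:myelem nodes. :)

(: Character references are expanded inside the string literal, so the
   regex engine receives the actual characters U+0009 and U+00FF. :)
let $regex := "[^&#x09;-&#xFF;]"

let $samples := (
  "plain ASCII",      (: every character is inside the range :)
  "caf&#xE9;",        (: U+00E9 is still inside the range :)
  "&#x3A9; omega"     (: U+03A9 is outside the range :)
)
let $results := $samples[fn:matches(., $regex)]
return
  <figures count="{fn:count($results)}">
    { for $s in $results return <hit>{ $s }</hit> }
  </figures>
```

Under standard XQuery semantics I would expect only the third string to be selected. In the real query, keep the original path expression and swap in just the $regex line.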

Update: here is the XQuery 1.0 specification for character references: http://www.w3.org/TR/xquery/#dt-character-reference

Based on some brief testing, I think MarkLogic applies XML 1.1 character reference rules: http://www.w3.org/TR/xml11/#charsets

For posterity, here are the XML 1.0 rules: http://www.w3.org/TR/REC-xml/#charsets

+4

Well, it seems that MarkLogic's XQuery implementation requires Unicode notation. As it turned out, even very small hexadecimal ranges (for example, [^\x00-\x0F]) threw the error "Invalid regular expression", but the Unicode notation did not. The following query ran without error:

    let $regex := '[^U0000-U00FF]'
    let $results := fn:collection('mydocs')//myns:myelem[fn:matches(., $regex)]
    let $count := fn:count($results)
    return
      <figures count="{$count}">
        { $results }
      </figures>
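A query that runs without error is not necessarily matching what was intended, so it may be worth checking the regex against a couple of known strings. Another option I would try, as a sketch rather than something verified on MarkLogic, is the Unicode block escapes defined by the XML Schema regex syntax that XQuery uses: BasicLatin covers U+0000 to U+007F and Latin-1Supplement covers U+0080 to U+00FF. The sample strings below are made up for the example.

```xquery
xquery version "1.0";

(: Sketch only: assumes the engine supports the \p{IsBlockName} escapes
   from XML Schema regular expressions; the samples are hypothetical. :)

(: Matches any character outside the two blocks that together span
   U+0000 to U+00FF. :)
let $regex := "[^\p{IsBasicLatin}\p{IsLatin-1Supplement}]"

let $samples := ("caf&#xE9;", "&#x3A9; omega")
for $s in $samples
return fn:concat($s, " -> ", fn:string(fn:matches($s, $regex)))
```

With a conforming processor the first string should report false and the second true.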

I think the simple assignment let $regex := '[^\x00-\xFF]' did not produce an error on its own because, when I only returned $regex, it was treated as a plain string.
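To illustrate the point about the assignment: in standard XQuery a string literal is not checked as a regular expression until a function such as fn:matches receives it. A minimal sketch, using only the literal from the question and nothing MarkLogic-specific:

```xquery
xquery version "1.0";

(: Backslashes are ordinary characters in an XQuery string literal,
   so binding and returning the value involves no regex parsing. :)
let $regex := '[^\x00-\xFF]'
return fn:string-length($regex)
```

This should return 12. Passing the same string to fn:matches is the step that fails, because \x is not an escape defined by the XML Schema regex syntax; the W3C functions specification assigns the error code err:FORX0002 to an invalid regular expression.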

+1

Source: https://habr.com/ru/post/986379/

