Grouping and Counting in XQuery

Here is the XML. I am trying to get the number of titles published by each author in the date range from 02/15/2012 to 02/24/2012 (15 to 24 February 2012), ordered from highest to lowest number of titles.

    <entries>
      <entry>
        <id>1</id>
        <published>23/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title one</content>
        <author>
          <name>Pankaj</name>
        </author>
      </entry>
      <entry>
        <id>2</id>
        <published>22/02/2012</published>
        <title>Title 2</title>
        <content type="html">This is title two</content>
        <author>
          <name>Pankaj</name>
        </author>
      </entry>
      <entry>
        <id>3</id>
        <published>21/02/2012</published>
        <title>Title 3</title>
        <content type="html">This is title three</content>
        <author>
          <name>Rob</name>
        </author>
      </entry>
      <entry>
        <id>4</id>
        <published>20/02/2012</published>
        <title>Title 4</title>
        <content type="html">This is title four</content>
        <author>
          <name>Bob</name>
        </author>
      </entry>
      <entry>
        <id>5</id>
        <published>19/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title five</content>
        <author>
          <name>Pankaj</name>
        </author>
      </entry>
    </entries>

This is the output I am trying to get from XQuery:

    <?xml version="1.0" encoding="UTF-8"?>
    <results>
      <result>
        <author>
          <name>Pankaj</name>
        </author>
        <numberOfTitles>3</numberOfTitles>
      </result>
      <result>
        <author>
          <name>Rob</name>
        </author>
        <numberOfTitles>1</numberOfTitles>
      </result>
      <result>
        <author>
          <name>Bob</name>
        </author>
        <numberOfTitles>1</numberOfTitles>
      </result>
    </results>

Please help me.

6 answers

This XQuery 1.0 solution can be run by any compliant XQuery 1.0 processor:

Note: no group by and no distinct-values().

    <results>
    {
      let $entries :=
            /*/entry
              [for $d in xs:date(string-join(reverse(tokenize(published, '/')), '-'))
               return xs:date('2012-02-15') le $d and $d le xs:date('2012-02-24')
              ],
          $vals := $entries/author/name
      return
        for $a in $vals[index-of($vals, .)[1]],
            $cnt in count(index-of($vals, $a))
        order by $cnt descending
        return
          <result>
            <author> {$a} </author>
            <numberOfTitles> {count(index-of($vals, $a))} </numberOfTitles>
          </result>
    }
    </results>

When applied to the provided XML document:

    <entries>
      <entry>
        <id>1</id>
        <published>23/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title one</content>
        <author>
          <name>Pankaj</name>
        </author>
      </entry>
      <entry>
        <id>2</id>
        <published>22/02/2012</published>
        <title>Title 2</title>
        <content type="html">This is title two</content>
        <author>
          <name>Pankaj</name>
        </author>
      </entry>
      <entry>
        <id>3</id>
        <published>21/02/2012</published>
        <title>Title 3</title>
        <content type="html">This is title three</content>
        <author>
          <name>Rob</name>
        </author>
      </entry>
      <entry>
        <id>4</id>
        <published>20/02/2012</published>
        <title>Title 4</title>
        <content type="html">This is title four</content>
        <author>
          <name>Bob</name>
        </author>
      </entry>
      <entry>
        <id>5</id>
        <published>19/02/2012</published>
        <title>Title 1</title>
        <content type="html">This is title five</content>
        <author>
          <name>Pankaj</name>
        </author>
      </entry>
    </entries>

it produces the desired, correct result:

    <?xml version="1.0" encoding="UTF-8"?>
    <results>
      <result>
        <author>
          <name>Pankaj</name>
        </author>
        <numberOfTitles>3</numberOfTitles>
      </result>
      <result>
        <author>
          <name>Rob</name>
        </author>
        <numberOfTitles>1</numberOfTitles>
      </result>
      <result>
        <author>
          <name>Bob</name>
        </author>
        <numberOfTitles>1</numberOfTitles>
      </result>
    </results>
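
The XQuery 1.0 query in this answer avoids distinct-values() by using a positional predicate: `index-of($vals, .)[1]` is the position of the first item equal to the current one, so the predicate keeps an item only when it is itself that first occurrence. A minimal sketch of the idiom on literal strings (the sample values are made up for illustration and are not read from the document):

```xquery
(: Illustration only: the names are hard-coded rather than selected from the XML. :)
let $vals := ('Pankaj', 'Pankaj', 'Rob', 'Bob', 'Pankaj')
return
  (: keep only the first occurrence of each value :)
  for $a in $vals[index-of($vals, .)[1]]
  (: index-of returns every position of $a, so its count is the group size :)
  return concat($a, ': ', count(index-of($vals, $a)))
```

This should yield the three strings "Pankaj: 3", "Rob: 1" and "Bob: 1". Note that index-of rescans the whole sequence for every item, so the idiom is compact but not cheap on large inputs.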

Here is my solution:

    <results>{
      for $entry in //entry
      let $date := xs:date(string-join(reverse(tokenize($entry/published, '/')), '-')),
          $author := $entry/author/string()
      where xs:date('2012-02-15') le $date and $date le xs:date('2012-02-24')
      group by $author
      order by count($entry) descending
      return <result>{
        <author>
          <name>{$author}</name>
        </author>,
        <numberOfTitles>{count($entry)}</numberOfTitles>
      }</result>
    }</results>

When run in BaseX, it gives the correct result.

It uses XQuery 3.0 features such as group by; without them it would be more complicated. I do not know whether MarkLogic supports this.


Here is a MarkLogic-specific solution that uses maps to implement grouping efficiently. The input XML is assumed to be bound to $INPUT, but you can replace it with a call to doc() or any other accessor.

I also explored this topic in a blog post last year: http://blakeley.com/blogofile/archives/560/

    element results {
      let $m := map:map()
      let $start := xs:date('2012-02-15')
      let $stop := xs:date('2012-02-24')
      (: single pass over the entries: bump the per-author counter in the map :)
      let $group :=
        for $entry in $INPUT/entry
        let $key := $entry/author/name/string()
        let $date := xs:date(xdmp:parse-yymmdd("dd/MM/yyyy", $entry/published))
        where $date ge $start and $date le $stop
        return map:put($m, $key, 1 + (map:get($m, $key), 0)[1])
      for $key in map:keys($m)
      let $count := map:get($m, $key)
      (: descending, since the question asks for highest to lowest :)
      order by $count descending
      return element result {
        element author { element name { $key } },
        element numberOfTitles { $count }
      }
    }

On most processors, the following should work. In MarkLogic you can write more efficient queries, but this will get you started.

    let $doc :=
      <entries>
        <entry>
          <id>1</id>
          <published>23/02/2012</published>
          <title>Title 1</title>
          <content type="html">This is title one</content>
          <author>
            <name>Pankaj</name>
          </author>
        </entry>
        <entry>
          <id>2</id>
          <published>22/02/2012</published>
          <title>Title 2</title>
          <content type="html">This is title two</content>
          <author>
            <name>Pankaj</name>
          </author>
        </entry>
        <entry>
          <id>3</id>
          <published>21/02/2012</published>
          <title>Title 3</title>
          <content type="html">This is title three</content>
          <author>
            <name>Rob</name>
          </author>
        </entry>
        <entry>
          <id>4</id>
          <published>20/02/2012</published>
          <title>Title 4</title>
          <content type="html">This is title four</content>
          <author>
            <name>Bob</name>
          </author>
        </entry>
        <entry>
          <id>5</id>
          <published>19/02/2012</published>
          <title>Title 1</title>
          <content type="html">This is title five</content>
          <author>
            <name>Pankaj</name>
          </author>
        </entry>
      </entries>
    return
      <results>
      {
        (: Counts every entry per author; no date filter or ordering is applied here. :)
        for $author in distinct-values($doc/entry/author/name/string())
        return
          <result>
            <author>
              <name>{$author}</name>
            </author>
            <numberOfTitles>{count($doc/entry[author/name/string() eq $author])}</numberOfTitles>
          </result>
      }
      </results>

Here's another solution, similar to Leo Wörteler's:

    (: Converts a dd/MM/yyyy string into an xs:date. :)
    declare function local:FormatDate($origDate as xs:string) as xs:date {
      xs:date(string-join(reverse(tokenize($origDate, '/')), '-'))
    };

    <results>
    {
      for $author in distinct-values(/entries/entry/author/name)
      let $startDate := xs:date('2012-02-15')
      let $endDate := xs:date('2012-02-24')
      (: computed once and reused for both ordering and output :)
      let $count :=
        count(/entries/entry[author/name = $author]
                            [$startDate <= local:FormatDate(published)
                             and local:FormatDate(published) <= $endDate])
      order by $count descending
      return
        <result>
          <author>
            <name>{$author}</name>
          </author>
          <numberOfTitles>{$count}</numberOfTitles>
        </result>
    }
    </results>

+1 for the map-based answer. The other solutions have a count(/entry/author[$name=xx]) clause or another XPath expression nested inside the FLWOR, which is effectively a nested loop. Nested loops cause O(N^2) performance, which can look fine in testing and then slow down as the data size grows.
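
For processors without MarkLogic's map:map(), the same single-pass idea can be sketched with standard maps. This is only a sketch under assumptions: it needs an XQuery 3.1 processor (for the map constructor and map:merge), it takes the sample document as the context item, and the variable names and the use of normalize-space() are illustrative choices rather than anything from the answers above. Whether map:merge actually runs in linear time depends on the implementation.

```xquery
xquery version "3.1";

(: Assumption: the sample <entries> document is the context item. :)
let $start := xs:date('2012-02-15')
let $stop  := xs:date('2012-02-24')
(: One pass over the entries: emit a single-entry map per matching entry,
   then merge them; 'combine' collects the 1s for a repeated author
   into a sequence under that author's key. :)
let $counts := map:merge(
  for $entry in /entries/entry
  let $date := xs:date(string-join(reverse(tokenize($entry/published, '/')), '-'))
  where $date ge $start and $date le $stop
  return map { normalize-space($entry/author/name) : 1 },
  map { 'duplicates' : 'combine' }
)
return
  <results>{
    for $name in map:keys($counts)
    (: summing the collected 1s gives the number of titles :)
    let $count := sum($counts($name))
    order by $count descending
    return
      <result>
        <author><name>{$name}</name></author>
        <numberOfTitles>{$count}</numberOfTitles>
      </result>
  }</results>
```

The entries are scanned once and each author's count is read back by key, so no per-author XPath scan is nested inside the FLWOR.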


Source: https://habr.com/ru/post/909121/

