Solr Facet Tag Cloud

I seem to be stuck on the logic of a Solr facet tag cloud. First, I use OpenNLP to parse my documents and extract the relevant words, so every document is split into n words. Here is roughly what my Solr response looks like:

<docID>
  <title>My Doc Title</title>
  <content>My Doc Title</content>
  <date_published>My Doc Title</date_published>
</docID>

I believe there should be a way to include the words in the document. At first I thought of something like this:

<docID>
  <title>My Doc Title</title>
  <content>My Doc Title</content>
  <date_published>My Doc Title</date_published>
  <words>word</words>
  <words1>word1</words1>
  <words2>word2</words2>
  <words3>word3</words3>
  <wordsN>wordN</wordsN>
</docID>

But faceting would not be possible that way, since I don't know how many word fields each docID would have, and the faceting would then have to be done across fields (which I'm not even sure is possible). I've been looking into possible approaches, but I seem to be stuck. In the end, I need to build a cloud of n words drawn from every single document in my index. Any thoughts would be highly appreciated.

1 answer

I would suggest using a single multi-valued field for the words and storing the whole list of words in it for each document.
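As a rough illustration of that layout, here is a minimal Python sketch of one document as it might be sent to Solr's JSON update handler. The field name `words`, the `id` value, and the sample date are assumptions for the example, and `words` would have to be declared multi-valued in the schema.

```python
import json


def build_solr_doc(doc_id, title, content, date_published, words):
    """Return one document with every extracted word in a single list field."""
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "date_published": date_published,
        # A multi-valued field is sent as a JSON array. Facet counts are
        # per document, so repeated words are dropped here.
        "words": sorted(set(words)),
    }


# Hypothetical document; the word list stands in for the OpenNLP output.
doc = build_solr_doc(
    "doc-1",
    "My Doc Title",
    "My Doc Content",
    "2011-01-01T00:00:00Z",
    ["solr", "facet", "solr", "cloud"],
)

# The update handler accepts a JSON array of documents.
payload = json.dumps([doc])
```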

Having an unbounded number of words\d+ fields will complicate things.

If you facet on the single multi-valued field, you get all the words along with their frequencies, which should be enough to build a tag cloud.
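A minimal Python sketch of that idea follows. It builds a facet request with the standard `facet`, `facet.field`, `facet.limit` and `facet.mincount` parameters, then turns the facet counts into font sizes. The core URL, the field name `words`, the pixel range, and the sample response are assumptions, and the parser assumes Solr's default flat JSON layout for facet counts (term, count, term, count, ...).

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field="words", limit=50):
    """Build a select URL that returns only facet counts for one field."""
    params = {
        "q": "*:*",           # match every document in the index
        "rows": 0,            # no documents needed, only the facets
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # the n words wanted in the cloud
        "facet.mincount": 1,
    }
    return f"{base_url}/select?{urlencode(params)}"


def parse_facet_counts(response, field="words"):
    """Pair up the flat [term, count, term, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


def tag_sizes(term_counts, min_px=12, max_px=36):
    """Scale document counts linearly into a font-size range."""
    if not term_counts:
        return {}
    counts = [count for _, count in term_counts]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero when all counts match
    return {
        term: round(min_px + (count - lo) * (max_px - min_px) / span)
        for term, count in term_counts
    }


# Hypothetical core URL and a hand-written sample response (no network call).
url = facet_query_url("http://localhost:8983/solr/mycore")
sample_response = {
    "facet_counts": {"facet_fields": {"words": ["solr", 12, "facet", 7, "cloud", 2]}}
}
terms = parse_facet_counts(sample_response)
sizes = tag_sizes(terms)
```

Each count is the number of documents containing the word, not the number of occurrences, so the cloud reflects how widespread a word is across the index.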


Source: https://habr.com/ru/post/896701/

