I seem to be stuck on the logic of a Solr facet tag cloud. First of all, I use OpenNLP to analyze my documents and extract the relevant words from them, so every single document is split into n words. Here is basically what my Solr response looks like:
<docID> <title>My Doc Title</title> <content>My Doc Title</content> <date_published>My Doc Title</date_published> </docID>
I believe there should be a way to add the words to each document. At first I thought of something like this:
<docID> <title>My Doc Title</title> <content>My Doc Title</content> <date_published>My Doc Title</date_published> <words>word</words> <words1>word1</words1> <words2>word2</words2> <words3>word3</words3> <wordsN>wordN</wordsN> </docID>
But faceting would not be possible this way, since I don't know how many word fields I would get per docID, and the faceting would then have to be done across fields (which I'm not even sure is possible). I've been trying to study the possible approaches, but I seem to be stuck. In the end, I need to build a list of n words that takes into account every single document in my index. Any thoughts would be highly appreciated.
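To make the goal concrete, here is a minimal sketch in Python of the result I'm after. It is not Solr code: `docs` and `tag_cloud` are hypothetical names, and it assumes the words extracted by OpenNLP are available as one list per document.

```python
from collections import Counter


def tag_cloud(docs, n):
    """Return the n most frequent words across all documents.

    docs maps a document ID to the list of words extracted from it
    (hypothetical structure, for illustration only). Each word is
    counted at most once per document, so the count is the number
    of documents that contain the word.
    """
    counts = Counter()
    for words in docs.values():
        counts.update(set(words))  # de-duplicate within one document
    # Sort by descending count, then alphabetically, so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


# Made-up sample data standing in for the indexed documents.
docs = {
    "doc1": ["solr", "facet", "cloud"],
    "doc2": ["solr", "facet"],
    "doc3": ["solr", "opennlp"],
}

print(tag_cloud(docs, 2))
```

The counts per word are what I would then use to size the entries in the tag cloud.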