I want to cluster text. I understand the concept of clustering text content from Mahout in Action:
- index (int → term) all terms in the input files and store them in a dictionary
- convert all input documents to normalized sparse vectors
- do clustering
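To make sure I have the first two steps right, here is a minimal sketch of how I picture them, in plain Python rather than Mahout. The function names (`build_dictionary`, `to_sparse_vector`, `cosine_similarity`) and the sample documents are my own illustration, not Mahout's API, and I use raw term counts with whitespace tokenization where a real pipeline would probably use TF-IDF and a proper analyzer.

```python
import math
from collections import Counter

def build_dictionary(docs):
    """Assign each distinct term an integer index (term -> int)."""
    dictionary = {}
    for doc in docs:
        for term in doc.lower().split():
            dictionary.setdefault(term, len(dictionary))
    return dictionary

def to_sparse_vector(doc, dictionary):
    """Return an L2-normalized sparse vector as {term index: weight}."""
    counts = Counter(dictionary[t] for t in doc.lower().split() if t in dictionary)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {i: c / norm for i, c in counts.items()} if norm else {}

def cosine_similarity(a, b):
    """Dot product of two normalized sparse vectors."""
    return sum(w * b.get(i, 0.0) for i, w in a.items())

# Hypothetical sample documents, for illustration only.
docs = ["hiking in the alps", "alps hiking trip", "office meeting notes"]
dictionary = build_dictionary(docs)
vectors = [to_sparse_vector(d, dictionary) for d in docs]
```

The clustering step would then work on `vectors`, using something like `cosine_similarity` (or a distance derived from it) to decide which documents belong together.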
I want to cluster the text together with other information such as date-time, location, and the people I was with. For example, I want documents written during a 10-day visit to a remote location to be placed in a separate cluster.
I know that I have to write my own tool for creating vectors from date, place, tags and (natural) text. How do I approach this? Should I use the built-in tools to vectorize the text and then integrate their output into my own vectors? How should the different dimensions be weighted?
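To make the question concrete, this is roughly what I have in mind, again as a plain Python sketch rather than anything Mahout provides. Every name here (`scale`, `date_feature`, `location_feature`, `tag_feature`, `combine`) is hypothetical, the weights are arbitrary placeholders, and the text vector is shown as a short dense list just to keep the example small. Each feature block is multiplied by the square root of its weight, so that the block contributes `weight` times its own squared distance to the overall squared Euclidean distance.

```python
import math
from datetime import datetime

def scale(block, weight):
    """Scale a feature block by sqrt(weight) for weighted Euclidean distance."""
    s = math.sqrt(weight)
    return [s * x for x in block]

def date_feature(dt, origin, span_days):
    """Days since origin, divided by the time span covered by the collection."""
    return [(dt - origin).total_seconds() / 86400.0 / span_days]

def location_feature(lat, lon):
    """Crude location encoding: latitude and longitude scaled to [-1, 1]."""
    return [lat / 90.0, lon / 180.0]

def tag_feature(tags, vocabulary):
    """One dimension per known tag: 1.0 if present, else 0.0."""
    return [1.0 if t in tags else 0.0 for t in vocabulary]

def combine(text_vec, date_vec, loc_vec, tag_vec, weights):
    """Concatenate the weighted blocks into one feature vector."""
    return (scale(text_vec, weights["text"])
            + scale(date_vec, weights["date"])
            + scale(loc_vec, weights["location"])
            + scale(tag_vec, weights["tags"]))

# Placeholder weights and values, for illustration only.
weights = {"text": 1.0, "date": 4.0, "location": 4.0, "tags": 1.0}
origin = datetime(2012, 1, 1)
vec = combine(
    [0.6, 0.8],                                    # already-normalized text vector
    date_feature(datetime(2012, 1, 11), origin, 365.0),
    location_feature(45.0, 90.0),
    tag_feature({"travel"}, ["travel", "work"]),
    weights,
)
```

With date and location weighted more heavily than text, documents from the same trip should end up close together even if their wording differs. What I do not know is how to choose such weights in a principled way.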