Computing trending topics

Let's say I collect tweets from Twitter based on a variety of criteria and store them in a local MySQL database. I want to be able to compute trending topics, as Twitter does, where a topic can be anywhere from 1 to 3 words long.

Is it possible to write a script that does something like this in PHP and MySQL?

I found an answer on how to compute which terms are "hot" once you have the term counts, but I'm stuck on the first part. How should I store the data in a database, and how can I compute the frequency of terms that are 1 to 3 words long?

+4
4 answers

My take on getting trending topics:
1. Select the tweets.
2. Split each tweet on spaces into an array of n-grams (up to 3-grams if you want phrases up to 3 words long).
3. Filter each array to remove URLs, @usernames, common words, and junk.
4. Count all unique keywords/phrases.
5. Exclude unwanted words/phrases.

Yes, you can do it in PHP and MySQL ;)
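
The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than PHP, and the same logic carries over directly. The stop-word list, the regular expressions, and the function names are assumptions made for the example. As a simplification it strips URLs and @usernames before building n-grams and drops phrases that begin or end with a common word.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "to", "of", "and", "i"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9#']+")


def tokenize(tweet):
    """Lowercase a tweet, drop URLs and @usernames, and return its word tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text)


def ngrams(tokens, max_n=3):
    """Yield every run of 1..max_n consecutive tokens as a space-joined phrase."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_phrases(tweets, max_n=3):
    """Count 1- to max_n-word phrases across all tweets."""
    counts = Counter()
    for tweet in tweets:
        for phrase in ngrams(tokenize(tweet), max_n):
            words = phrase.split()
            # Skip phrases that start or end with a common word.
            if words[0] in STOP_WORDS or words[-1] in STOP_WORDS:
                continue
            counts[phrase] += 1
    return counts
```

Calling `count_phrases(tweets).most_common(10)` on a list of tweet strings would then give the ten most frequent phrases, which can be fed into whatever "hotness" formula you already have.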

+2

How about first splitting your tweets into single-word tokens and computing the number of occurrences of each word? Once you have those, you can split the tweets into all two-word tokens and compute their occurrences, and finally do the same with all three-word tokens.

You can also add a dictionary of words that you do not want to count.
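
A rough sketch of this idea, which also touches on the storage part of the question: count phrases separately for each length, then accumulate the totals in a SQL table. Python's built-in `sqlite3` stands in for MySQL here. The `term_counts` table, its columns, and the `IGNORED` word list are hypothetical names chosen for the example.

```python
import sqlite3
from collections import Counter

# Hypothetical list of words that should not be counted.
IGNORED = {"the", "a", "is", "to", "and"}


def count_by_size(token_lists, max_n=3):
    """Return {n: Counter} with occurrence counts for every n-word phrase."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for tokens in token_lists:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                window = tokens[i:i + n]
                # Skip any phrase containing an ignored word.
                if any(word in IGNORED for word in window):
                    continue
                counts[n][" ".join(window)] += 1
    return counts


def save_counts(conn, counts):
    """Add the counts to a term_counts table, creating rows as needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_counts ("
        "term TEXT PRIMARY KEY, words INTEGER NOT NULL, cnt INTEGER NOT NULL)"
    )
    for n, counter in counts.items():
        for term, cnt in counter.items():
            # Make sure the row exists, then add to its running total.
            conn.execute(
                "INSERT OR IGNORE INTO term_counts (term, words, cnt) VALUES (?, ?, 0)",
                (term, n),
            )
            conn.execute(
                "UPDATE term_counts SET cnt = cnt + ? WHERE term = ?", (cnt, term)
            )
    conn.commit()
```

In MySQL the insert-then-update pair could be collapsed into a single `INSERT ... ON DUPLICATE KEY UPDATE` statement. To compare counts across time windows, one option would be to add a date column to the key, so that each day gets its own row per term.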

+1

You will need either

  • document classification, or
  • automatic tagging

Probably the latter. Only then can you compute popularity over time.

+1

Or do the opposite of what Dominik suggests and store a list of the phrases you want to match, spaces and all. Write them as regular expression strings. For each row in your data store (file, SQL table, whatever), run the regular expressions and count the matches.

It depends on which approach you want to take: count everything that is common, thereby discovering what is really trending, or search for specific phrases. In the first case, you will find a lot that may not interest you and will need an extensive blacklist; in the second, you will need a huge whitelist.
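
A minimal sketch of this whitelist approach, again in Python for illustration. The phrases in `PATTERNS` are made-up examples, and `rows` is assumed to be the tweet texts already fetched from wherever they are stored.

```python
import re
from collections import Counter

# Hypothetical whitelist of phrases to track, written as regular expressions.
PATTERNS = {
    "iphone": r"\biphone\b",
    "world cup": r"\bworld\s+cup\b",
    "php": r"\bphp\b",
}

# Compile once so each row is only scanned, not re-parsed.
COMPILED = {name: re.compile(rx, re.IGNORECASE) for name, rx in PATTERNS.items()}


def count_matches(rows):
    """Count how many times each whitelisted pattern matches across all rows."""
    counts = Counter()
    for text in rows:
        for name, pattern in COMPILED.items():
            counts[name] += len(pattern.findall(text))
    return counts
```

Writing the space as `\s+` lets a phrase match even when the tweet contains extra whitespace between the words.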

To go beyond this, you will need natural language processing tools to determine the meaning of what is said.

0

Source: https://habr.com/ru/post/1301043/

