Building a word cloud by date for a Twitter search result? (using R)

I want to search Twitter for a word (say #google) and then create a tag cloud from the words used in the tweets, but grouped by date (for example, a moving one-hour window that advances 10 minutes at a time and shows how the most frequent words change throughout the day).

I would appreciate any help with this: pointers to resources, code (R is the only language I use), and visualization ideas. My questions:

  • How do I get the data?

    In R, I found that the twitteR package has a searchTwitter command, but I don't know how much data I can get out of it. Also, it does not return the dates on which the tweets were posted.

    I see here that I can get up to 1,500 tweets, but that requires me to do the paging manually (which leads me to step 2). Also, for my purposes I would need tens of thousands of tweets. Is it even possible to get them retrospectively (for example, by repeatedly requesting older messages through the API URL)? If not, there is the more general question of how to build a personal tweet repository on one's home computer (a question that might be better suited to a separate SO thread, although any ideas from people here would be very interesting to me).

  • How do I process the data (in R)? I know that R has functions that could help in the RCurl and twitteR packages, but I don't know which ones to use or how. Any suggestions would help.

  • How do I analyze the text? How do I remove all the "not interesting" words? I found that the tm package in R has this example:

    reuters <- tm_map(reuters, removeWords, stopwords("english"))

    Will that do the trick? Should I do something else/more?

    Also, I assume I would want to do this after reducing my dataset by time, which will require some POSIXct-style date functions (and I'm not quite sure which ones will be needed here or how to use them).

  • And finally, there is the question of visualization: how do I create a word tag cloud? I found one solution for this here; any other suggestions/recommendations? (A rough end-to-end sketch of all four steps follows this list.)
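For concreteness, here is a minimal sketch of the pipeline these four questions describe, using the twitteR, tm and wordcloud packages. The search term and window start are placeholders, and it assumes twListToDF() exposes each tweet's timestamp in a created column (content_transformer() is the tm >= 0.6 spelling):

    library(twitteR)
    library(tm)
    library(wordcloud)

    # 1. Get the data: searchTwitter() was capped at 1,500 tweets per call;
    #    twListToDF() turns the result into a data frame whose 'created'
    #    column holds the tweet timestamps.
    tweets <- twListToDF(searchTwitter("#google", n = 1500))

    # 2. Reduce the data set to a moving one-hour window with POSIXct
    #    arithmetic; add 10 * 60 seconds to 'start' to advance the window
    #    by ten minutes.
    start  <- as.POSIXct("2012-02-15 09:00:00", tz = "UTC")
    window <- tweets[tweets$created >= start &
                     tweets$created <  start + 3600, ]

    # 3. Clean the text and drop the "not interesting" words.
    corpus <- Corpus(VectorSource(window$text))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removeWords, stopwords("english"))

    # 4. Count word frequencies and draw the cloud for this window.
    tdm   <- TermDocumentMatrix(corpus)
    freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
    wordcloud(names(freqs), freqs, min.freq = 3)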

I realize I am asking a huge question here, but I have tried to break it down into the simplest sub-questions possible. Any help would be appreciated!

Best

Tal

4 answers
Answer (+6):

As for the visualization: I made the word cloud here: http://trends.techcrunch.com/2009/09/25/describe-yourself-in-3-or-4-words/ using the snippets package; my code is there. I manually pulled out certain words. Check it out and let me know if you have more specific questions.
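For reference, a tiny sketch of that approach; the snippets package is hosted on rforge.net rather than CRAN, and the word counts below are made up for illustration:

    # install.packages("snippets", repos = "http://rforge.net")
    library(snippets)

    # cloud() draws a word cloud from a named vector of frequencies
    w <- c(r = 10, data = 7, twitter = 5, cloud = 4, words = 2)
    cloud(w)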

Answer (+2):

I note that this is an old question and several solutions can be found through a web search, but here is one answer (via http://blog.ouseful.info/2012/02/15/generating-twitter-wordclouds-in-r-prompted-by-an-open-learning-blogpost/):

    require(twitteR)

    searchTerm <- '#dev8d'

    # Grab the tweets
    rdmTweets <- searchTwitter(searchTerm, n = 500)

    # Use a handy helper function to put the tweets into a dataframe
    tw.df <- twListToDF(rdmTweets)

    ## Note: there are some handy, basic Twitter related functions here:
    ## https://github.com/matteoredaelli/twitter-r-utils
    # For example:
    RemoveAtPeople <- function(tweet) {
      gsub("@\\w+", "", tweet)
    }

    # Then, for example, remove @'d names
    tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))

    ## Wordcloud - scripts available from various sources; I used:
    ## http://rdatamining.wordpress.com/2011/11/09/using-text-mining-to-find-out-what-rdatamining-tweets-are-about/
    # Call with e.g.: tw.c <- generateCorpus(tw.df$text)
    generateCorpus <- function(df, my.stopwords = c()) {
      # Load the text mining library
      require(tm)
      # The following is cribbed and seems to do what it says on the can
      tw.corpus <- Corpus(VectorSource(df))
      # remove punctuation
      tw.corpus <- tm_map(tw.corpus, removePunctuation)
      # normalise case (in tm >= 0.6, use content_transformer(tolower))
      tw.corpus <- tm_map(tw.corpus, tolower)
      # remove stopwords
      tw.corpus <- tm_map(tw.corpus, removeWords, stopwords('english'))
      tw.corpus <- tm_map(tw.corpus, removeWords, my.stopwords)
      tw.corpus
    }

    wordcloud.generate <- function(corpus, min.freq = 3) {
      require(wordcloud)
      doc.m <- TermDocumentMatrix(corpus, control = list(minWordLength = 1))
      dm <- as.matrix(doc.m)
      # calculate the frequency of words
      v <- sort(rowSums(dm), decreasing = TRUE)
      d <- data.frame(word = names(v), freq = v)
      # Generate the wordcloud
      wc <- wordcloud(d$word, d$freq, min.freq = min.freq)
      wc
    }

    print(wordcloud.generate(generateCorpus(tweets, 'dev8d'), 7))

    ## Generate an image file of the wordcloud
    png('test.png', width = 600, height = 600)
    wordcloud.generate(generateCorpus(tweets, 'dev8d'), 7)
    dev.off()

    # We could make it even easier if we hide away the tweet grabbing code, e.g.:
    tweets.grabber <- function(searchTerm, num = 500) {
      require(twitteR)
      rdmTweets <- searchTwitter(searchTerm, n = num)
      tw.df <- twListToDF(rdmTweets)
      as.vector(sapply(tw.df$text, RemoveAtPeople))
    }

    # Then we could do something like:
    tweets <- tweets.grabber('ukgc12')
    wordcloud.generate(generateCorpus(tweets), 3)
Answer (+2):

To answer your question about creating a large word cloud, here is how I did it:

  • Run s0.tweet <- searchTwitter(KEYWORD, n = 1500) once a day for 7 days or more (a sketch automating this appears after the list).

  • Combine the results with this command:

    rdmTweets <- c(s0.tweet, s1.tweet, s2.tweet, s3.tweet, s4.tweet, s5.tweet, s6.tweet, s7.tweet)
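A small sketch of how one might automate that daily collection; the harvest() helper and the archive file name are hypothetical, and it assumes searchTwitter() works as in the answers above:

    library(twitteR)

    # Accumulate one search per day into a local archive, working around
    # the per-search cap by collecting over several days.
    harvest <- function(keyword, file = "tweets.rds") {
      new <- twListToDF(searchTwitter(keyword, n = 1500))
      old <- if (file.exists(file)) readRDS(file) else new[0, ]
      all <- unique(rbind(old, new))  # drop tweets already stored
      saveRDS(all, file)
      all
    }

    # Run once a day (e.g. from cron / Task Scheduler):
    # tweets.df <- harvest("lynas")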

Result:

[image: square word cloud for "Lynas"]

This square cloud was built from 9,000 tweets.

Source: People talk about Lynas Malaysia via Twitter — Analysis with R CloudStat

Hope this helps!

