Will Twitter's rate limits allow me to do the data mining necessary to build a complete social network graph of about 600 thousand users?

The primary question is: will Twitter's rate limits restrict the data mining needed to build a complete social network graph, with all directed edges, of about 600 thousand users?

Here is an idea:

The edges / links / relationships in the network will be follow relationships (following / being followed).

Start with a specific list of approximately 600 Twitter users, selected because they are all the news outlets in one large city.

Gather all the followers and friends (the people they follow) of all 600 users. These users probably have about 2,000 followers each on average, and probably follow about 500 friends each.

Since these 600 users are in the same city, many of their followers are expected to be the same people. So let's approximate and assume that these 600 users have only 600,000 distinct followers and friends. That gives a subgraph / network of 600,600 Twitter users.

So, once I have gathered all 600,000 followers and friends of these 600 users, I want to be able to build a social network of all 600,600 people and the follow relationships between them. This requires that I can at least find all directed edges among the 600,600 users (that is, whether or not each of the 600,600 users follows each of the others). With Twitter's rate limits, will this kind of data mining be feasible?

+4
2 answers

I will answer these questions in reverse order, starting with David Marx's: I have access to a fairly robust research computing center with a ton of storage capacity, so that should not be a problem. However, I do not know whether the software can handle it.

Most likely, I will have to scale the project back, and that is OK. My approach is to start with a broad idea, find out how big it is, and then pare it back accordingly.

In response to the question from Anony-Mousse: part of my problem is that I am not sure I am interpreting Twitter's rate limits correctly. I am not sure whether the limit is 15 requests per 15 minutes or 30 requests per 15 minutes. I think 1 request returns 5,000 followers / friends, so you can probably gather 75,000 friends or followers every 15 minutes if the limit is 15 requests per 15 minutes. I am also trying to find out whether there is any process for requesting higher rate limits for research purposes.
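The throughput arithmetic above can be checked with a short sketch. It takes the figures from this post as given (5,000 IDs per request, 15-minute windows) and compares the two candidate limits; the names `IDS_PER_REQUEST` and `ids_per_window` are made up for illustration.

```python
# Upper bound on follower/friend IDs retrievable per rate-limit window.
# Figures are the ones quoted in the post, not verified against the API.
IDS_PER_REQUEST = 5_000
WINDOWS_PER_DAY = 24 * 60 // 15  # number of 15-minute windows in a day


def ids_per_window(requests_per_window: int) -> int:
    """Best case: every request returns a full page of IDs."""
    return requests_per_window * IDS_PER_REQUEST


for limit in (15, 30):
    per_window = ids_per_window(limit)
    print(f"{limit} requests/window: {per_window:,} IDs per window, "
          f"{per_window * WINDOWS_PER_DAY:,} IDs per day at most")
```

This is only an upper bound: it is reached only if every request returns a full page of 5,000 IDs.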

This is where they list the limits: https://dev.twitter.com/docs/rate-limiting/1.1/limits

+1

Primary question: will Twitter's rate limits restrict the data mining (...)

Yes, it is technically feasible, but it will take a very long time with only one API access token: probably more than 6 months of continuous running.

More precisely:

  • node extraction (Twitter users) can be done very quickly, because you will use the users/lookup API endpoint, which lets you fetch 100 nodes per request and make 180 requests per 15-minute window (for each access token)
  • edge extraction (follow relationships between users) is the slow part: you will use the friends/ids and followers/ids API endpoints, which are limited to 15 requests per 15-minute window and return at most 5,000 friend or follower IDs of a single user per request.
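To make the difference between the two limits concrete, here is a rough estimate for the 600,600 users in the question. It is a sketch under stated assumptions, not a measurement: it assumes that only friends/ids is walked (an edge from u to v shows up in u's friends list, so that covers every edge inside the set) and that every user follows at most 5,000 accounts, so one request per user is enough. The helper names are hypothetical.

```python
import math

WINDOW_MINUTES = 15
USERS = 600_600  # size of the subgraph in the question


def requests_for(count: int, per_request: int) -> int:
    """Number of requests needed to page through `count` items."""
    return math.ceil(count / per_request)


def minutes_needed(requests: int, requests_per_window: int) -> float:
    """Wall-clock minutes with one access token running continuously."""
    return requests / requests_per_window * WINDOW_MINUTES


# Nodes: users/lookup, 100 users per request, 180 requests per window.
node_requests = requests_for(USERS, 100)
node_hours = minutes_needed(node_requests, 180) / 60

# Edges: friends/ids, 15 requests per window.
# Assumption: one page (<= 5,000 friends) per user.
edge_requests = USERS
edge_days = minutes_needed(edge_requests, 15) / (60 * 24)

print(f"nodes: {node_requests:,} requests, about {node_hours:.1f} hours")
print(f"edges: {edge_requests:,} requests, about {edge_days:.0f} days")
```

Under these assumptions the nodes take a few hours and the edges take over a year with a single token, which is consistent with the "more than 6 months" estimate; users who follow more than 5,000 accounts only add to it.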

You can use the node metadata (description texts, locations, languages, time zones) to do some interesting analysis without even having to extract the "graph" (the follow relationships between everyone).

The workaround is to parallelize parts of the extraction by spreading it across several access tokens. This seems to me compatible with the terms of use, as long as you respect protected accounts.

In any case, you should skip edge extraction for celebrities (you probably do not want to extract the followers of hootsuite; there are almost 6 million of them).

Disclaimer, self-promotion here: in case you do not want to develop this yourself, I could do the extraction for you and provide you with a graph file, since I extract Twitter graphs at tribalytics. (I read this and this before posting.)
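A minimal sketch of what spreading requests across tokens could look like is given below. It only does the bookkeeping (which token still has budget in its current window) and makes no API calls. `TokenBudget` and `pick_token` are hypothetical names, the 15-per-15-minutes figure is the friends/ids and followers/ids limit quoted above, and the window is modelled as starting at the first use after the previous one expired, which is an assumption about how the limit resets.

```python
from dataclasses import dataclass
from typing import List, Optional

WINDOW_SECONDS = 15 * 60
CALLS_PER_WINDOW = 15  # friends/ids and followers/ids limit quoted above


@dataclass
class TokenBudget:
    """Tracks how many calls one access token has made in its window."""
    name: str
    window_start: float = 0.0
    used: int = 0

    def try_acquire(self, now: float) -> bool:
        # Open a fresh window once the previous one has expired.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start = now
            self.used = 0
        if self.used < CALLS_PER_WINDOW:
            self.used += 1
            return True
        return False


def pick_token(tokens: List[TokenBudget], now: float) -> Optional[TokenBudget]:
    """Return the first token with remaining budget, or None if all are spent."""
    for token in tokens:
        if token.try_acquire(now):
            return token
    return None


tokens = [TokenBudget("token-a"), TokenBudget("token-b")]
granted = [pick_token(tokens, now=0.0) for _ in range(31)]
print(sum(t is not None for t in granted))  # calls granted in the first window
```

With N tokens the per-window budget is N times larger, so the extraction time shrinks roughly by a factor of N. In a real crawler `now` would come from a clock such as `time.monotonic()`, and the caller would sleep when `pick_token` returns `None`.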
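One way to apply such a filter is sketched below. It uses the `followers_count` field that user objects from users/lookup carry, so the decision can be made before any follower IDs are requested. The threshold `MAX_FOLLOWERS`, the helper names, and the sample records are made up for illustration; the 6,000,000 figure is just the round number mentioned above.

```python
import math

IDS_PER_REQUEST = 5_000
MAX_FOLLOWERS = 50_000  # hypothetical cut-off; tune it to the study


def follower_requests(followers_count: int) -> int:
    """followers/ids requests needed to page through one user's followers."""
    return max(1, math.ceil(followers_count / IDS_PER_REQUEST))


def split_by_audience(users, max_followers=MAX_FOLLOWERS):
    """Separate users whose followers are worth extracting from celebrities."""
    keep, skip = [], []
    for user in users:
        (keep if user["followers_count"] <= max_followers else skip).append(user)
    return keep, skip


# Illustrative records shaped like users/lookup results (values are made up).
users = [
    {"screen_name": "hootsuite", "followers_count": 6_000_000},
    {"screen_name": "local_news_example", "followers_count": 2_000},
]
keep, skip = split_by_audience(users)
for user in skip:
    n = follower_requests(user["followers_count"])
    # 15 requests per 15-minute window is one request per minute on average.
    print(f"skipping {user['screen_name']}: {n} requests, about {n / 60:.0f} hours")
```

At 15 requests per 15 minutes, an account with about 6 million followers costs roughly 1,200 requests, or about 20 hours of a single token's budget, for followers that mostly fall outside the population being studied.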

I am also trying to find out whether there is any process for requesting higher rate limits for research purposes.

Officially, there are no longer whitelisted apps with higher rate limits, as there used to be with the previous Twitter API version. You should probably get in touch with Twitter and see if they can help you, since your work has an academic purpose.

Most likely, I will have to scale the project back, which is OK

I would advise you to reduce your initial list of 600 users as much as you can. Keep only those who are truly central to your topic and whose audience is not too large. Extracting the graph of a local celebrity will give you a graph with many people who are not at all related to the population you want to study.

0

Source: https://habr.com/ru/post/1485047/

