Primary question: will Twitter's restrictions limit data mining (...)
Yes, this is technically feasible, but with only one API access token it will take a very long time: probably more than 6 months of continuous running.
More precisely:
- Node extraction (Twitter users) can be done very quickly, because you will use the users/lookup API endpoint, which lets you fetch 100 nodes per request and make 180 requests per 15-minute window (for each access token).
- Retrieving edges (follow relationships between users) is the slow part: you will use the friends/ids and followers/ids API endpoints, which are limited to 15 requests per 15-minute window and return up to 5,000 friend or follower IDs of a single user per request.
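To see where the time goes, here is a rough back-of-the-envelope calculator. It hard-codes the limits quoted above (180 lookups of 100 users, 15 ID calls of 5,000 IDs, per token and per 15-minute window) as assumptions, so check Twitter's current documentation before relying on the numbers. It only does arithmetic and makes no API calls.

```python
import math

# Limits as quoted above (per access token, per 15-minute window).
# Assumptions: verify against Twitter's current rate-limit documentation.
WINDOW_MINUTES = 15
LOOKUP_USERS_PER_REQUEST = 100
LOOKUP_REQUESTS_PER_WINDOW = 180
IDS_PER_REQUEST = 5000
IDS_REQUESTS_PER_WINDOW = 15


def node_windows(num_users: int) -> int:
    """15-minute windows needed to fetch num_users profiles via users/lookup."""
    requests = math.ceil(num_users / LOOKUP_USERS_PER_REQUEST)
    return math.ceil(requests / LOOKUP_REQUESTS_PER_WINDOW)


def edge_requests(follower_counts: list[int]) -> int:
    """friends/ids or followers/ids calls needed: one per page of 5,000 IDs.

    Every user costs at least one call, even with zero followers.
    """
    return sum(max(1, math.ceil(c / IDS_PER_REQUEST)) for c in follower_counts)


def edge_days(follower_counts: list[int], tokens: int = 1) -> float:
    """Wall-clock days to fetch every follower list using `tokens` tokens."""
    calls = edge_requests(follower_counts)
    windows = math.ceil(calls / (IDS_REQUESTS_PER_WINDOW * tokens))
    return windows * WINDOW_MINUTES / (60 * 24)
```

For example, `node_windows(1_000_000)` is 56 windows (14 hours) for a million profiles, whereas `edge_requests([6_000_000])` is 1,200 calls, i.e. 80 windows or about 20 hours on a single token, for just one account with 6 million followers.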
You can use the node metadata (profile descriptions, locations, languages, time zones) to perform some interesting analyses without even having to extract the graph (the follow relationships between everyone).
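As a minimal sketch of that kind of graph-free analysis, the function below tallies the most common languages, time zones and locations, plus frequent words in profile descriptions, from user objects you have already fetched. The field names (`lang`, `time_zone`, `location`, `description`) are assumed from the v1.1 user object, and the word tokenizer is deliberately crude.

```python
import re
from collections import Counter
from typing import Any, Iterable, Mapping

# Assumed field names from the v1.1 user object returned by users/lookup.
CATEGORICAL_FIELDS = ("lang", "time_zone", "location")


def profile_summary(
    users: Iterable[Mapping[str, Any]], top: int = 3
) -> dict[str, list[tuple[str, int]]]:
    """Most common values per field, plus frequent words in descriptions."""
    counters: dict[str, Counter] = {f: Counter() for f in CATEGORICAL_FIELDS}
    counters["description_words"] = Counter()
    for user in users:
        for field in CATEGORICAL_FIELDS:
            value = user.get(field)
            if value:  # skip missing, None and empty values
                counters[field][value] += 1
        description = (user.get("description") or "").lower()
        # Crude tokenizer: keeps hashtags/mentions, drops words of 3 chars or fewer.
        for word in re.findall(r"[#@]?\w+", description):
            if len(word) > 3:
                counters["description_words"][word] += 1
    return {name: counter.most_common(top) for name, counter in counters.items()}
```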
The workaround is to parallelize parts of the extraction by spreading it across several access tokens. This seems compatible with the terms of use to me, as long as you respect protected accounts.
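One way to split the edge work is to balance the estimated number of ID calls across tokens, so that no single token becomes the bottleneck. This sketch uses the greedy "largest job first" heuristic; the token labels are hypothetical placeholders, and the page size of 5,000 IDs is the figure quoted above. It only plans the split; each token would then run its own crawl loop within its own rate limit.

```python
import heapq
import math

IDS_PER_REQUEST = 5000  # friends/ids and followers/ids page size, as quoted above


def request_cost(followers: int) -> int:
    """Calls needed to page through one user's follower list (at least one)."""
    return max(1, math.ceil(followers / IDS_PER_REQUEST))


def assign_to_tokens(follower_counts: dict[str, int], tokens: list[str]) -> dict[str, list[str]]:
    """Spread users over tokens so each token gets a similar number of calls.

    Greedy heuristic: sort users by cost (largest first), then give each one
    to the token with the smallest total so far.
    """
    heap = [(0, i, token) for i, token in enumerate(tokens)]  # (load, tie-break, token)
    heapq.heapify(heap)
    plan: dict[str, list[str]] = {token: [] for token in tokens}
    by_cost = sorted(follower_counts.items(), key=lambda item: request_cost(item[1]), reverse=True)
    for user_id, followers in by_cost:
        load, i, token = heapq.heappop(heap)
        plan[token].append(user_id)
        heapq.heappush(heap, (load + request_cost(followers), i, token))
    return plan
```

For instance, `assign_to_tokens({"a": 50_000, "b": 25_000, "c": 25_000, "d": 100}, ["token-1", "token-2"])` returns `{"token-1": ["a", "d"], "token-2": ["b", "c"]}`, i.e. 11 and 10 calls respectively.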
In any case, you should exclude celebrities when extracting edges (you probably do not want to extract the followers of hootsuite: there are almost 6 million of them).
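A simple way to apply that filter is to keep large accounts as nodes but skip fetching their followers. In this sketch the cut-off of 100,000 followers is a hypothetical value to tune against your own request budget, and `followers_count` / `id_str` are assumed field names from the v1.1 user object.

```python
from typing import Any, Iterable, Mapping

# Hypothetical cut-off: choose it from your own request budget.
MAX_FOLLOWERS_FOR_EDGES = 100_000


def split_by_audience(
    users: Iterable[Mapping[str, Any]], threshold: int = MAX_FOLLOWERS_FOR_EDGES
) -> tuple[list[str], list[str]]:
    """Return (ids whose followers we fetch, ids kept as nodes only)."""
    crawl: list[str] = []
    nodes_only: list[str] = []
    for user in users:
        target = crawl if user["followers_count"] <= threshold else nodes_only
        target.append(user["id_str"])
    return crawl, nodes_only
```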
Disclaimer, self-promotion ahead: in case you do not want to develop this yourself, I could do the extraction for you and provide you with a graph file, since I already extract Twitter data for tribalytics. (I read this one and which before publication).
I am also trying to find out whether there is any process to request higher rate limits for research purposes.
Officially, apps are no longer whitelisted for higher rate limits, as they could be with the previous Twitter API version. You should probably get in touch with Twitter and see whether they can help you, since your work has an academic purpose.
Most likely, I will have to scale down the project, which is OK.
I would advise you to reduce your initial list of 600 users as much as you can. Keep only those who are truly central to your topic and whose audience is not too large. Extracting the graph of a local celebrity will give you a graph full of people who are not at all related to the population you want to study.
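If it helps to make that trimming systematic, here is a minimal sketch that keeps the most relevant seeds whose follower lists fit a fixed request budget. The relevance score is hypothetical (something you assign by hand per seed), ties favour smaller audiences since they are cheaper to crawl, and the greedy pass is a heuristic, not an optimal knapsack solution.

```python
import math

IDS_PER_REQUEST = 5000  # follower IDs per call, as quoted above


def select_seeds(
    candidates: list[tuple[str, float, int]], request_budget: int
) -> tuple[list[str], int]:
    """Greedily keep seeds by (relevance desc, audience asc) within a call budget.

    candidates: (user_id, relevance, followers_count) where relevance is a
    hypothetical hand-assigned score (higher = more central to the topic).
    Returns the chosen ids and the number of calls they will cost.
    """
    chosen: list[str] = []
    spent = 0
    for user_id, _relevance, followers in sorted(candidates, key=lambda c: (-c[1], c[2])):
        cost = max(1, math.ceil(followers / IDS_PER_REQUEST))
        if spent + cost <= request_budget:  # skip seeds that would overshoot
            chosen.append(user_id)
            spent += cost
    return chosen, spent
```

With a budget of 5 calls, a seed with 2 million followers (400 calls) is dropped even if it scores as highly as a small blog with 8,000 followers (2 calls).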