One of the arguments I make to my students (Microbiology and Genetics) is that "data" is messy and that Python can help with that (other languages can too, of course). So here is a practical question about collecting data from the web.
I noticed that several of the highest-reputation users on Stack Overflow answer Python questions. Questions that naturally arise include:
I want to retrieve the current reputation and reputation growth rate of the top-rated Pythonistas on Stack Overflow to predict whether, or when, Alex Martelli will overtake S. Lott or Greg Hewgill. What about Konrad Rudolph? Or is this trivial because these users' gains are tied to the daily reputation cap?
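Once the numbers are collected, the "whether or when" part is simple arithmetic if we assume each user's reputation grows linearly. The sketch below is a minimal model under that (strong) assumption: it estimates a daily rate from two snapshots and solves `rep_a + rate_a * t = rep_b + rate_b * t` for `t`. The function names and all numbers are hypothetical, not real data. Note that if both users hit the daily cap, their rates are equal and this model says the gap never closes, which is exactly the concern raised above.

```python
def growth_rate(rep_then, rep_now, days):
    """Average reputation gained per day between two snapshots `days` apart."""
    return (rep_now - rep_then) / days


def days_to_overtake(rep_a, rate_a, rep_b, rate_b):
    """Days until user A passes user B, assuming constant linear growth.

    Returns 0.0 if A is already level or ahead, and None if A never
    catches up (A's rate is not strictly faster than B's).
    """
    gap = rep_b - rep_a
    if gap <= 0:
        return 0.0  # A is already level with or ahead of B
    closing_speed = rate_a - rate_b
    if closing_speed <= 0:
        return None  # equal rates (e.g. both capped) or B is faster
    return gap / closing_speed


# Hypothetical example: A has 100,000 and gains 200/day;
# B has 110,000 and gains 150/day. The 10,000 gap closes at 50/day.
print(days_to_overtake(100_000, 200, 110_000, 150))  # 200.0
```

In practice the rates would come from scraped snapshots taken some days apart, e.g. `growth_rate(98_000, 100_000, 10)` gives 200 per day.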
More generally, in the absence of a query API (which I don't think exists), is there an alternative to inspecting page URLs for patterns, loading those pages with Python, and then scraping the HTML? I understand there is probably no general approach, but I am interested in how people tackle this problem.
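For reference, the "find the URL pattern, fetch, then scrape" workflow described above can be sketched with only the standard library (`urllib.request` and `html.parser`). This is a minimal illustration, not a recommendation for any particular site: the URL template `PAGE_URL` and the CSS class `reputation-score` are hypothetical placeholders you would replace after inspecting the real pages, and the parser assumes the values are plain comma-grouped integers inside a simple element with no nested tags.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen

# Hypothetical URL template: the "look at the URLs for a pattern" step.
PAGE_URL = "https://stackoverflow.com/users?page={page}"


def page_urls(n_pages):
    """Build the list of page URLs from the (assumed) pattern."""
    return [PAGE_URL.format(page=p) for p in range(1, n_pages + 1)]


def fetch(url):
    """Download one page and decode it as UTF-8 (requires network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


class ClassTextParser(HTMLParser):
    """Collect the text of elements whose class list contains `target_class`.

    The default class name is a hypothetical placeholder.
    """

    def __init__(self, target_class="reputation-score"):
        super().__init__()
        self.target_class = target_class
        self._capture = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._capture = True

    def handle_endtag(self, tag):
        self._capture = False  # assumes no nested tags inside the target

    def handle_data(self, data):
        if self._capture and data.strip():
            self.values.append(data.strip())


def parse_reputations(html):
    """Extract integer reputations such as '12,345' from one page of HTML."""
    parser = ClassTextParser()
    parser.feed(html)
    parser.close()
    return [int(v.replace(",", "")) for v in parser.values]


def scrape_all(n_pages, delay=1.0):
    """Fetch and parse every page, pausing between requests to be polite."""
    results = []
    for url in page_urls(n_pages):
        results.extend(parse_reputations(fetch(url)))
        time.sleep(delay)
    return results


# Offline usage on a hand-written snippet:
sample = ('<div><span class="reputation-score">12,345</span> '
          '<span class="other">x</span>'
          '<span class="reputation-score">678</span></div>')
print(parse_reputations(sample))  # [12345, 678]
```

Keeping `fetch` separate from `parse_reputations` means the parsing logic can be developed and tested on saved HTML without hitting the site repeatedly.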
Edit: @fitzgeraldsteele: Generally. SO is really just a simple (contrived) example.