This is a “big” question that I don’t know how to start on, so I hope some of you can point me in the right direction. If this is not a "good" question, I will close the thread with an apology.
I want to go through the Wikipedia database (say, the English one) and compute statistics. For example, I am interested in how many active editors (to be defined) Wikipedia has had at any given time (say, over the past 2 years).
I don’t know how to create such a database, how to access it, find out what types of data it has and so on. So my questions are:
You want to start here: http://en.wikipedia.org/wiki/Wikipedia:Database_download
Which will bring you here: http://download.wikimedia.org/enwiki/20100312/
And the file you probably want:
2010-03-17 04:33:50 done: Log events to all pages. This contains the log of actions performed on pages. File: pages-logging.xml.gz (1.0 GB)
http://download.wikimedia.org/enwiki/20100312/enwiki-20100312-pages-logging.xml.gz
Then import the XML into MySQL. Generating histograms of users per day, week, year, etc. does not require R; you can do it with a single MySQL query, something like:
select DATE(wiki_edit_timestamp), count(*) from page_logs group by DATE(wiki_edit_timestamp) order by DATE(wiki_edit_timestamp);
and so on.
(I'm not sure what their actual layout is, but it will be something like this.)
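If you would rather not load the dump into MySQL first, the same kind of count can be sketched in Python by streaming the XML. This is only a rough sketch under assumptions: it assumes each log entry is a `logitem` element containing a `timestamp` (ISO format, e.g. `2010-01-05T10:00:00Z`) and a `username` somewhere inside it, which I have not checked against the real dump. The function names and the `SAMPLE` data are made up for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tiny made-up sample in the ASSUMED layout of the logging dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <logitem><timestamp>2010-01-05T10:00:00Z</timestamp>
    <contributor><username>Alice</username></contributor></logitem>
  <logitem><timestamp>2010-01-09T11:30:00Z</timestamp>
    <contributor><username>Bob</username></contributor></logitem>
  <logitem><timestamp>2010-01-20T08:15:00Z</timestamp>
    <contributor><username>Alice</username></contributor></logitem>
  <logitem><timestamp>2010-02-01T09:00:00Z</timestamp>
    <contributor><username>Alice</username></contributor></logitem>
</mediawiki>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}logitem'."""
    return tag.rsplit("}", 1)[-1]


def active_users_per_month(stream):
    """Return {'YYYY-MM': number of distinct usernames seen in that month}."""
    users = defaultdict(set)
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "logitem":
            continue
        timestamp = None
        username = None
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "timestamp":
                timestamp = child.text
            elif name == "username":
                username = child.text
        if timestamp and username:
            users[timestamp[:7]].add(username)  # 'YYYY-MM' prefix
        elem.clear()  # drop this item's children to limit memory use
    return {month: len(names) for month, names in sorted(users.items())}


def count_from_dump(path):
    """Run the count over a gzip-compressed dump file at a given path."""
    with gzip.open(path, "rb") as stream:
        return active_users_per_month(stream)


if __name__ == "__main__":
    print(active_users_per_month(io.BytesIO(SAMPLE)))
```

On the sample this counts two distinct users in 2010-01 and one in 2010-02. For the real file you would call something like `count_from_dump("enwiki-20100312-pages-logging.xml.gz")`. Keep in mind that this counts users who appear in the log, so how well it matches your definition of "active editor" is up to you.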
You will no doubt run into problems, but you will learn a lot too. Good luck.
Related tools: WikiXRay (Python/R), Zotero.