How to extract data from a Wikipedia article?

I have a question about parsing data from Wikipedia for my Android application. I have a script that loads XML by reading the source of http://en.wikipedia.org/w/api.php?action=parse&prop=text&format=xml&page=ARTICLE_NAME (or JSON, by replacing format=xml with format=json).
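For reference, the request described above can be sketched like this. It is a minimal, language-neutral sketch in Python (to be ported to Java/Kotlin in the app), assuming the HTTPS endpoint and the default JSON layout of action=parse, where the HTML sits under parse → text → "*". The function names are hypothetical, and fetching the URL is left to whatever HTTP client the app already uses.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_parse_url(page: str, fmt: str = "json") -> str:
    """Build an action=parse URL that returns the rendered HTML of a whole page."""
    params = {"action": "parse", "prop": "text", "format": fmt, "page": page}
    # urlencode percent-escapes spaces, parentheses, etc. in the page title
    return API_ENDPOINT + "?" + urlencode(params)

def extract_html(response_body: str) -> str:
    """Pull the article HTML out of a format=json action=parse response.

    Assumes the default (formatversion=1) layout: parse -> text -> "*".
    """
    data = json.loads(response_body)
    return data["parse"]["text"]["*"]

# Usage with a hand-written sample response (no network access needed)
sample = '{"parse": {"title": "Foo", "pageid": 1, "text": {"*": "<p>Foo</p>"}}}'
print(build_parse_url("Android (operating system)"))
print(extract_html(sample))
```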

But I can't figure out how to access individual sections from the table of contents. I want the page to load, and then the user can tap a button that opens a pop-up window listing the headings from the table of contents, so that the user can read just that section, and only that section, for convenience. I'm a little shaky with JSON, but is this possible? Or is there a Wikipedia API that lets a developer retrieve only certain parts of a page?

Thanks!

2 answers

Unfortunately, the mediawiki.org documentation for parse doesn't seem to explain how to do this. But the documentation built into the API itself does: you can use the section parameter, and you can use prop=sections to get a list of sections.
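A minimal sketch of building both requests in Python, assuming the HTTPS endpoint and format=json. The helper names `sections_url` and `section_text_url` are hypothetical; the query parameters (`action=parse`, `prop=sections`, `prop=text`, `section`) are the ones named in this answer.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def sections_url(page: str) -> str:
    """URL for the list of sections (the table of contents) of a page."""
    return API_ENDPOINT + "?" + urlencode(
        {"format": "json", "action": "parse", "page": page, "prop": "sections"}
    )

def section_text_url(page: str, section_index: str) -> str:
    """URL for the rendered HTML of a single section.

    section_index is the value of a section's "index" field from the
    prop=sections response; "0" is the lead text before the first heading.
    """
    return API_ENDPOINT + "?" + urlencode(
        {"format": "json", "action": "parse", "page": page,
         "prop": "text", "section": section_index}
    )

print(sections_url("Android (operating system)"))
print(section_text_url("Android (operating system)", "26"))
```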

So you can first use:

http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=sections

to get a list of sections and then

http://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Android_%28operating_system%29&prop=text&section=26

to get HTML for a specific section.
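To fill the pop-up from the first request, the prop=sections response can be turned into a list of headings. This is a sketch assuming the default JSON layout, where parse → sections is a list of objects carrying "index" (the value to pass as section=), "line" (the heading text, which may contain inline HTML) and "toclevel" (nesting depth). The helper names are hypothetical.

```python
import json
from typing import List, Tuple

def toc_entries(response_body: str) -> List[Tuple[str, str, int]]:
    """Turn a prop=sections response into (index, heading, toclevel) tuples."""
    data = json.loads(response_body)
    entries = []
    for sec in data["parse"]["sections"]:
        entries.append((sec["index"], sec["line"], sec["toclevel"]))
    return entries

def popup_labels(entries: List[Tuple[str, str, int]]) -> List[str]:
    """Indent each heading by its TOC depth, e.g. for a list in a pop-up."""
    return ["  " * (level - 1) + heading for _, heading, level in entries]

# Hand-written sample shaped like a prop=sections response
sample = json.dumps({"parse": {"title": "Foo", "pageid": 1, "sections": [
    {"toclevel": 1, "level": "2", "line": "History", "number": "1",
     "index": "1", "anchor": "History"},
    {"toclevel": 2, "level": "3", "line": "Early days", "number": "1.1",
     "index": "2", "anchor": "Early_days"},
]}})
entries = toc_entries(sample)
print(popup_labels(entries))
```

When the user taps an entry, its index is what goes into the section= parameter of the second request.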


action=parse doesn't work well when parsing individual sections. Consider this example:

    Foo is a bar<ref>really!</ref>

    == References ==
    <references/>

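Parsing only section 0 will result in a red error message, while parsing section 1 will result in an empty reference list. Each section is parsed on its own: section 0 has the <ref> but no <references/> tag, and section 1 has the <references/> tag but none of the references.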

However, there is a better solution: action=mobileview not only avoids this problem, but is also designed specifically for mobile applications and gives you mobile-optimized HTML.
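A sketch of an action=mobileview request, in the same style as the sketches above. The `sections` parameter (a pipe-separated list of section numbers, or "all") and the response layout mobileview → sections → objects with "id" and "text" are assumptions based on the module's help text, not verified against a live wiki; action=mobileview has reportedly been deprecated since, so it may no longer be available on Wikipedia. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def mobileview_url(page: str, sections: str = "0") -> str:
    """URL for action=mobileview; sections is e.g. "0", "1|3" or "all"."""
    params = {"format": "json", "action": "mobileview", "page": page,
              "sections": sections}
    return API_ENDPOINT + "?" + urlencode(params)

def mobileview_sections_html(response_body: str) -> dict:
    """Map section id -> HTML. ASSUMED layout: mobileview -> sections -> [{id, text}]."""
    data = json.loads(response_body)
    return {sec["id"]: sec.get("text", "") for sec in data["mobileview"]["sections"]}

sample = '{"mobileview": {"sections": [{"id": 0, "text": "<p>Lead</p>"}]}}'
print(mobileview_url("Foo", "1|3"))
print(mobileview_sections_html(sample))
```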


Source: https://habr.com/ru/post/914998/

