Getting all Wikipedia articles with coordinates inside London

Essentially, I want to get the links (and titles) of all Wikipedia articles with coordinates inside London. I tried Google, but unfortunately could not come up with the right search terms. Any clues?

Map of London

+5
2 answers

These are really just a bunch of ideas, too long for a comment.

Your best bet is most likely DBpedia. It is a semantic mirror of Wikipedia with much more sophisticated query capabilities than the Wikipedia API. As you can see in this article, it can handle fairly complex spatial queries, but you will need to learn SPARQL. Here is the figure from that article:

SPARQL Query Example
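A minimal sketch of what such a DBpedia query could look like from Python, assuming the public endpoint at dbpedia.org/sparql and the standard geo:lat/geo:long properties; the bounding-box coordinates for Greater London and the result limit are illustrative choices, not taken from the article:

```python
# Hedged sketch: ask DBpedia's public SPARQL endpoint for geotagged
# resources inside a rough bounding box around Greater London.
import json
import urllib.parse
import urllib.request

def build_london_query(limit=100):
    """Build a SPARQL query for resources geotagged inside a London bounding box."""
    return f"""
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?article ?label ?lat ?long WHERE {{
        ?article geo:lat ?lat ; geo:long ?long ; rdfs:label ?label .
        FILTER (?lat  > 51.28 && ?lat  < 51.69)
        FILTER (?long > -0.51 && ?long < 0.33)
        FILTER (lang(?label) = "en")
    }} LIMIT {limit}
    """

def query_dbpedia(query):
    """POST the query to DBpedia and return the parsed JSON result set."""
    data = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    ).encode()
    req = urllib.request.Request("https://dbpedia.org/sparql", data=data)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (network call, uncomment to run):
# results = query_dbpedia(build_london_query())
# for row in results["results"]["bindings"]:
#     print(row["label"]["value"], row["article"]["value"])
```

A bounding box is only an approximation of London's actual boundary, so expect some false positives near the edges.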

That said, the Wikipedia API does have a relatively new feature for spatial queries: searching for nearby wiki pages. I don't think you can search within a polygon, but it is a good start.
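For reference, this nearby-pages feature is exposed through the MediaWiki API's list=geosearch module. A sketch, assuming the English Wikipedia endpoint; since it searches a radius (capped at 10 km) rather than a polygon, covering all of London would take several overlapping circles, and the coordinates below are just central London:

```python
# Hedged sketch: query the MediaWiki list=geosearch module for pages
# near a point. Radius is in metres (10000 is the documented maximum).
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def geosearch_params(lat, lon, radius_m=10000, limit=500):
    """Build the query-string parameters for a list=geosearch request."""
    return {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",
        "gsradius": str(radius_m),  # metres, max 10000
        "gslimit": str(limit),      # max 500 for anonymous clients
        "format": "json",
    }

def geosearch(lat, lon, **kw):
    """Run the search and return (title, lat, lon) tuples."""
    url = API + "?" + urllib.parse.urlencode(geosearch_params(lat, lon, **kw))
    with urllib.request.urlopen(url) as resp:
        pages = json.load(resp)["query"]["geosearch"]
    return [(p["title"], p["lat"], p["lon"]) for p in pages]

# Example (network call, uncomment to run):
# for title, lat, lon in geosearch(51.5074, -0.1278)[:10]:
#     print(title, lat, lon)
```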

Here is a previous answer I wrote about using mwclient to get coordinates out of articles, but that user had the advantage of already having a list of articles to scrape.

Geonames.org could help narrow the search to geographic articles. It would not be that bad to check the roughly 806,000 geotagged articles on the English Wikipedia.

For performance reasons, and to avoid putting load on the Wikipedia servers, you might consider working from a Wikipedia or DBpedia dump.

+3

This looks like a task for OpenStreetMap and the Overpass API.

To build our query, go to overpass turbo (a convenient frontend for the Overpass API), open the wizard and enter "wikipedia=* in London", since we are interested in the wikipedia tag.

The automatically generated and executed query will look like this:

    [out:json][timeout:25];
    // fetch area "London" to search in
    {{geocodeArea:London}}->.searchArea;
    // gather results
    (
      // query part for: "wikipedia=*"
      node["wikipedia"](area.searchArea);
      way["wikipedia"](area.searchArea);
      relation["wikipedia"](area.searchArea);
    );
    // print results
    out body;
    >;
    out skel qt;

This returns a very large number of elements, which also puts a heavy load on your browser, and it may fail because the timeout is too low.

So we will modify it a little: increase the timeout, and remove the recursion step (>;), since we are only interested in the direct results, not in any related objects. The resulting query looks like this:

    [out:json][timeout:90];
    // fetch area "London" to search in
    {{geocodeArea:London}}->.searchArea;
    // gather results
    (
      // query part for: "wikipedia=*"
      node["wikipedia"](area.searchArea);
      way["wikipedia"](area.searchArea);
      relation["wikipedia"](area.searchArea);
    );
    // print results
    out body;
    out skel qt;

Here you can see the result.

There are various options for exporting the result. You can use overpass turbo's export feature to either save the results directly to a file or obtain the raw query that is sent to the Overpass API. You can then run that query directly from your Python script.
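A minimal sketch of running such a raw query from Python. Note that the {{geocodeArea:London}} shortcut is expanded by overpass turbo, not by the Overpass API itself, so the area selector below uses a plain name filter instead; area["name"="Greater London"] is an assumption and may need adjusting to match the area that overpass turbo's geocoder actually picks:

```python
# Hedged sketch: POST a raw Overpass QL query to a public Overpass
# endpoint and collect the distinct wikipedia tag values.
import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

QUERY = """
[out:json][timeout:90];
area["name"="Greater London"]->.searchArea;
(
  node["wikipedia"](area.searchArea);
  way["wikipedia"](area.searchArea);
  relation["wikipedia"](area.searchArea);
);
out body;
out skel qt;
"""

def extract_wikipedia_tags(elements):
    """Collect the distinct wikipedia tag values from Overpass result elements."""
    return sorted({e["tags"]["wikipedia"]
                   for e in elements if "wikipedia" in e.get("tags", {})})

def fetch_wikipedia_tags(query=QUERY):
    """POST the query to the Overpass API and return the wikipedia tag values."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data) as resp:
        return extract_wikipedia_tags(json.load(resp)["elements"])

# Example (network call, may take a while; uncomment to run):
# for article in fetch_wikipedia_tags()[:10]:
#     print(article)
```

The tag values come back as language-prefixed titles (e.g. "en:Big Ben"), which you can turn into URLs of the form https://en.wikipedia.org/wiki/<title>.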

Note that several output formats are available: JSON, XML, and CSV. And besides the wikipedia tag, you may also be interested in the wikidata tag.

Also note that this will not find all Wikipedia pages with coordinates within London, only those that are referenced in the OSM database.

+2

Source: https://habr.com/ru/post/1242863/

