Getting cross-language links from a Wiki dump

I am trying to extract cross-language links from Wikipedia dumps. It seems that these links have been moved to the Wikidata project and are now accessible only through the API.

The following thread explains how to solve this problem and suggests switching to the API: "Getting cross-language links from an exported Wikipedia article?"

However, the scope of my research is apparently too large to use the web API (millions of requests). Does anyone know if these links can be retrieved from anywhere except the API? Parsing a dump of any size is preferable to an API request.

Wikipedia dumps I used: http://dumps.wikimedia.org/backup-index.html

Wikidata dump I used: http://dumps.wikimedia.org/wikidatawiki/latest/

1 answer

A great library for working with Wikidata dumps is the Wikidata Toolkit, for which detailed documentation is available. The latest release is 0.3, and it comes with a growing collection of sample scripts that help with basic tasks like yours. In the examples readme we find SitelinksExample.java:

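It shows how to work with the site links stored in Wikidata dumps, which are identified by site keys such as "enwiki" and "hewikivoyage". Wikidata Toolkit also includes functions for turning these keys into article URLs.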
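If you would rather not depend on Java, the same idea can be sketched in Python directly against the Wikidata JSON dump. This is a minimal sketch, not the Wikidata Toolkit API. It assumes the JSON dump layout of one entity per line inside a JSON array, with each entity carrying a "sitelinks" object keyed by site key. The function names `iter_sitelinks` and `guess_url` and the `SAMPLE` data are made up for illustration. `guess_url` is a naive guess based on the key suffix and only handles plain Wikipedia and Wikivoyage keys; special keys such as "commonswiki" need the real sites table.

```python
import json
from urllib.parse import quote

# Hypothetical miniature of a Wikidata JSON dump: an array with one entity per line.
SAMPLE = [
    "[",
    '{"id":"Q42","sitelinks":{"enwiki":{"site":"enwiki","title":"Douglas Adams","badges":[]},'
    '"dewiki":{"site":"dewiki","title":"Douglas Adams","badges":[]}}},',
    '{"id":"P31","type":"property"}',
    "]",
]


def iter_sitelinks(lines):
    """Yield (entity_id, {site_key: title}) for every entity that has site links."""
    for raw in lines:
        line = raw.strip().rstrip(",")  # entity lines end with a trailing comma
        if line in ("", "[", "]"):      # skip the array brackets and blank lines
            continue
        entity = json.loads(line)
        links = {key: link["title"] for key, link in entity.get("sitelinks", {}).items()}
        if links:
            yield entity["id"], links


def guess_url(site_key, title):
    """Naively map a site key to an article URL (assumption: suffix names the project)."""
    # Check the longer suffix first, because "hewikivoyage" also contains "wiki".
    for suffix, domain in (("wikivoyage", "wikivoyage.org"), ("wiki", "wikipedia.org")):
        if site_key.endswith(suffix):
            lang = site_key[: -len(suffix)].replace("_", "-")
            return f"https://{lang}.{domain}/wiki/{quote(title.replace(' ', '_'))}"
    return None  # unknown project type


for entity_id, links in iter_sitelinks(SAMPLE):
    for site_key, title in links.items():
        print(entity_id, site_key, guess_url(site_key, title))
```

To run this over a real dump, pass an open file instead of `SAMPLE`, for example `gzip.open(path, "rt", encoding="utf-8")`, where `path` points at the compressed JSON dump you downloaded. The file is read line by line, so memory use stays flat regardless of dump size.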


Source: https://habr.com/ru/post/1548280/

