I am trying to extract cross-language links from Wikipedia dumps. It seems that these links have been moved to the Wikidata project, and access is now only provided through the API.
This question addresses the same problem and suggests switching to the API: Getting cross-language links from an exported Wikipedia article?
However, the scope of my research is apparently too large for the web API (it would take millions of requests). Does anyone know whether these links can be retrieved from anywhere other than the API? Parsing a dump of any size would be preferable to making millions of API requests.
Wikipedia dumps I used: http://dumps.wikimedia.org/backup-index.html
WikiData dump that I used: http://dumps.wikimedia.org/wikidatawiki/latest/
A great library for working with Wikidata dumps is the Wikidata Toolkit, where you can find detailed documentation. The latest release, 0.3, includes a growing collection of example scripts that help with basic tasks like yours. Among the examples we find SitelinksExample.java: it shows how to obtain sitelink (cross-language link) information, translate page titles between sites such as "enwiki" and "hewikivoyage", and construct article URLs. The Wikidata Toolkit downloads the dump files it needs automatically, so you do not have to fetch them by hand.
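For your actual task of getting all cross-language links at once, the toolkit lets you register a processor and stream the whole dump in a single pass, which avoids the millions of API requests entirely. Below is a sketch, again assuming the 0.3-era API (`EntityDocumentProcessor`, `registerEntityDocumentProcessor`, and `processAllRecentRevisionDumps` are the names I recall from that release; verify against the version you use). It prints one tab-separated line per item/site pair:

```java
import java.util.Map;

import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.datamodel.interfaces.SiteLink;
import org.wikidata.wdtk.dumpfiles.DumpProcessingController;
import org.wikidata.wdtk.dumpfiles.MwRevision;

public class SitelinkDumpSketch {

    // Receives every entity in the dump; items carry the sitelinks.
    static class SitelinkPrinter implements EntityDocumentProcessor {

        @Override
        public void processItemDocument(ItemDocument itemDocument) {
            // getSiteLinks() maps site keys such as "enwiki" to SiteLink objects.
            for (Map.Entry<String, SiteLink> entry
                    : itemDocument.getSiteLinks().entrySet()) {
                System.out.println(itemDocument.getItemId().getId()
                        + "\t" + entry.getKey()
                        + "\t" + entry.getValue().getPageTitle());
            }
        }

        @Override
        public void processPropertyDocument(PropertyDocument propertyDocument) {
            // Properties have no sitelinks; nothing to do.
        }
    }

    public static void main(String[] args) throws Exception {
        DumpProcessingController controller =
                new DumpProcessingController("wikidatawiki");

        // Only item revisions, and only the current revision of each page.
        controller.registerEntityDocumentProcessor(new SitelinkPrinter(),
                MwRevision.MODEL_WIKIBASE_ITEM, true);

        // Downloads the most recent dump files if needed and streams them
        // through the registered processor in a single pass.
        controller.processAllRecentRevisionDumps();
    }
}
```

Redirecting the output to a file gives you the complete item-to-title mapping for every language edition in one pass over the dump, which is exactly the cross-language link table you were after.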