I want to index Wikipedia in Elasticsearch.
I tried stream2es + Elasticsearch 2.0.0 and the Wikipedia River Plugin 2.6.0 + Elasticsearch 1.6.0 to index the latest Wikipedia dump (https://dumps.wikimedia.org/enwiki/20151102/enwiki-20151102-pages-articles-multistream.xml.bz2).
However, both attempts failed with the same error message:
XML document structures must start and end within the same entity.
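If you do not specifically need the XML dump, there is another route: Wikimedia publishes dumps made for Elasticsearch.
These dumps are already in the Elasticsearch bulk API format. Each file is JSON that can be sent to Elasticsearch as is.
For example, to create the index and load a dump: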
curl 'https://en.wikipedia.org/w/api.php?action=cirrus-mapping-dump&format=json' > mapping.json
jq .content < mapping.json | curl -XPUT localhost:9200/enwiki_content --data @-
zcat enwiki-20151116-cirrussearch-general.json.gz | parallel --pipe -L 2 -N 2000 -j3 'curl -s http://localhost:9200/enwiki_content/_bulk --data-binary @- > /dev/null'
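The last command relies on the bulk format storing every document as a pair of lines: an action line followed by the document source. As far as I understand the flags, `-L 2` tells parallel that a record is two lines and `-N 2000` sends 2000 such records per request, so a pair is never split between two requests. Below is a minimal sketch of the same idea using plain `split` on a tiny sample. The sample documents, their field names, and the chunk size are made up for illustration and are not taken from a real dump.

```shell
#!/bin/sh
set -eu

# Hypothetical sample in bulk format: one action line plus one source line per document.
workdir=$(mktemp -d)
cat > "$workdir/sample.json" <<'EOF'
{"index":{"_type":"page","_id":"1"}}
{"title":"First page"}
{"index":{"_type":"page","_id":"2"}}
{"title":"Second page"}
{"index":{"_type":"page","_id":"3"}}
{"title":"Third page"}
EOF

# Split into chunks of 4 lines (2 documents). The chunk size must be even,
# otherwise an action line would be separated from its source line.
split -l 4 "$workdir/sample.json" "$workdir/chunk_"

# Report the size of each chunk. Each one could then be posted on its own, e.g.:
#   curl -s http://localhost:9200/enwiki_content/_bulk --data-binary @"$chunk"
for chunk in "$workdir"/chunk_*; do
  lines=$(( $(wc -l < "$chunk") ))
  echo "$(basename "$chunk"): $lines lines"
done
```

Writing chunks to disk like this is only practical for small inputs. For a full dump, the streaming `parallel --pipe` form above avoids the intermediate files.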
Source: https://habr.com/ru/post/1615246/