Check if MediaWiki page exists (Python)

I am working on a Python script that converts this:

foo bar 

into this:

 [[Component foo]] [[bar]] 

For each word on the input line, the script checks whether the page Component foo exists. If it exists, a wiki link to that page is created; if it does not, a direct link is created instead.

The problem is that I need a quick and cheap way to check whether many wiki pages exist. I do not want to (try to) load all the Component pages.

I already figured out a quick way to do this manually: open a new wiki page for editing, paste all the "Component" links into it, click "Preview", and save the resulting HTML preview page. In that HTML, links to existing pages look different from links to non-existent ones (MediaWiki marks missing pages with the "new" CSS class).

So, to rephrase my question: how do I fetch and save that MediaWiki preview page from Python?

(I do not have local access to the database.)

+4
5 answers

You can definitely use the API to check whether a page exists:

 # Assuming words is a list of page titles you wish to query for
 import urllib

 # replace en.wikipedia.org with the address of the wiki you want to access
 query = "http://en.wikipedia.org/w/api.php?action=query&titles=%s&format=xml" % "|".join(words)
 pages = urllib.urlopen(query)  # Python 2; on Python 3, use urllib.request.urlopen

The pages variable will now contain XML that looks like this:

 <?xml version="1.0"?>
 <api>
   <query>
     <pages>
       <page ns="0" title="DOESNOTEXIST" missing="" />
       <page pageid="600799" ns="0" title="FOO" />
       <page pageid="11178" ns="0" title="Foobar" />
     </pages>
   </query>
 </api>

Pages that do not exist are still listed in the result, but they carry a missing="" attribute, as seen above. You can also check for the invalid attribute, which flags titles that are not syntactically valid.

Now you can use your favorite xml parser to check these attributes and react accordingly.
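As a sketch of that last step, the missing/invalid attributes can be checked with the standard-library ElementTree parser; the function name and the Python 3 idioms here are my own, and the sample XML is abridged from the response shown above:

```python
import xml.etree.ElementTree as ET

def split_existing(xml_text):
    """Split API query results into (existing, missing) title lists.

    A <page> element with a "missing" attribute does not exist on the
    wiki; one with an "invalid" attribute was not a valid title at all.
    """
    existing, missing = [], []
    for page in ET.fromstring(xml_text).iter("page"):
        if "missing" in page.attrib or "invalid" in page.attrib:
            missing.append(page.get("title"))
        else:
            existing.append(page.get("title"))
    return existing, missing

sample = ('<?xml version="1.0"?><api><query><pages>'
          '<page ns="0" title="DOESNOTEXIST" missing="" />'
          '<page pageid="600799" ns="0" title="FOO" />'
          '</pages></query></api>')
print(split_existing(sample))  # (['FOO'], ['DOESNOTEXIST'])
```

Batching many titles into one titles=a|b|c request, as the query above does, keeps the number of HTTP round trips small.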

See also: http://www.mediawiki.org/wiki/API:Query

+9

Use Pywikibot to interact with MediaWiki. It is probably the most powerful bot framework available.

The Python Wikipediabot Framework ( pywikipedia or PyWikipediaBot ) is a collection of tools that automate work on MediaWiki sites. Originally developed for Wikipedia, it is now used throughout the Wikimedia Foundation projects and on many other MediaWiki wikis. It is written in Python, a free, cross-platform programming language. This page links to general information for people who want to use the bot software.
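With Pywikibot, the existence check is a one-liner: pywikibot.Page(site, title).exists(). The helper below is my own sketch; the lazy import and the injectable page_factory parameter are additions so the function can be exercised without a configured wiki:

```python
def page_exists(site, title, page_factory=None):
    """Return True if `title` exists on `site`.

    By default this wraps pywikibot.Page; a different factory can be
    injected, e.g. for testing without network access.
    """
    if page_factory is None:
        import pywikibot  # imported lazily; requires `pip install pywikibot`
        page_factory = pywikibot.Page
    return page_factory(site, title).exists()

# Real usage (needs pywikibot installed and a reachable wiki):
#   import pywikibot
#   site = pywikibot.Site("en", "wikipedia")
#   page_exists(site, "Foobar")
```

Note that exists() issues one API request per page, so for a long list of titles the batched action=query approach from the first answer is cheaper.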

+5

If you have local access to the wiki database, the easiest way is to query the database to see if each page exists.

If you have only HTTP access, you can try the mechanize library, which allows you to programmatically automate tasks that would otherwise require a browser.

+2

You should be able to use the MediaWiki API. http://www.mediawiki.org/wiki/API (see the "Queries" or "Creating and editing pages" sections)

I am not very familiar with it, but you can, for example, compare the output for an existing page with that for a nonexistent one:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Bill_Gates&rvprop=timestamp

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=NONEXISTENT_PAGE&rvprop=timestamp
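To illustrate the difference: the existing page's response contains a <rev> element with a timestamp, while the nonexistent page's <page> element carries missing="". The helper below is my own offline sketch, with sample XML abridged from what those two URLs return:

```python
import xml.etree.ElementTree as ET

def revision_timestamp(xml_text):
    """Return the latest revision timestamp, or None if the page is missing."""
    page = ET.fromstring(xml_text).find(".//page")
    if page is None or "missing" in page.attrib:
        return None
    rev = page.find(".//rev")
    return rev.get("timestamp") if rev is not None else None

existing = ('<api><query><pages><page ns="0" title="Bill Gates">'
            '<revisions><rev timestamp="2010-01-01T00:00:00Z" /></revisions>'
            '</page></pages></query></api>')
absent = ('<api><query><pages>'
          '<page ns="0" title="NONEXISTENT_PAGE" missing="" />'
          '</pages></query></api>')
print(revision_timestamp(existing))  # 2010-01-01T00:00:00Z
print(revision_timestamp(absent))    # None
```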

+1

Since the pages are stored in a database, you will have to query it one way or another. Since you do not have local access, the API, as suggested above, is probably the way to go; there may be alternatives, though.

http://www.mwusers.com/forums/forum.php

This seems to be the place for such questions. I have seen questions requiring deep knowledge of MediaWiki internals answered quickly and comprehensively on that forum.

0

Source: https://habr.com/ru/post/1304031/

