Mapping from page_title to old_text

I am trying to query a MediaWiki database with SQL to retrieve the contents of an article page ('old_text', from the 'text' table) by looking it up by its page_title (from the 'page' table). Unfortunately, I do not see a mapping between page_id (from the 'page' table) and old_id (from the 'text' table).
So, how can I get the text of an article with an SQL query, given its page title?

+4
2 answers

To add to svick's answer, here is an example SQL query:

 SELECT old_text, old_flags
 FROM page
 JOIN revision ON rev_id = page_latest
 JOIN text ON old_id = rev_text_id
 WHERE page_title = 'Main_Page'
   AND page_namespace = 0;

(Note that you need the old_flags field to interpret the contents of old_text. At a minimum, check that the flags field contains the values you expect, for example that the text is not marked as compressed or stored externally.)
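As a rough illustration of why old_flags matters, here is a minimal Python sketch of decoding a fetched row. The helper name decode_old_text is hypothetical. The sketch assumes that the 'gzip' flag means raw DEFLATE data (as produced by PHP's gzdeflate) and that the text is UTF-8 encoded. It does not try to handle the 'external' or 'object' flags and simply refuses those rows.

```python
import zlib


def decode_old_text(old_text: bytes, old_flags: str) -> str:
    """Turn a raw old_text blob into a string, guided by old_flags.

    Hypothetical helper: assumes 'gzip' means raw DEFLATE and that
    the stored text is UTF-8. Other storage modes are rejected.
    """
    flags = [flag for flag in old_flags.split(",") if flag]

    # 'external' rows hold a pointer to another store, and 'object'
    # rows hold a serialized PHP object; neither is handled here.
    for unsupported in ("external", "object"):
        if unsupported in flags:
            raise ValueError(f"unsupported old_flags value: {unsupported}")

    if "gzip" in flags:
        # Negative wbits tells zlib to expect a raw DEFLATE stream
        # with no zlib or gzip header (an assumption of this sketch).
        old_text = zlib.decompress(old_text, -zlib.MAX_WBITS)

    return old_text.decode("utf-8")
```

Rows whose flags are empty or just 'utf-8' pass through with only the byte-to-string decoding applied.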

+1

The documentation for the text table says:

old_id

revision.rev_text_id in the revision table is the key to this column.

So, to get the text of a particular revision, you need to know the ID of that revision. If you only want the latest revision of a page, its ID is stored in page_latest.
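The page, revision, text chain described above can be tried out on a toy database. The following Python sketch uses an in-memory SQLite database with a heavily simplified, hypothetical copy of the three tables (only the columns named in this thread, plus rev_page). It assumes the schema in which revision.rev_text_id exists. The function names make_toy_db and fetch_text are made up for illustration.

```python
import sqlite3


def make_toy_db() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the three tables (simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT,
            page_latest INTEGER
        );
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER,
            rev_text_id INTEGER
        );
        CREATE TABLE "text" (
            old_id INTEGER PRIMARY KEY,
            old_text BLOB,
            old_flags TEXT
        );
        INSERT INTO page VALUES (1, 0, 'Main_Page', 11);
        INSERT INTO revision VALUES (10, 1, 100);
        INSERT INTO revision VALUES (11, 1, 101);
        INSERT INTO "text" VALUES (100, 'first draft', 'utf-8');
        INSERT INTO "text" VALUES (101, 'current text', 'utf-8');
    """)
    return conn


def fetch_text(conn, title, namespace=0, rev_id=None):
    """Return (old_text, old_flags) for a page title, or None if not found.

    With rev_id=None the latest revision (page_latest) is used;
    otherwise the given revision of that page is looked up.
    """
    sql = """
        SELECT old_text, old_flags
        FROM page
        JOIN revision ON rev_page = page_id
        JOIN "text" ON old_id = rev_text_id
        WHERE page_title = ?
          AND page_namespace = ?
          AND rev_id = COALESCE(?, page_latest)
    """
    return conn.execute(sql, (title, namespace, rev_id)).fetchone()
```

Calling fetch_text(conn, 'Main_Page') follows page_latest, while passing rev_id=10 selects the older revision of the same page.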

+1

Source: https://habr.com/ru/post/1433805/

