How to download all abstract data from NCBI PubMed

I want to download all PubMed abstracts. Does anyone know how I can easily download the abstracts of all PubMed articles?

I found this data source: ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/af/12/

Is it possible to download all of these tar files?

Thanks in advance.

+4
2 answers

There is an R package called rentrez (https://ropensci.org/packages/). Take a look at it. You can retrieve abstracts for specific keywords, PMIDs, and so on. I hope this helps.

UPDATE: You can download the abstracts by passing a list of IDs, using the following code.
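
For readers not working in R, the same lookup by PMID can be sketched against the E-utilities efetch endpoint that rentrez wraps. This is a minimal sketch, not part of the original answer: the helper name `abstract_url` is hypothetical, and it only builds the URL (with `rettype=abstract` and `retmode=text`) without sending the request.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def abstract_url(pmids):
    """Build an efetch URL asking for plain-text abstracts of the given PMIDs.

    `abstract_url` is an illustrative helper name, not a library function.
    """
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),   # efetch accepts a comma-separated ID list
        "rettype": "abstract",
        "retmode": "text",
    }
    # Keep commas unescaped so the URL stays readable.
    return EFETCH + "?" + urlencode(params, safe=",")


print(abstract_url(["26386083", "26273372"]))
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return the abstracts as plain text.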

    library(rentrez)
    library(XML)  # the package name is case-sensitive: XML, not xml

    your.ids <- c("26386083","26273372","26066373","25837167","25466451","25013473")
    # rentrez function to get the data from the pubmed db
    fetch.pubmed <- entrez_fetch(db = "pubmed", id = your.ids,
                          rettype = "xml", parsed = TRUE)
    # Extract the abstracts for the respective IDs.
    abstracts <- xpathApply(fetch.pubmed, '//PubmedArticle//Article', function(x)
                                   xmlValue(xmlChildren(x)$Abstract))
    # Name the abstracts by their IDs.
    names(abstracts) <- your.ids
    abstracts
    # Collect the abstracts into a data frame and write it to a CSV file.
    col.abstracts <- do.call(rbind.data.frame, abstracts)
    dim(col.abstracts)
    write.csv(col.abstracts, file = "test.csv")
+2

If you want to get all PubMed entries with Python, the following script does it:

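    import requests

    # Search all of PubMed by date range. usehistory=y keeps the result set on
    # the NCBI history server so it can be fetched later in batches.
    search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=1800/01/01&maxdate=2016/12/31&usehistory=y&retmode=json"
    search_r = requests.post(search_url)
    search_data = search_r.json()
    webenv = search_data["esearchresult"]["webenv"]
    total_records = int(search_data["esearchresult"]["count"])

    # efetch returns PubMed records as XML rather than JSON, so request XML
    # explicitly and save the batches as .xml files.
    batch_size = 9999
    fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&retmax="+str(batch_size)+"&query_key=1&webenv="+webenv

    # Step by batch_size (not 10000) so no record is skipped between batches.
    for i in range(0, total_records, batch_size):
        this_fetch = fetch_url+"&retstart="+str(i)
        print("Getting this URL: "+this_fetch)
        fetch_r = requests.post(this_fetch)
        with open("pubmed_batch_"+str(i)+"_to_"+str(i+batch_size-1)+".xml", "w", encoding="utf-8") as f:
            f.write(fetch_r.text)

    print("Number of records found: "+str(total_records))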
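
Since the question asks for abstracts, the saved batches still need a parsing step. The following is a minimal sketch, assuming each batch holds PubMed XML as returned by efetch with `retmode=xml`; the function name `extract_abstracts` is hypothetical, and the element paths follow the usual PubmedArticleSet layout.

```python
import xml.etree.ElementTree as ET


def extract_abstracts(xml_text):
    """Map PMID -> abstract text for every article that has an abstract.

    Illustrative helper, not part of the original script.
    """
    root = ET.fromstring(xml_text)
    abstracts = {}
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # An abstract may be split into several labelled AbstractText parts,
        # and each part may contain inline markup such as <i>.
        parts = [
            "".join(node.itertext()).strip()
            for node in article.findall(
                "MedlineCitation/Article/Abstract/AbstractText")
        ]
        if pmid and parts:
            abstracts[pmid] = " ".join(parts)
    return abstracts
```

Articles without an Abstract element are left out of the result rather than stored as empty strings.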

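The script first runs an entrez/eutils search on PubMed between two dates, wide enough to cover the whole database. From the response it takes the "webenv" (the saved search history) and total_records. The webenv is then passed to efetch.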
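
The search step can be sketched in isolation. This is a minimal sketch under stated assumptions: the helper names `esearch_url` and `read_history` are hypothetical, and the response is assumed to have the JSON shape the script reads (`esearchresult` with `webenv` and `count`).

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(mindate, maxdate):
    """Build a date-range esearch URL that stores its results on the history server."""
    params = {
        "db": "pubmed",
        "mindate": mindate,
        "maxdate": maxdate,
        "usehistory": "y",   # ask NCBI to keep the result set (the "webenv")
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params, safe="/")


def read_history(search_data):
    """Pull the webenv token and the record count out of an esearch JSON reply."""
    result = search_data["esearchresult"]
    return result["webenv"], int(result["count"])
```

With the script's variables, `webenv, total_records = read_history(search_r.json())` gives the two values that the fetch loop needs.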

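Fetching records (efetch) allows at most 10000 per request, so the script asks for 9999 at a time and walks through the result set batch by batch.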
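
The paging arithmetic can be checked on its own. One detail worth verifying: the loop step should equal the retmax value, otherwise the record at each batch boundary is skipped. A minimal sketch, with the hypothetical helper name `batch_ranges`:

```python
def batch_ranges(total_records, batch_size=9999):
    """Return (retstart, count) pairs that cover every record exactly once.

    Illustrative helper; batch_size plays the role of efetch's retmax.
    """
    ranges = []
    for start in range(0, total_records, batch_size):
        # The last batch is usually shorter than batch_size.
        count = min(batch_size, total_records - start)
        ranges.append((start, count))
    return ranges


print(batch_ranges(25000))
```

Summing the counts gives back total_records, which is a quick sanity check that nothing is dropped or fetched twice.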

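When the script makes many requests (on the order of 200 or more), some of them are likely to fail, so wrap requests.post() in try/except. Also, before writing a batch to its file, check that the response came back with HTTP 200.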
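
That advice could look roughly like the sketch below. The function name `fetch_with_retry` and its `retries` and `delay` parameters are hypothetical; `post` is any callable that behaves like requests.post, which keeps the sketch free of third-party imports. Catching OSError is an assumption that fits requests, whose RequestException derives from IOError.

```python
import time


def fetch_with_retry(url, post, retries=3, delay=1.0):
    """Call post(url) until it answers with HTTP 200, at most `retries` times.

    Returns the response body as text, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        try:
            response = post(url)
            if response.status_code == 200:
                return response.text
            print(f"Attempt {attempt}: HTTP {response.status_code}")
        except OSError as exc:
            # Network-level failures (connection errors, timeouts) land here.
            print(f"Attempt {attempt}: request failed ({exc})")
        if attempt < retries:
            time.sleep(delay)
    return None
```

In the download loop this would be called as `text = fetch_with_retry(this_fetch, requests.post)`, writing the file only when `text` is not None.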

0

Source: https://habr.com/ru/post/1614427/

