BioPython: skipping bad GIDs with Entrez.esummary / Entrez.read

Question


Sorry for the odd title.

I use eSearch and eSummary to go from

accession number → GID → TaxID
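For the last hop, here is a minimal sketch of turning parsed eSummary records into a GID → TaxID mapping. It assumes each record behaves like a dict with "Id" and "TaxId" keys, which is what I would expect Entrez.read to return for db="nucleotide" summaries. Treat those key names as an assumption and inspect a real record first. The function name gi_to_taxid and the sample records are hypothetical.

```python
def gi_to_taxid(summaries):
    """Map each GID to its TaxID.

    Assumes every summary is dict-like with "Id" and "TaxId" keys, as
    Entrez.read is expected to return for db="nucleotide" esummary
    results (an assumption; check a real record).
    """
    mapping = {}
    for summary in summaries:
        # Normalise the ID to str and the TaxID to a plain int.
        mapping[str(summary["Id"])] = int(summary["TaxId"])
    return mapping


# Hypothetical records standing in for Entrez.read(esummary_handle).
example = [
    {"Id": "123", "TaxId": 9606},
    {"Id": "456", "TaxId": 10090},
]
print(gi_to_taxid(example))  # {'123': 9606, '456': 10090}
```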

Suppose “accessions” is a list of 20 accession numbers (I do 20 at a time because that is the maximum NCBI allows).
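Splitting a long list of accession numbers into batches of 20 (the batch size used here) can be sketched with a small generator. The helper name chunks and the accession numbers are made up for illustration.

```python
def chunks(items, size=20):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical accession numbers, just to show the batch sizes.
accessions = ["ACC%05d" % i for i in range(45)]
for batch in chunks(accessions):
    print(len(batch))  # 20, then 20, then 5
```

Each batch would then be passed to one esearch call, so the number of requests grows with the number of batches rather than the number of accessions.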

I do:

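from Bio import Entrez

Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
handle = Entrez.esearch(db="nucleotide", rettype="xml", term=accessions)
record = Entrez.read(handle)
gids = ",".join(record["IdList"])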

This gives me the 20 GIDs matching these 20 accession numbers.

Then I do:

handle = Entrez.esummary(db="nucleotide", id=gids)
record = Entrez.read(handle)

This raises the following error, because one of the GIDs in gids has been removed from NCBI:

  File ".../biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
    value = IntegerElement(value)
ValueError: invalid literal for int() with base 10: ''

I could wrap the call in try/except, but then I would also lose the other 19 GIDs, which are fine.
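One workaround that keeps the good records (a binary-splitting technique, not one proposed in the answers below) is to split a failing batch in half and retry each half, recursing until the bad GID is isolated. With one bad GID in a batch of 20 this costs a handful of extra requests instead of 20. In this sketch, fetch_batch is a hypothetical callable that takes a list of GIDs and returns parsed summaries, raising ValueError the way Entrez.read does in the traceback above. The fake fetcher and the bad GID "13" exist only to make it runnable offline.

```python
def summarize_skipping_bad(ids, fetch_batch):
    """Return (summaries, bad_ids) for `ids`, isolating IDs that fail to parse.

    `fetch_batch(list_of_ids)` must return a list of summaries or raise
    ValueError. With Biopython it might wrap Entrez.esummary(db="nucleotide",
    id=",".join(batch)) followed by Entrez.read(handle) (an assumption).
    """
    if not ids:
        return [], []
    try:
        return list(fetch_batch(ids)), []
    except ValueError:
        if len(ids) == 1:
            return [], list(ids)  # this single ID is the one that breaks parsing
        # Split the failing batch in half and retry each half on its own.
        mid = len(ids) // 2
        left, bad_left = summarize_skipping_bad(ids[:mid], fetch_batch)
        right, bad_right = summarize_skipping_bad(ids[mid:], fetch_batch)
        return left + right, bad_left + bad_right


# Offline stand-in for the real request: GID "13" plays the removed record.
BAD_GIDS = {"13"}
calls = []


def fake_fetch(batch):
    calls.append(list(batch))
    if BAD_GIDS & set(batch):
        raise ValueError("invalid literal for int() with base 10: ''")
    return [{"Id": gid} for gid in batch]


gids = [str(n) for n in range(1, 21)]
summaries, bad = summarize_skipping_bad(gids, fake_fetch)
print(len(summaries), bad, len(calls))  # 19 ['13'] 9
```

Catching ValueError this broadly could also hide unrelated parse problems, so logging the isolated IDs is worthwhile.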

My question is:

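Is there a way to get Entrez.read to parse the 20 records and skip only the bad one, or do I have to make 20 separate requests? (I have 300,000 accession numbers, and NCBI allows only 3 requests per second.)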
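To stay within NCBI's guideline of at most 3 requests per second (without an API key) when sending batches back to back, a small sliding-window throttle is one option. Throttle is a hypothetical helper, not part of Biopython; you would call throttle.wait() before each Entrez.esearch or Entrez.esummary request. The clock and sleep parameters are injectable only so the example can be checked without really sleeping.

```python
import time


class Throttle:
    """Block so that at most `max_calls` calls happen in any `period`-second window."""

    def __init__(self, max_calls=3, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.recent = []  # timestamps of calls inside the current window, oldest first

    def wait(self):
        while True:
            now = self.clock()
            # Drop timestamps that have left the sliding window.
            self.recent = [t for t in self.recent if now - t < self.period]
            if len(self.recent) < self.max_calls:
                self.recent.append(now)
                return
            # Sleep until the oldest call in the window expires, then re-check.
            self.sleep(self.period - (now - self.recent[0]))


class FakeClock:
    """Simulated time so the demo runs instantly."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


clock = FakeClock()
throttle = Throttle(max_calls=3, period=1.0, clock=clock, sleep=clock.sleep)
times = []
for _ in range(7):
    throttle.wait()  # in real code: followed by an Entrez request
    times.append(clock())
print(times)  # [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0]
```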


python bioinformatics biopython

Austin Richardson, asked 06 Oct '09 at 4:15

2 answers

Austin Richardson · Answer 1 · 2009-10-07T13:52:55+0000

BioPython. , , .

John La Rooy · Answer 2 · 2009-10-06T04:24:01+0000

Parser.py , . , NCBI , .

, /monkeypatch , .
