BioPython: skipping bad GIDs with Entrez.esummary / Entrez.read

Sorry for the odd title.

I use ESearch and ESummary to go from

accession number → GID → TaxID

Suppose "accessions" is a string of 20 accession numbers joined together (I do 20 at a time because that is the maximum NCBI allows).
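Splitting a long accession list into batches of 20 and turning each batch into one search term can be sketched as below. The helper names `batches` and `make_term` are hypothetical, and joining with `" OR "` is an assumption about how the query string is built; the question only says the 20 accessions are joined.

```python
from typing import Iterator, List


def batches(items: List[str], size: int = 20) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_term(batch: List[str]) -> str:
    """Join one batch of accession numbers into a single Entrez query string.

    Assumption: accessions are combined with the boolean OR operator.
    """
    return " OR ".join(batch)
```

With 45 accessions this yields batches of 20, 20 and 5, each of which would be passed as the `term` of one ESearch call.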

I do:

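```python
from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
handle = Entrez.esearch(db="nucleotide", term=accessions)  # XML is the default output
record = Entrez.read(handle)
handle.close()
gids = ",".join(record["IdList"])
```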

This gives me the 20 GIDs that match these 20 accession numbers.

Then I do:

```python
handle = Entrez.esummary(db="nucleotide", id=gids)
record = Entrez.read(handle)  # fails here when one GID has been removed
handle.close()
```

This raises the following error, because one of the GIDs in gids has since been removed from NCBI:

```
  File ".../biopython-1.52/build/lib.macosx-10.6-universal-2.6/Bio/Entrez/Parser.py", line 191, in endElement
    value = IntegerElement(value)
ValueError: invalid literal for int() with base 10: ''
```
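The last line of the traceback is Python's own `int()` failing on an empty string: the parser tried to turn an empty integer field into a number. Assuming the empty field reaches `int()` unchanged (which is what the traceback suggests), the same error can be reproduced without BioPython; `to_int` is a hypothetical stand-in, not the parser's code.

```python
def to_int(text: str) -> int:
    # Stand-in for the integer conversion the parser performs on a field.
    return int(text)


try:
    to_int("")
except ValueError as exc:
    message = str(exc)
    print(message)  # invalid literal for int() with base 10: ''
```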

I could wrap the call in try/except, but then I would also lose the other 19 GIDs, which are fine.
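One way to keep the 19 good GIDs is to catch the error at the batch level and only then fall back to requesting each GID on its own. In this sketch the fetching function is a parameter, so the logic can be tried without network access; `fetch_with_fallback` and `fetch_batch` are hypothetical names, and the sketch assumes a bad record shows up as a `ValueError`, as in the traceback above.

```python
from typing import Callable, List, Sequence, Tuple

# A fetcher takes a list of GIDs and returns one summary per GID,
# or raises ValueError if the response cannot be parsed.
Fetcher = Callable[[Sequence[str]], List[dict]]


def fetch_with_fallback(
    gids: Sequence[str], fetch_batch: Fetcher
) -> Tuple[List[dict], List[str]]:
    """Return (summaries for good GIDs, GIDs that failed on their own)."""
    try:
        return fetch_batch(gids), []
    except ValueError:
        pass  # at least one GID is bad; retry them individually
    good: List[dict] = []
    bad: List[str] = []
    for gid in gids:
        try:
            good.extend(fetch_batch([gid]))
        except ValueError:
            bad.append(gid)
    return good, bad
```

In real use, `fetch_batch` could be a small function that calls `Entrez.esummary(db="nucleotide", id=",".join(ids))`, returns `list(Entrez.read(handle))` and closes the handle. A clean batch costs one request; a batch with a bad GID costs 21.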

My question is:

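Is there a way to give all 20 GIDs to Entrez.read and have it skip the bad one while still returning the other 19? Otherwise I have to go one by one instead of 20 at a time, which is very slow (I have 300 000 to process and NCBI allows 3 requests per second, so that would take about 1 day).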
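With 300 000 records and a limit of 3 requests per second, retrying every GID of a failed batch individually is expensive. A cheaper alternative, swapped in here and not taken from the thread, is bisection: split a failing batch in half and recurse, so a batch of 20 with one bad GID is isolated in 9 to 11 requests instead of 21. `fetch_bisect` is a hypothetical name, the fetcher is passed in as in a plain per-GID fallback, and the sketch does not throttle requests itself.

```python
from typing import Callable, List, Sequence, Tuple

# A fetcher takes a list of GIDs and returns one summary per GID,
# or raises ValueError if the response cannot be parsed.
Fetcher = Callable[[Sequence[str]], List[dict]]


def fetch_bisect(
    gids: Sequence[str], fetch_batch: Fetcher
) -> Tuple[List[dict], List[str]]:
    """Fetch summaries, recursively halving any batch that fails to parse.

    Returns (summaries for good GIDs, GIDs that failed on their own).
    """
    if not gids:
        return [], []
    try:
        return fetch_batch(gids), []
    except ValueError:
        if len(gids) == 1:
            return [], list(gids)  # this single GID is the bad one
    mid = len(gids) // 2
    left_good, left_bad = fetch_bisect(gids[:mid], fetch_batch)
    right_good, right_bad = fetch_bisect(gids[mid:], fetch_batch)
    return left_good + right_good, left_bad + right_bad
```

The good summaries come back in the original GID order, and the bad GIDs are collected so they can be logged or checked later.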


BioPython. , , .


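The bug is in Parser.py, not in your code. NCBI returns an empty field for the removed record, and the parser does not expect that.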

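Until this is fixed in BioPython, you could patch/monkeypatch the parser yourself so that it tolerates the empty value.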
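An alternative to patching Parser.py is to skip Entrez.read for this step and pull the needed field out of the raw ESummary XML yourself, skipping records whose field is empty. This is a swapped-in technique, not a BioPython feature. It assumes the classic ESummary layout (`DocSum` elements holding an `Id` and `Item` elements with a `Name` attribute, one of them `TaxId`); `taxids_from_esummary` is a hypothetical name, and the sample XML in the usage note is made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple, Union


def taxids_from_esummary(
    xml_text: Union[str, bytes]
) -> Tuple[Dict[str, int], List[str]]:
    """Map each GID to its TaxId; collect GIDs whose TaxId is missing or empty."""
    root = ET.fromstring(xml_text)
    taxids: Dict[str, int] = {}
    skipped: List[str] = []
    for docsum in root.iter("DocSum"):
        gid = (docsum.findtext("Id") or "").strip()
        value = (docsum.findtext("Item[@Name='TaxId']") or "").strip()
        if value.isdigit():
            taxids[gid] = int(value)
        else:
            skipped.append(gid)  # empty or invalid field: skip instead of crashing
    return taxids, skipped
```

Fed `<eSummaryResult><DocSum><Id>111</Id><Item Name="TaxId" Type="Integer">9606</Item></DocSum><DocSum><Id>222</Id><Item Name="TaxId" Type="Integer"></Item></DocSum></eSummaryResult>`, it returns `({"111": 9606}, ["222"])`. In real use, the input would be the text read from the `Entrez.esummary` handle instead of passing the handle to `Entrez.read`.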


Source: https://habr.com/ru/post/1719355/

