I am trying to modify an existing script that uses Biopython to retrieve taxonomic information about a species. The script was written to look up one species at a time. I would like to modify it so that I can do this for 100 organisms at a time. Here is the original code:
    import sys
    from Bio import Entrez

    def get_tax_id(species):
        """to get data from ncbi taxonomy, we need to have the taxid. we can
        get that by passing the species name to esearch, which will return
        the tax id"""
        species = species.replace(" ", "+").strip()
        search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
        record = Entrez.read(search)
        return record['IdList'][0]

    def get_tax_data(taxid):
        """once we have the taxid, we can fetch the record"""
        search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
        return Entrez.read(search)

    Entrez.email = ""
    if not Entrez.email:
        print "you must add your email address"
        sys.exit(2)

    taxid = get_tax_id("Erodium carvifolium")
    data = get_tax_data(taxid)
    lineage = {d['Rank']:d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['family', 'order']}
I managed to modify the script so that it accepts a local file containing one of the organisms I work with, but I need to extend this to 100 organisms. My idea was to build a list from a file of my organisms and then pass each element of that list, one at a time, to the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with the name of each of my organisms. But I do not know how to do this.
Here is a sample version of the code with some of my changes:
    import sys
    from Bio import Entrez

    def get_tax_id(species):
        """to get data from ncbi taxonomy, we need to have the taxid. we can
        get that by passing the species name to esearch, which will return
        the tax id"""
        species = species.replace(' ', "+").strip()
        search = Entrez.esearch(term = species, db = "taxonomy", retmode = "xml")
        record = Entrez.read(search)
        return record['IdList'][0]

    def get_tax_data(taxid):
        """once we have the taxid, we can fetch the record"""
        search = Entrez.efetch(id = taxid, db = "taxonomy", retmode = "xml")
        return Entrez.read(search)

    Entrez.email = ""
    if not Entrez.email:
        print "you must add your email address"
        sys.exit(2)

    list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8', 'Deinococcus radiodurans R1', 'Treponema pallidum subsp. pallidum str. Nichols', 'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
    i = iter(list)
    item = i.next()
    for item in list:
        ???

    taxid = get_tax_id(?)
    data = get_tax_data(taxid)
    lineage = {d['Rank']:d['ScientificName'] for d in data[0]['LineageEx'] if d['Rank'] in ['phylum']}
    print lineage, taxid
The question marks show the places where I am stuck and do not know what to do next. I don't see how to connect my loop so that it replaces the ? in get_tax_id(?). Or do I need to somehow substitute each element of the list in turn, so that the call becomes, for example, get_tax_id("Helicobacter pylori 26695"), and then find a way to place the result in the line containing taxid = ?
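To make the idea concrete, here is a minimal sketch of what such a loop might look like. It is written in Python 3 syntax, whereas the code above uses Python 2 print statements. The function name lineage_for_species and its parameters are hypothetical names chosen for illustration; they are not part of Biopython. The two lookup functions are passed in as arguments, so the loop itself needs no network access. The central point is that the loop variable is passed directly to get_tax_id, and the lookup lines are moved inside the loop body.

```python
def lineage_for_species(names, get_tax_id, get_tax_data, ranks=("phylum",)):
    """Return {species name: (taxid, {rank: scientific name})}.

    `names` is any iterable of species names. `get_tax_id` and
    `get_tax_data` are the two lookup functions from the question,
    passed in as arguments (hypothetical design, for illustration).
    """
    results = {}
    for name in names:
        # The loop variable replaces the hard-coded species name.
        taxid = get_tax_id(name)
        data = get_tax_data(taxid)
        # Keep only the lineage entries whose rank was requested.
        lineage = {
            d["Rank"]: d["ScientificName"]
            for d in data[0]["LineageEx"]
            if d["Rank"] in ranks
        }
        results[name] = (taxid, lineage)
    return results


# Assumed usage with the functions and list defined in the question:
#
#     species_names = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8']
#     results = lineage_for_species(species_names, get_tax_id, get_tax_data)
#     for name, (taxid, lineage) in results.items():
#         print(name, taxid, lineage)
```

With this shape, the iter(list) and i.next() lines should no longer be needed, because the for loop already walks through every element. Renaming the variable list to something like species_names would also avoid shadowing Python's built-in list type.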