Trying to get taxonomic information with Biopython

I am trying to modify an existing script that uses Biopython to get species information. The script was written to fetch information for one species at a time. I would like to modify it so that I can do this for 100 organisms at a time. Here is the initial code:

    import sys
    from Bio import Entrez

    def get_tax_id(species):
        """To get data from NCBI Taxonomy we need the taxid. We can get it by
        passing the species name to esearch, which returns the tax id."""
        species = species.replace(" ", "+").strip()
        search = Entrez.esearch(term=species, db="taxonomy", retmode="xml")
        record = Entrez.read(search)
        return record['IdList'][0]

    def get_tax_data(taxid):
        """Once we have the taxid, we can fetch the record."""
        search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
        return Entrez.read(search)

    Entrez.email = ""  # NCBI asks for a contact address
    if not Entrez.email:
        print("you must add your email address")
        sys.exit(2)

    taxid = get_tax_id("Erodium carvifolium")
    data = get_tax_data(taxid)
    lineage = {d['Rank']: d['ScientificName'] for d in data[0]['LineageEx']
               if d['Rank'] in ['family', 'order']}
    print(lineage)
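The dictionary comprehension on the last line keeps only the requested ranks from the record's LineageEx list. A minimal offline sketch of that step, assuming the record layout Entrez.read() returns for a taxonomy efetch (LineageEx is a list of dicts with 'Rank' and 'ScientificName' keys; real entries also carry 'TaxId'). filter_lineage is a hypothetical helper name, and example_record is a hand-written stand-in, not data fetched from NCBI.

```python
def filter_lineage(record, ranks):
    """Return {rank: scientific name} for the requested ranks only.

    `record` is assumed to be one element of the list that Entrez.read()
    returns for an efetch from db="taxonomy".
    """
    return {d['Rank']: d['ScientificName']
            for d in record['LineageEx']
            if d['Rank'] in ranks}


# Hand-written stand-in for data[0]; NOT fetched from NCBI ('TaxId' omitted).
example_record = {
    'ScientificName': 'Erodium carvifolium',
    'LineageEx': [
        {'Rank': 'order', 'ScientificName': 'Geraniales'},
        {'Rank': 'family', 'ScientificName': 'Geraniaceae'},
        {'Rank': 'genus', 'ScientificName': 'Erodium'},
    ],
}

print(filter_lineage(example_record, {'family', 'order'}))
# {'order': 'Geraniales', 'family': 'Geraniaceae'}
```

Because the filter only looks at the 'Rank' key, changing ['family', 'order'] to ['phylum'] is all it takes to report a different level.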

I managed to modify the script so that it reads the organism name from a local file, but it still handles only one organism, and I need to extend it to 100. My idea was to build a list from a file of my organisms and pass each element of that list, one at a time, to the line taxid = get_tax_id("Erodium carvifolium"), replacing "Erodium carvifolium" with my organism's name. But I do not know how to do this.
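The file-reading step described above could look like this minimal sketch. It assumes a plain-text file with one organism name per line; read_species_file and the file name organisms.txt are hypothetical, not part of the original script.

```python
def read_species_file(path):
    """Return a list of organism names, one per non-blank line.

    Assumes a plain-text file (e.g. a hypothetical organisms.txt) with
    entries such as 'Helicobacter pylori 26695', one per line.
    """
    with open(path, encoding="utf-8") as handle:
        # strip() removes the trailing newline and stray spaces;
        # lines that are empty after stripping are skipped
        return [line.strip() for line in handle if line.strip()]
```

The returned list, e.g. read_species_file("organisms.txt"), can then stand in for a hard-coded list of names, with each element passed to get_tax_id() in turn.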

Here is a version of the code with some of my settings:

    import sys
    from Bio import Entrez

    def get_tax_id(species):
        """To get data from NCBI Taxonomy we need the taxid. We can get it by
        passing the species name to esearch, which returns the tax id."""
        species = species.replace(" ", "+").strip()
        search = Entrez.esearch(term=species, db="taxonomy", retmode="xml")
        record = Entrez.read(search)
        return record['IdList'][0]

    def get_tax_data(taxid):
        """Once we have the taxid, we can fetch the record."""
        search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
        return Entrez.read(search)

    Entrez.email = ""
    if not Entrez.email:
        print("you must add your email address")
        sys.exit(2)

    list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8',
            'Deinococcus radiodurans R1',
            'Treponema pallidum subsp. pallidum str. Nichols',
            'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']
    i = iter(list)
    item = next(i)
    for item in list:
        ???
        taxid = get_tax_id(?)
        data = get_tax_data(taxid)
        lineage = {d['Rank']: d['ScientificName'] for d in data[0]['LineageEx']
                   if d['Rank'] in ['phylum']}
        print(lineage, taxid)

The question marks mark the places where I am stuck and don't know what to do next. I don't see how to connect my loop to the ? in get_tax_id(?). Or do I need to somehow take each element of the list in turn, so that the call becomes, for example, get_tax_id('Helicobacter pylori 26695'), and then find a way to put it into the line that starts with taxid =?

1 answer

Here is what you need. Put this below your function definitions, i.e. after the line that says sys.exit(2):

    species_list = ['Helicobacter pylori 26695', 'Thermotoga maritima MSB8',
                    'Deinococcus radiodurans R1',
                    'Treponema pallidum subsp. pallidum str. Nichols',
                    'Aquifex aeolicus VF5', 'Archaeoglobus fulgidus DSM 4304']

    taxid_list = []  # Initiate the lists to store the data to be parsed in
    data_list = []
    lineage_list = []

    print('parsing taxonomic data...')  # message declaring the parser has begun

    for species in species_list:
        print('\t' + species)  # progress messages

        taxid = get_tax_id(species)  # Apply your functions
        data = get_tax_data(taxid)
        lineage = {d['Rank']: d['ScientificName'] for d in data[0]['LineageEx']
                   if d['Rank'] in ['phylum']}

        taxid_list.append(taxid)  # Append the data to lists already initiated
        data_list.append(data)
        lineage_list.append(lineage)

    print('complete!')
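One weakness of the loop above is that get_tax_id() raises IndexError when esearch finds no match (record['IdList'] is empty), which stops the whole run partway through a 100-name list. A minimal sketch, assuming that skipping and reporting unmatched names is acceptable: collect_lineages is a hypothetical helper, not part of Biopython, and it takes the two functions from the question as parameters so they can be replaced by stubs when testing.

```python
def collect_lineages(species_list, get_tax_id, get_tax_data, ranks=('phylum',)):
    """Return (results, not_found) for a list of species names.

    results maps each species name to {'taxid': ..., 'lineage': {...}};
    not_found lists names for which esearch returned no tax id.
    get_tax_id / get_tax_data are assumed to behave like the question's
    functions (hypothetical wiring, passed in rather than imported).
    """
    results = {}
    not_found = []
    for species in species_list:
        try:
            taxid = get_tax_id(species)
        except IndexError:
            # Empty IdList: no taxonomy match for this name, skip it
            not_found.append(species)
            continue
        data = get_tax_data(taxid)
        results[species] = {
            'taxid': taxid,
            'lineage': {d['Rank']: d['ScientificName']
                        for d in data[0]['LineageEx']
                        if d['Rank'] in ranks},
        }
    return results, not_found
```

With the question's functions defined, it would be called as results, missing = collect_lineages(species_list, get_tax_id, get_tax_data). Keying the results by species name keeps each lineage attached to its organism, whereas parallel lists rely on indices staying aligned.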

Source: https://habr.com/ru/post/1480311/

