How to calculate the average protein structure with several models / conformations

I have a PDB file '1abz' ( https://files.rcsb.org/view/1ABZ.pdb ) containing the coordinates of a protein structure with 23 different models (numbered MODEL 1-23). Please ignore the header remarks; the interesting information starts at line 276, which says "MODEL 1".

I would like to calculate the average protein structure. The PDB protein file contains several conformations / models, and I would like to calculate the average coordinates for individual atoms for each residue, so I get one conformation / model.

I could not figure out how to do this using Biopython, so I tried to calculate the average coordinates using Pandas. I think I was able to calculate the average, but the problem is that I now have a csv file that is no longer in PDB format, so I can't load this file into PyMOL.

My questions: how can I convert the csv file to PDB format? Even better, how can I get the average coordinates in Biopython or plain Python without losing the original PDB file format?

Here is the code I used to calculate the average coordinates in Pandas.

    # I first converted my pdb file to a csv file
    import pandas as pd
    import re

    pdbfile = '1abz.pdb'
    df = pd.DataFrame(columns=['Model','Residue','Seq','Atom','x','y','z'])  # make dataframe object
    i = 0  # counter

    ma = re.compile(r"MODEL\s+(\d+)")
    regex1 = r"([A-Z]+)\s+(\d+)\s+([^\s]+)\s+([A-Z]+)[+-]?\s+([A-Z]|)"
    regex2 = r"\s+(\d+)\s+([+-]?\d+\.\d+\s+[+-]?\d+\.\d+\s+[+-]?\d+\.\d+)"
    reg = re.compile(regex1+regex2)

    with open(pdbfile) as o:
        columns = ('label', 'ident', 'atomName', 'residue', 'chain', 'sequence',
                   'x', 'y', 'z', 'occ', 'temp', 'element')
        data = []
        for line in o:
            n = ma.match(line)
            if n:
                modelNum = n.group(1)  # remember which model the following atoms belong to
            m = reg.match(line)
            if m:
                d = dict(zip(columns, line.split()))
                d['model'] = int(modelNum)
                data.append(d)

    df = pd.DataFrame(data)
    df.to_csv(pdbfile[:-3]+'csv', header=True, sep='\t', mode='w')

    # Then I calculated the average coordinates
    import pandas as pd

    df = pd.read_csv('1abz.csv', delim_whitespace = True, usecols = [0,5,7,8,10,11,12])
    df1 = df.groupby(['atomName','residue','sequence'],as_index=False)[['x','y','z']].mean()
    df1.to_csv('avg_coord.csv', header=True, sep='\t', mode='w')
1 answer

This is certainly feasible in Biopython. Let me help you with an example that ignores the hetero residues (HETATM records) in the PDB file:

First, parse the PDB file with all your models:

    import Bio.PDB
    import numpy as np

    parser = Bio.PDB.PDBParser(QUIET=True)  # Don't show me warnings
    structure = parser.get_structure('1abz', '1abz.pdb')  # id of pdb file and location

Now that we have the contents of the file, and assuming all your models contain the same atoms, build a list with a unique identifier for each atom (for example: chain + residue position + atom name):

    atoms = [a.parent.parent.id + '-' + str(a.parent.id[1]) + '-' + a.name
             for a in structure[0].get_atoms()
             if a.parent.id[0] == ' ']  # obtained from model '0'
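The list above relies on every model containing the same atoms as model 0. A minimal sketch of how that assumption could be checked before averaging follows; `find_inconsistent_models` is a hypothetical helper (not part of Biopython), and the toy tuples stand in for identifiers collected from each model.

```python
def find_inconsistent_models(keys_per_model):
    """Return indices of models whose atom keys differ from those of model 0.

    keys_per_model: one list of hashable atom identifiers per model.
    """
    reference = set(keys_per_model[0])
    return [i for i, keys in enumerate(keys_per_model) if set(keys) != reference]


# Toy data. With Biopython, each inner list could be collected per model as
# [(a.parent.parent.id, a.parent.id[1], a.name) for a in model.get_atoms()]
toy_keys = [
    [("A", 1, "N"), ("A", 1, "CA")],
    [("A", 1, "N"), ("A", 1, "CA")],
    [("A", 1, "N")],  # this model is missing an atom
]
mismatched = find_inconsistent_models(toy_keys)
print(mismatched)
```

An empty result means the per-atom lookup in every model should succeed; otherwise the listed models need to be skipped or handled separately.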

Note that the condition a.parent.id[0] == ' ' keeps only standard residues, so hetero residues are ignored. Now let's get the average value for each atom:

    atom_avgs = {}
    for atom in atoms:
        atom_avgs[atom] = []
        for model in structure:
            atom_ = atom.split('-')
            coor = model[atom_[0]][int(atom_[1])][atom_[2]].coord
            atom_avgs[atom].append(coor)
        atom_avgs[atom] = sum(atom_avgs[atom]) / len(atom_avgs[atom])  # average
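One caveat with identifiers joined by '-': a negative residue number (for example -1) adds an extra '-' and breaks the split('-') step. A sketch of the same averaging with tuple keys, which avoids the string round trip, is shown here on toy data; `average_coords` and the plain coordinate tuples are illustrative assumptions, not Biopython objects.

```python
def average_coords(models):
    """Average (x, y, z) per atom key across models.

    models: list of dicts mapping (chain_id, res_num, atom_name) -> (x, y, z).
    Every model is assumed to contain the same keys as the first one.
    """
    n = len(models)
    averages = {}
    for key in models[0]:
        # Gather this atom's coordinates from every model, one axis at a time
        xs, ys, zs = zip(*(m[key] for m in models))
        averages[key] = (sum(xs) / n, sum(ys) / n, sum(zs) / n)
    return averages


# Two toy "models" with one atom each; note the negative residue number
toy_models = [
    {("A", -1, "CA"): (1.0, 2.0, 3.0)},
    {("A", -1, "CA"): (3.0, 4.0, 5.0)},
]
avg = average_coords(toy_models)
print(avg)
```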

Now create a new structure from a single model and overwrite its coordinates with the averages:

    ns = Bio.PDB.StructureBuilder.Structure('id=1baz')  # new structure
    ns.add(structure[0])  # add model 0

    for chain in ns[0]:
        # iterate over a copy, because detaching a residue modifies the chain
        for res in list(chain):
            if res.id[0] != ' ':
                chain.detach_child(res.id)  # detach hetero residue (detach_child expects the id)
                continue
            for atom in res:
                coor = atom_avgs[chain.id + '-' + str(res.id[1]) + '-' + atom.name]
                atom.coord = coor

Now write the PDB file:

    io = Bio.PDB.PDBIO()
    io.set_structure(ns)
    io.save('new_1abz.pdb')
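If you would rather stay with the Pandas route from the question, the averaged rows can also be written back as fixed-width ATOM records, since the PDB format is defined by column positions rather than by separators. The following is only a sketch: `format_atom_line` is a hypothetical helper, the occupancy and B-factor defaults are placeholders, and the atom-name alignment rule is the common convention for one-letter elements (two-letter elements such as calcium start one column earlier).

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z,
                     occ=1.00, temp=0.00, element=""):
    """Build one 78-character ATOM record using the PDB fixed-column layout."""
    # Atom names shorter than four characters conventionally start in column 14
    name4 = name if len(name) >= 4 else " " + name
    return (
        f"ATOM  {serial:>5} {name4:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{temp:6.2f}          {element:>2}"
    )


line = format_atom_line(1, "CA", "MET", "A", 1, 1.0, -2.5, 10.125, element="C")
print(line)
```

Calling this once per row of the averaged table (sorted by residue number, with an END line appended) should give a file that PyMOL can read, although the Biopython route above keeps more of the original records.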

Source: https://habr.com/ru/post/1275947/

