Split a database of mol2 molecules into N smaller sets

I have a large set of molecules from the ZINC database ( http://zinc.docking.org/ ) in mol2 format ( http://tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 ). I would like to split this database into an arbitrary number N of smaller databases. What is the best Python, bash, or Perl script for this? I read about Open Babel, but as far as I can tell it can only split the file into individual molecules.

If that is not possible, I can also convert the mol2 file to another, more convenient format.

Thanks

2 answers

csplit can split the file into individual molecules:

csplit ~/Download/zinc.mol2 '/@<TRIPOS>MOLECULE/' '{*}'



On Linux, with gawk:

gawk -v RS="@<TRIPOS>MOLECULE" 'NF{ print RS $0 > ("zinc" ++n ".mol2") }' zinc.mol2
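
Both commands above produce one file per molecule, while the question asks for N sets. A minimal Python sketch of that step, assuming every record starts with a line beginning with @<TRIPOS>MOLECULE; the names read_molecules, split_mol2 and the "subset" output prefix are made up for illustration and do not come from any library:

```python
MARKER = "@<TRIPOS>MOLECULE"


def read_molecules(path):
    """Yield each molecule record from a multi-molecule mol2 file as a string.

    Assumption: a record runs from one MARKER line up to the next one.
    Anything before the first MARKER line (e.g. a file header) is skipped.
    """
    record = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(MARKER):
                if record:
                    yield "".join(record)
                record = [line]
            elif record:
                record.append(line)
    if record:
        yield "".join(record)


def split_mol2(path, n_sets, prefix="subset"):
    """Distribute molecules round-robin over n_sets files; return their names."""
    names = [f"{prefix}_{i + 1}.mol2" for i in range(n_sets)]
    outputs = [open(name, "w") for name in names]
    try:
        # Molecule 0 goes to file 1, molecule 1 to file 2, and so on, wrapping
        # around, so the sets differ in size by at most one molecule.
        for index, molecule in enumerate(read_molecules(path)):
            outputs[index % n_sets].write(molecule)
    finally:
        for out in outputs:
            out.close()
    return names
```

Calling split_mol2("zinc.mol2", 10) would write subset_1.mol2 through subset_10.mol2 in the current directory. The file is streamed, so it does not need to fit in memory, but all N output files stay open at once, which is only reasonable for a modest N. Comment lines that sit between two molecules end up at the end of the preceding record.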

Source: https://habr.com/ru/post/1726877/

