Introduction to Genetic Algorithms

To begin with, I explained that I saw this question about the genetic algorithm , and it does not answer my question.

I am doing a project in bioinformatics. I have to take the data on the NMR spectrum of the cell (E. Coli) and find out which molecules (metabolites) are present in the cell.

For this, I am going to use genetic algorithms in the R language. I don’t have time to go through huge books on genetic algorithms. Heck! I don’t even have time to go through the little books (that’s what the related question doesn’t answer)

So, I need to know the resources that will help me quickly understand what genetic algorithms do and how they do it. I read the Wikipedia entry , this web page, and also several IEEE articles on this subject.

Any working code in R (even in C) or pointers to which R-modules will be used (if any) will be useful.

+4
source share
3 answers

A brief (and stubborn) introduction to genetic algorithms is at http://www.burns-stat.com/pages/Tutor/genetic.html

A simple GA written in R is available at http://www.burns-stat.com/pages/Freecode/genopt.R Documentation is in Poetry http://www.burns-stat.com/pages/ Spoetry / Spoetry.pdf and code.

+10
source

I assume that from your question you have some function F(metabolites) that gives spectrum , but you don't have the inverse function F'(spectrum) to return metabolites . The search space of metabolites is big, not brute force, you want to try an approximate method (like a genetic algorithm) that will make a more efficient random search.

To apply any such approximate method, you will need to define an evaluation function that compares the similarities between the target spectrum and the trial spectrum. The smoother the function, the better the search will work. If this can only lead to true / false, it will be a purely random search, and you will be better with brute force.

Given the function F and your assessment (aka fitness), you need to build a population of possible combinations of metabolites, run them through F , evaluate all the spectra obtained, and then use crossover and mutation to create a new population that brings together the best candidates. The choice of how to perform crossover and mutation, as a rule, depends on the domain, because you can significantly speed up the process, avoiding the creation of meaningless genomes. The best mutation rate will be very small, but it will also require settings for your domain.

Unaware of my domain, I can’t say what one member of your population should look like, but it can be just a list of metabolites (which allows you to organize and duplicate, if it’s interesting) or a string of logical values ​​for all possible metabolites (which has the advantage of that it is an invariant of order and provides obvious opportunities for crossover and mutation). A string has the disadvantage that it can be more expensive to filter out stupid genes (for example, it may not make sense to have only 1 metabolite or more than 1000). It’s faster to avoid creating nonsense, rather than just assigning it low suitability.

There are other approximate methods if you have F and your scoring function. The simplest is probably simulated annealing . Another that I have not tried is the Bee Algorithm , which seems to be a multi-stage simulation annealing with a fitness-weighted effort (a kind of intersection between SA and GA).

+4
source

I found the article “The Science of Computing: Genetic Algorithms," Peter J. Denning ( American Scientist, Volume 80, 1, pp. 12-14 ). This article is simple and useful if you want to understand what genetic algorithms do and read only 3 pages!

+1
source

Source: https://habr.com/ru/post/1388410/


All Articles