I assume that from your question you have some function F(metabolites) that gives spectrum , but you don't have the inverse function F'(spectrum) to return metabolites . The search space of metabolites is big, not brute force, you want to try an approximate method (like a genetic algorithm) that will make a more efficient random search.
To apply any such approximate method, you will need to define an evaluation function that compares the similarities between the target spectrum and the trial spectrum. The smoother the function, the better the search will work. If this can only lead to true / false, it will be a purely random search, and you will be better with brute force.
Given the function F and your assessment (aka fitness), you need to build a population of possible combinations of metabolites, run them through F , evaluate all the spectra obtained, and then use crossover and mutation to create a new population that brings together the best candidates. The choice of how to perform crossover and mutation, as a rule, depends on the domain, because you can significantly speed up the process, avoiding the creation of meaningless genomes. The best mutation rate will be very small, but it will also require settings for your domain.
Unaware of my domain, I can’t say what one member of your population should look like, but it can be just a list of metabolites (which allows you to organize and duplicate, if it’s interesting) or a string of logical values ​​for all possible metabolites (which has the advantage of that it is an invariant of order and provides obvious opportunities for crossover and mutation). A string has the disadvantage that it can be more expensive to filter out stupid genes (for example, it may not make sense to have only 1 metabolite or more than 1000). It’s faster to avoid creating nonsense, rather than just assigning it low suitability.
There are other approximate methods if you have F and your scoring function. The simplest is probably simulated annealing . Another that I have not tried is the Bee Algorithm , which seems to be a multi-stage simulation annealing with a fitness-weighted effort (a kind of intersection between SA and GA).
source share