Initial diversity of seed genetic algorithm using coding of C # values

Question

Initial diversity of seed genetic algorithm using coding of C # values

I would like to know the following: How to effectively do the initial generation of high diversity chromosomes using value coding? One way is to initialize the grid, but it is too slow.

So far, I have used the Random class from .NET to select random values in the encoding of values, but although the values are evenly distributed, the values of the functional functions calculated from such chromosomes are not. Here is the Chromosome initialization code:

public Chromosome(Random rand) { Alele = new List<double>(); for (int i = 0; i < ChromosomeLength; i++) { Alele.Add(rand.NextDouble() * 2000 - 1000); } }

So, I developed a function that calculates fitness from a new, randomly made chromosome (top code), and if fitness is similar to any other already included in the list of chromosomes, the new chromosome is randomly generated and its fitness is calculated, and this process is repeated until as long as his fitness is not different from those already on the list.

Here is the code for this part:

 private bool CheckSimilarFitnes(List<Chromosome> chromosome, Chromosome newCandidate) { Boolean flag=false; double fitFromList, fitFromCandidate; double fitBigger,fitSmaller; foreach (var listElement in chromosome) { fitFromList = listElement.CalculateChromosomeFitness(listElement.Alele); fitFromCandidate = newCandidate.CalculateChromosomeFitness(newCandidate.Alele); fitBigger = fitFromList >= fitFromCandidate ? fitFromList : fitFromCandidate; fitSmaller = fitFromList < fitFromCandidate ? fitFromList : fitFromCandidate; if ((fitFromList / fitFromCandidate) < 1.5) return false } else return true; }

But, the more chromosomes that I have on the list, the more time it takes to add a new one, with a sufficient degree of difference from others that already exist.

So, is there a way to make this mesh initialization faster, it takes several days to create 80 chromosomes?

+4

initialization c # algorithm grid genetic

Antun tun 21 sept '12 at 23:54

source share

5 answers

The main problem is that most randomly generated chromosomes have a similar physical shape, right? It's great; the idea is not that your initial chromosomes have completely different types of fitness; it is for the chromosomes themselves to be different, and apparently they are. In fact, you should expect that the initial suitability of most of your first generation will be close to zero, since you have not yet run the algorithm.

This is why your code is so slow. Let them say that the first candidate is terrible, mostly zero fitness. If the second should be 1.5 times, this actually means that it should be 1.5 times better, since it cannot really get worse. Then the next one should be 1.5 times better than this, etc. Up to 80. So you really do a search for more and more perfect chromosomes , generating completely random ones and comparing them with what you have. I bet if you have registered progress, you will find more and more time to find subsequent candidates, because really good chromosomes are hard to find. But finding the best chromosomes is what GA is for! Basically what you did was optimize some of the chromosomes manually before, um, actually optimizing them.

If you want your chromosomes to be diverse, compare their contents, do not compare their suitability. A comparison of suitability is working with algo.

+2

user24359 Sep 22 '12 at 1:02

source share

I'm going to do it fast, but Isaac is almost right. You must let GA do its job. You have a generation of people (chromosomes, whatever), and they are all on a fitness scale (or maybe they are all the same).

You pick some good ones to mutate (on their own) and crossover (with each other). You probably use the top 10% to generate another full population and drop the bottom 90%. Maybe you always keep the main guy around (elitism).

You repeat this for a while until your GA stops improving, because all people are very similar. You have very little diversity in your population.

What can help you: 1) make your mutations more effective, 2) find the best way to choose people for the mutation. In my commentary, I recommended AI Techniques for game programmers . This is a great book. Very easy to read.

To list several headings from a book, you can find the following things:

A selection of methods, such as Roulette Selection (on stackoveflow) (on wikipedia ) and Stochastic Universal Sampling , which determine how you choose your people. I always liked the choice of roulette. You set the probabilities that a person will be selected. This is not just a simple random sample of white noise.

I used this outside of GA to randomly select 4 letters from the Latin alphabet. I assigned a value from 0.0 to 1.0 for each letter. Each time the user (child) selects the letter correctly, I would reduce this value, say, 0.1. This would increase the likelihood that other letters would be selected. If after 10 times the user selected the correct letter, the value would be 0.0, and there would be an (almost) lack of the ability to re-represent the letter.

Fitness scaling , such as rank scaling, Sigma and Boltzmann Scaling scaling (pdf to ftp !!!) , which will allow you to change your original fitness values to adapt to your customized fitness values. Some of them are dynamic, such as Boltzmann Scaling, which allows you to set a “pressure” or “temperature” that changes over time. Increased "pressure" means that people-locksmiths are being selected. Pressure reduction means that you can choose any person in the population.

I think of it this way: you are looking for a multidimensional space for a solution. You get into the "peak" and work your way into it. The pressure to be adjusted is very high. You are hiding right at local highs. Now your fitness cannot change. Your mutations do not go out of peak. Thus, you begin to reduce the pressure and just, oh, choose elements at random. Your fitness levels are starting to drop, and that's fine. Then you again begin to increase pressure and wonder! You have exited local highs and found wonderful new local highs. Increase pressure again!

Niching (which I never used, but it seems to be a way to group similar people). Say you have two good people, but they are completely different. They keep getting out. They mutate a little and do not get much better. Now you have half of your population as minor options A, and half of your population minor version B. This seems like a way of saying, hey, what is the average fitness for the whole group A? and what for b? And what for every other niche you have. Then make your choice based on average availability for each niche. Choose your niche, then choose a random person from this niche. Maybe I'll start using this in the end. I like it!

Hope you find some of these useful!

+1

cod3monk3y Sep 22 '12 at 2:21

source share

If you need true random numbers for your application, I recommend you check out Random.org . They have a free HTTP API and clients for almost every language .

Randomness arises from atmospheric noise, which for many purposes is better than pseudo-random number algorithms commonly used in computer programs.

(I am not affiliated with Random.org, although I have contributed to the PHP client).

0

Jonathon reinhart Sep 22 '12 at 0:31

source share

I think your problem is how your fitness function is and how you choose candidates, and not how random the values are. Your filtering seems too strict, which does not even allow you to accept enough elements.

Example

values: random float 0-10000.
square root function function (n)
the desired suitability distribution is linear with a distance of at least 1.

With this fitness function, you will quickly get most of the 1st “spot” (as you have no more than 100 spots), so each next one will take longer. At some point, a few tiny ranges will remain, and most of the results will simply be rejected, even worse when you get about 50 numbers, there is a good chance that the next one simply will not be able to match.

0

Alexei Levenkov Sep 22 '12 at 1:09

source share

cod3monk3y · Accepted Answer · 2012-09-22T14:16:44+0000

here is some code that might help (which I just wrote): GA to order 10 values with an interval of 1.0 . It starts with a collection of 100 completely random alleles, and that is how your code begins.

The goal I gave the GA solution was to order the values in ascending order with a split of 1.0. He does this in the Eval_OrderedDistance fitness function by calculating the standard deviation of each pair of samples from 1.0. As fitness tends to 0.0, alleles must appear in sequential order.

Generation 0 is best The chromosome was completely random, like the rest of the chromosome. You can see that the value of fitness is very large (i.e., bad):

 GEN: fitness (allele, ...) 0: 375.47460 (583.640, -4.215, -78.418, 164.228, -243.982, -250.237, 354.559, 374.306, 709.859, 115.323)

As generations continue, fitness (standard deviation from 1.0) decreases until it becomes almost ideal in the 100,000 gene:

  100: 68.11683 (-154.818, -173.378, -170.846, -193.750, -198.722, -396.502, -464.710, -450.014, -422.194, -407.162) ... 10000: 6.01724 (-269.681, -267.947, -273.282, -281.582, -287.407, -293.622, -302.050, -307.582, -308.198, -308.648) ... 99999: 0.67262 (-294.746, -293.906, -293.114, -292.632, -292.596, -292.911, -292.808, -292.039, -291.112, -290.928)

The interesting parts of the code are the fitness function:

 // try to pack the aleles together spaced apart by 1.0 // returns the standard deviation of the samples from 1.0 static float Eval_OrderedDistance(Chromosome c) { float sum = 0; int n = c.alele.Length; for(int i=1; i<n; i++) { float diff = (c.alele[i] - c.alele[i-1]) - 1.0f; sum += diff*diff; // variance from 1.0 } return (float)Math.Sqrt(sum/n); }

And mutations. I used a simple crossover and "completely mutated one allele":

 Chromosome ChangeOne(Chromosome c) { Chromosome d = c.Clone(); int i = rand.Next() % d.alele.Length; d.alele[i] = (float)(rand.NextDouble()*2000-1000); return d; }

I used elitism to always keep one exact copy of the best chromosome. Then 100 new chromosomes were generated using mutation and crossover.

It really sounds like you are calculating the difference in fitness, which of course tells you that fitness in your population is about the same. I found it very important how you define your fitness function. The more granular the fitness function, the more you can distinguish your chromosomes. Obviously, your fitness function returns similar values for completely different chromosomes , since your gene 0 returns the fitness variance of 68e-19.

Can you talk about your fitness account? Or what problem do you ask the GA to solve? I think this will help us.

[Edit: Add explicit sharing / Niching]

I changed my mind a bit and updated my code . If you are trying to preserve unique chromosomes, you need to compare their contents (as others have pointed out). One way to do this is to calculate the standard deviation between them. If it is less than a certain threshold, you can consider them the same. From the Chromosome class:

 // compute the population standard deviation public float StdDev(Chromosome other) { float sum = 0.0f; for(int i=0; i<alele.Length; i++) { float diff = other.alele[i] - alele[i]; sum += diff*diff; } return (float)Math.Sqrt(sum); }

I think Nichin will give you what you want. He compares all chromosomes in a population to determine their similarity and assigns each a niche value. Then the chromosomes are “punished” for belonging to a niche using the “Explicit Fitness Access” method. Suitability values are divided by the number of chromosomes in each niche. Therefore, if you have three in the niche group A (A, A, A), instead of making this niche 3 times more likely, it should be considered as a whole.

I compared my example with an explicit separation of skills on / off. With a maximum STDDEV of 500 and Niching turned off, there were about 18-20 niches (so basically 5 duplicates of each item in 100 people). When Niching turned on, there were about 85 niches. This is 85% of the unique chromosomes in the population. In the output of my test you can see the diversity after 17,000 generations .

Here's the niching code:

 // returns: total number of niches in this population // max_stddev -- any two chromosomes with population stddev less than this max // will be grouped together int ComputeNiches(float max_stddev) { List<int> niches = new List<int>(); // clear niches foreach(var c in population) { c.niche = -1; } // calculate niches for(int i=0; i<population.Count; i++) { var c = population[i]; if( c.niche != -1) continue; // niche already set // compute the niche by finding the stddev between the two chromosomes c.niche = niches.Count; int count_in_niche = 1; // includes the curent Chromosome for(int j=i+1; j<population.Count; j++) { var d = population[j]; float stddev = c.StdDev(d); if(stddev < max_stddev) { d.niche = c.niche; // same niche ++count_in_niche; } } niches.Add(count_in_niche); } // penalize Chromosomes by their niche size foreach(var c in population) { c.niche_scaled_fitness = c.scaled_fitness / niches[c.niche]; } return niches.Count; }

[Edit: post-analysis and updating Anton code]

I know that this is probably not the right forum for solving home problems, but since I did it before knowing it, and I was very interested in doing it, I believe that this can be useful only for Anton.

Genotip.cs , Kromosom.cs , KromoMain.cs

This code supports a good variety, and in one pass I was able to get a "raw food diet" of up to 47, which in your case is a standard error. It was pretty close!

As noted in my comment, I would like to try to help you with programming, and not just help you with your homework. Read this analysis of your work.

As expected, from the very beginning it was not necessary to create a “more diverse” population. Just generate completely random cromosomes.
Your mutations and crossovers were very destructive, and you only had a few of them. I have added some new operators that seem to work better for this problem.
You have chosen the best solution. When I get your code that works only with the choice of the tournament, there will be one Kromo, which is 99% better than everyone else. With a tournament choice, this best value is likely to be forgotten. I added a bit of “elitism” that keeps a copy of this meaning for the next generation.
Consider object oriented methods. Compare the rewriting I sent you with my source code.
Do not duplicate the code. You had sampling options in two different classes.
Keep your code clean. There were several unused pieces of code. Especially when submitting questions to SO, try narrowing it down, removing unused code, and doing some cleanup.
Comment your code! I commented on the re-work significantly. I know this is Serbian, but even a few comments will help someone understand what you are doing and what you intended to do.
All in all, a good job implementing some of the more complex things like Tournament Selection
Prefer double [] arrays over list. There is less overhead. In addition, some of the Temp List variables are not needed. Your structure
List temp = new List (); for (...) {temp.add (value); } for (each value in temp) {sum + = value} average = sum / temp.Count

easy to write as:

 sum = 0 for(...) { sum += value; } average = sum / count;

In several places, you forgot to initialize a loop variable that could easily add to your problem. Something like this would cause serious problems, and that was in the fitness code along with one or two other places
double fit = 0; for (each chromosome) {// YOU SHOULD INITIALIZE HERE inside the LOOP for (each allele) {fit + = ...; } fit / = count; }

Successful programming!

Initial diversity of seed genetic algorithm using coding of C # values

More articles: