## Contents

## OpenMOLE Strategy to handle Stochastic Models Calibration ðŸ”—

Calibration of stochastic models leads to noisy fitness functions that may jeopardize Genetic Algorithm convergence. An efficient strategy to deal with such fitness functions is implemented in OpenMOLE. This strategy automatically balances the need for replications and the discovery of new solutions. In case you want to explore a stochastic model with a genetic algorithm you can do:```
val seed = Val[Long]
val evolution =
NSGA2Evolution(
mu = 100, // mu is the size of the population
genome = Seq(x in (0.0, 1.0), y in (0.0, 1.0)), // genome (of individuals) is the inputs prototype and their variation ranges
objectives = Seq(o1, o2), // objectives are the objectives to minimise
// OpenMOLEÂ provide a seed for your stochastic model to use (it is optional)
// 20% of the evaluations are used for replicating existing solutions
// 100 replication are stored at max for each individual
stochastic = Stochastic(seed = seed, reevaluate = 0.2, replications = 100),
evaluation = model,
termination = 100
)
```

## The problem of stochasticity in Model Calibration ðŸ”—

Genetic algorithms donâ€™t cope well with stochasticity.
This is especially the case for algorithms with evolution strategies of type "Âµ + Î»" (such as NSGA2, the GA used in OpenMOLE) which preserves best solutions (individuals) from a generation to another.
In that kind of optimization, the quality of a solution is only **estimated**. Since it is subject to variation from a replication to another, the quality can either be overvalued or
undervalued i.e. estimated at a significantly greater or lower value than the one obtained for an infinite
number of replications.

Undervalued solutions are not that problematic, as they might be discarded instead of being kept, but the algorithm has a chance to retry a very similar solution later on. Conversely, the overvalued solutions are very problematic : the genetic algorithm will keep them in the population of good solutions because they have been (falsely) evaluated as such, and will generate new offspring solutions from them.

This behaviour can greatly slow down the convergence of the calibration algorithm and even make it converge toward set of parameters producing very unstable dynamics, very likely to produce false positive good solutions.

### Existing solutions ðŸ”—

To reduce the influence of the fitness fluctuation, the simplest approach is "resampling". It consists in
replicating the fitness evaluation of individuals. The computed "quality" of an individual is then
an estimation (e.g. mean, median) based on a *finite* number of replications of the fitness computation.

This number is set to a compromise between the computation time taken to evaluate one set of parameters (an individual) and an acceptable level of noise for the computed quality. Still, any number of replications, even very high, implies that some solutions are overvalued with a non negligible probability given that the fitness function is evaluated millions of times.

Other ad hoc methods of the literature are based on some assumptions that are hard or impossible to verify (such as the invariance of the noise distribution over the fitness space) and add parameters to the algorithm that are difficult to tune finely.

See this paper for an extensive review of noisy fitness management in Evolutionary Computation.

### OpenMOLE's solution ðŸ”—

To overcome these limitations, OpenMOLE uses an auto-adaptive strategy called "stochastic resampling".

The idea is to evaluate individuals with only one replication and, at the next generation, to keep and re-inject a sample of the individuals of the current population in the newly created population..

For instance, at each generation, 90% of the individuals offspring genomes are

**new genomes**obtained by classical mutation/crossover steps of genetic algorithms, and 10% of the offspring genomes are drawn randomly from the current population i.e.

**already evaluated genomes**, for which the algorithm computes one additional replication. Replicated evaluations are stored for each individual in a vector of replications. The global fitness of an individual is computed using (for instance) the median of each fitness value stored in the replication vector.

This evolution strategy intends to have the best individuals survive several generations and therefore be the most likely to be resampled, since each individual has a fixed chance of being resampled at each generation.

However, this fixed probability of resampling is not sufficient alone, since well evaluated solutions are likely to be replaced by overvalued solutions (new solutions with a few "lucky" replications). So as to compensate this bias, we add a technical objective to NSGA2 : maximize the number of evaluations of a solution.

The optimization problem of model calibration becomes a

**multi-objective optimisation problem**(if it was not already !) : the algorithm has to optimize the objectives of the model

**and**the technical objective as well. Therefore, the number of replications is taken into account in the Pareto compromise elitism of NSGA2: solutions with many replications are kept, even if some solutions are better on the other objectives but have been evaluated fewer times. By doing so, we let the multi-objective optimization algorithm handle the compromise between the quality of the solutions and their robustness.

This method adds only two new parameters:

`reevaluate`

, the probability of resampling an individual at each generation`replications`

, the maximum number of evaluation values for an individual, to limit the memory used to store an individual.

`stochastic = Stochastic(seed = seed, reevaluate = 0.2, replications = 100)`

in the example.
This method has been implemented in the library for evolutionary computing: MGO, and is currently being benchmarked against other stochasticity management methods (see the repository)