  Explore with Calibration

## Content:

1 - Single criterion Calibration
2 - Multi criteria Calibration
3 - Calibration within OpenMOLE

Using Genetic Algorithms (GA), OpenMOLE finds the input set matching one or several criteria: this is called calibration. In practice, calibration is used to target one specific scenario or dynamic. Usually, a fitness function is used to assess the distance between obtained dynamics and your target dynamic. In case your model is not able to match the target dynamic, the calibration will find the parameterization producing the closest (according to your fitness function) possible dynamic. For more details on calibration using genetic algorithms, see the GA detailed page.
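
As an illustration of what such a fitness function can look like, here is a minimal sketch in plain Scala. It assumes that a dynamic is represented as a series of values and that the distance is the sum of squared differences; the names `FitnessSketch` and `squaredDistance` are hypothetical and are not part of OpenMOLE.

```scala
// Illustrative only: one possible fitness function, not an OpenMOLE API.
object FitnessSketch {
  // Sum of squared differences between a simulated series and the target series.
  // A value of 0.0 means the simulated dynamic matches the target exactly.
  def squaredDistance(simulated: Seq[Double], target: Seq[Double]): Double = {
    require(simulated.length == target.length, "series must have the same length")
    simulated.zip(target).map { case (s, t) => (s - t) * (s - t) }.sum
  }
}
```

The lower the returned value, the closer the obtained dynamic is to the target, which is why calibration is expressed as a minimization.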

## Single criterion Calibration

### Method's score

The single criterion calibration method is designed to solve an optimization problem, so unsurprisingly it performs well regarding the optimization grade. Since it is only focused towards discovering the best performing individual (parameter set), this method doesn't provide insights about the model sensitivity regarding its input parameters, as it does not keep full records of the past input samplings leading to the optimal solution.
For the same reason, this method is not intended to cover the entirety of the input and output spaces, and thus does not perform well regarding the input and output exploration grades. It concentrates the sampling of the input space towards the part which minimizes the fitness, and therefore intentionally neglects the part of the input space leading to high fitness values. Calibration can handle stochasticity, using a specific method.
The dimensionality of the model input space is generally not an issue for this method, as it can handle up to a hundred dimensions in some cases. However, the number of objectives (output dimensionality) should be limited to a maximum of 2 or 3 compromise objectives.

Single criterion calibration answers the following question: for a given target value of the output o1, which parameter set(s) (i, j, k) produce the output value(s) closest to the target?
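
The question can be made concrete with a small sketch in plain Scala. The toy `model` function and the explicit list of candidates are assumptions made for illustration: a genetic algorithm searches the parameter space iteratively rather than enumerating a fixed list, and none of these names come from OpenMOLE.

```scala
// Illustrative only: a toy model and an exhaustive search over a few candidates.
object SingleCriterionSketch {
  // Hypothetical model: takes the parameters (i, j, k) and returns the output o1.
  def model(i: Double, j: Double, k: Double): Double = i * j + k

  // Return the candidate parameter set whose output is closest to the target value.
  def closest(
    candidates: Seq[(Double, Double, Double)],
    target: Double
  ): (Double, Double, Double) =
    candidates.minBy { case (i, j, k) => math.abs(model(i, j, k) - target) }
}
```

Because there is a single distance per candidate, `minBy` can always pick a best one.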

## Multi criteria Calibration

### Method's score

The multi criteria calibration method differs slightly from the single criterion version. It suffers from the same limitations regarding the input and output spaces. However, since it may reveal a Pareto frontier and the underlying trade-off, it gives some insight into the model sensitivity, showing that the model performance on one criterion is affected by its performance on the others. This is not genuine sensitivity as in sensitivity analysis, but it still outlines a variation of your model outputs, which is not bad after all!

Multi criteria calibration answers the following question: for a given target pattern (o1, o2), what are the parameter sets (i, j) that produce the output values closest to the target pattern?
Sometimes a Pareto Frontier may appear!

### Differences between single and multi criteria calibration

Calibration boils down to minimizing a distance measure between the model output and some data. When only one distance measure is considered, it is single criterion calibration. When more than one distance matters, it is multi-criteria calibration.
For example, one may study a prey-predator model and want to find parameter values for which the model reproduces some expected size of both the prey and predator populations.

The single criterion case is simpler, because we can always tell which distance is smaller between any two distances. Thus, we can always select the best set of parameter values.
In the multi criteria case, it may not always be possible to tell which simulation output has the smallest distance to the data. For example, consider a pair (d1, d2) representing the differences between the model output and the data for the prey population size (d1) and the predator population size (d2). Two pairs such as (10, 50) and (50, 10) each have one element smaller than the other pair's and one bigger. There is no natural way to tell which pair represents the smallest distance between the model and the data. Thus, in the multi-criteria case, we keep all the parameter sets (e.g. {(i1, j1, k1), (i2, j2, k2), ...}) yielding such incomparable distances (e.g. {(d11, d21), (d12, d22), ...}), that is, distances for which no other parameter set does at least as well on every criterion and strictly better on one. The set of all these parameter sets is called the Pareto front.
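
The comparison described above can be sketched in plain Scala. This is a naive quadratic filter written for illustration, not the selection procedure NSGA2 actually uses, and the names `ParetoSketch`, `dominates` and `nonDominated` are hypothetical.

```scala
// Illustrative only: naive Pareto filtering of distance pairs (d1, d2).
object ParetoSketch {
  type Distances = (Double, Double)

  // a dominates b when a is no worse on both distances
  // and strictly better on at least one of them.
  def dominates(a: Distances, b: Distances): Boolean =
    a._1 <= b._1 && a._2 <= b._2 && (a._1 < b._1 || a._2 < b._2)

  // Keep only the pairs that no other pair dominates.
  def nonDominated(points: Seq[Distances]): Seq[Distances] =
    points.filter(p => !points.exists(q => dominates(q, p)))
}
```

With the pairs from the text, neither (10, 50) nor (50, 10) dominates the other, so both are kept, whereas a pair such as (60, 60) would be discarded because (10, 50) is better on both distances.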

## Calibration within OpenMOLE

### Specific constructor

Single and multi-criteria calibration in OpenMOLE are both done with the NSGA2 algorithm, through the `NSGA2Evolution` constructor. It takes the following parameters:
• `evaluation`: the OpenMOLE task that runs the simulation (i.e. the model),
• `objective`: a list of the distance measures (which in the single criterion case will contain only one measure),
• `populationSize`: the population size,
• `genome`: a list of the model parameters and their respective variation intervals,
• `termination`: the total number of evaluations (executions of the task passed to the `evaluation` parameter) to be executed,
• `parallelism`: optional, the number of simulations that will be run in parallel, defaults to 1,
• `stochastic`: optional, the seed provider, mandatory if your model contains randomness,
• `distribution`: optional, computation distribution strategy, default is "SteadyState",
• `reject`: optional, a predicate which is true for genomes that must be rejected by the genome sampler (for instance "i1 > 50").

In the example given in the Example section of this page, `param1`, `param2`, `param3` and `param4` are the inputs of the model task (`modelTask`) passed as `evaluation`, and `distance1` and `distance2` are its outputs.

More details and advanced notions can be found on the GA detailed page.

### Hook

The output of the genetic algorithm must be captured with a hook which saves the current optimal population. The generic way to use it is to write either `hook(workDirectory / "path/of/a/directory")` to save the results as CSV files in a specific directory, or `hook display` to display the results in the standard output.
The arguments that can be passed to the hook of an `NSGA2Evolution` are:
• `output`: the directory in which to store the population files,
• `frequency`: optional, Long, the frequency at which the generations should be saved.
For more details about hooks, check the corresponding Language page.
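
As a rough sketch of how the two arguments listed above might be combined, the fragment below saves the population to a directory at a given generation frequency. It is an OpenMOLE script fragment rather than standalone Scala. It assumes that `frequency` can be given as a named argument next to the output directory; `calibration` is a hypothetical name standing for an `NSGA2Evolution(...)` definition, and the value 100 is arbitrary.

```scala
// OpenMOLE script fragment: it only makes sense inside an OpenMOLE script.
// `calibration` is a hypothetical val holding an NSGA2Evolution(...) definition.
// Assumption: `frequency` is accepted as a named argument of the hook.
calibration hook (workDirectory / "path/of/a/directory", frequency = 100)
```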

### Example

In your OpenMOLE script, the NSGA2 algorithm scheme is defined like so:

```scala
// Model inputs
val param1 = Val[Double]
val param2 = Val[Double]
val param3 = Val[Int]
val param4 = Val[String]

// Model outputs: the distances to minimize
val distance1 = Val[Double]
val distance2 = Val[Double]

// modelTask is assumed to be defined elsewhere in the script, with
// param1..param4 as inputs and distance1, distance2 as outputs.
NSGA2Evolution(
  evaluation = modelTask,
  objective = Seq(distance1, distance2),
  populationSize = 200,
  genome = Seq(
    param1 in (0.0, 99.0),
    param2 in (0.0, 99.0),
    param3 in (0, 5),
    param4 in List("apple", "banana", "strawberry")),
  termination = 100,
  parallelism = 10
) hook (workDirectory / "path/to/a/directory")
```

### Calibrating a high number of inputs

If you want to calibrate a large number of parameters, you can use arrays directly in the genome. To do so, you must provide an array of boundaries. In this example, the genome contains an array of 100 inputs, each varying between 0 and 99, and a single double value varying from 1 to 10.
```scala
val param1 = Val[Array[Double]]
val param2 = Val[Double]

val distance = Val[Double]

val model = ScalaTask("val distance = param1.sum * param2") set (inputs += (param1, param2), outputs += distance)

NSGA2Evolution(
  evaluation = model,
  objective = distance,
  populationSize = 200,
  genome = Seq(
    param1 in Array.fill(100)(0.0, 99.0),
    param2 in (1.0, 10.0)),
  termination = 100,
  parallelism = 10
) hook display
```