# How to use Global Optimization Toolbox algorithms for SimBiology parameter estimation – in parallel (part II)

In the first part I presented a workflow to export a SimBiology
model as well as data to the workspace from the SimBiology Desktop,
speed up its simulation and prepare it for parallel computation. Here I
will set up the actual optimization problem.

The Optimization solvers usually only accept a single parameter in the
objective function. Three methods can be used to call the objective
function with more parameters – anonymous functions, nested functions
and global variables. I usually use nested functions for these purposes,
however, as this only works inside a function and I would like to have a
script at the end of this blog post, I will call an external objective
function `objFunction`

via an anonymous function. Here’s what
`objFunction`

looks like:

```
function residual = objFunction(p, exportedMdlObj, data, mdlObsSpeciesName)
sd = simulate(exportedMdlObj, p);
sdn = resample(sd,data.t,'pchip'); % resample simulation data to get the timepoints at the observed time points
[~,x,names] = selectbyname(sdn, mdlObsSpeciesName);
residual = x-double(data(:,2:end)); % calculate residual
residual=residual(:);
```

When we exported the model we used some model parameters as a second
argument. The `simulate`

method of the exported model object accepts a
vector of new parameter values that it uses for the simulation. The
output of this method is a `SimData`

object which holds the simulation
results and some metainformation. `SimData`

has a method which allows
resampling or interpolation of the simulation results so that we can get
these at the same time as the experimental data. `resample`

includes a
number of different interpolation methods such as nearest neighbour,
linear or spline interpolation. I tend to use the `'pchip'`

option as it
preserves the shape of the generating function. Cleve Moler, the
original developer of MATLAB, wrote an interesting blog post about the
differences between `'spline'`

and `'pchip'`

a while back (see
here).

Last, we can extract the simulation results for those species that were
measured by using the `selectbyname`

method with a cell array of model
species names. The last step in the objective function is to calculate
the residuals by simply subtracting the experimental data from the
simulation results. Use the `double`

method to cast a dataset array to a
double array.

As a side note, this objective function is adapted to the
`lsqnonlin`

function which requires a vector of the residuals as the
input. Other functions such as `fmincon`

work on scalars; in this case
you need to calculate the least squares term by yourself.

Going back to the script, we now set up the objective function needed
for the optimization as an anonymous function

```
objFcn = @(p)objFunction(p, exportedMdlObj, data, mdlObsSpeciesName);
```

and can move on to the last part, setting up the MultiStart algorithm.
The `MultiStart`

function starts a local solver from a large number of
different starting points. You can either let `MultiStart`

choose the
starting points or specify starting points manually. Here, I will
specify `nruns = 50`

starting points that are log-uniformly distributed
an order of magnitude below and above the initial estimates that we
specified:

```
nruns = 50;
startPointMatrix = repmat(initialEstimate,[nruns,1]);
startPointMatrix(2:end,:) = startPointMatrix(2:end,:).* 10.^(2*rand(nruns-1,numel(initialEstimate))-1);
```

The `MultiStart`

function requires a custom start point set as an input
which we can generate using the CustomStartPointSet constructor:

```
startPoints = CustomStartPointSet(startPointMatrix);
```

Next we create the optimization problem

```
optimProblem = createOptimProblem('lsqnonlin','lb',zeros(size(initialEstimate)),'objective', objFcn,'x0',initialEstimate);
```

where we specify `lsqnonlin`

as the solver to be used, zeros as lower
bounds, the objective function and provide an initial estimate. The
`MultiStart`

function needs to be configured as well

```
ms = MultiStart('Display','iter','UseParallel','always');
```

Here’s where the parallel option comes into place. As `MultiStart`

runs
independent optimizations from different starting points, it is a
perfect fit for parallelization. If we would have used `sbiosimulate`

to
simulate our model, the function would have errored as it cannot run in
parallel. This could be circumvented by exporting the model first (see
above).

Finally, we open a `matlabpool`

of four workers (on my quadcore
machine, if you have access to a cluster and the MATLAB Distributed
Computing Server you can run it on many more workers):

```
matlabpool open 4
```

and start the fitting

```
[finalEstimate, fval] = run(ms, optimProblem,startPoints);
```

Depending on the number of runs and the complexity of the problem, this will take a while eventually, but in the end, the solution might be much better than the initial outcome.

How to get the new estimates back into the SimBiology Desktop? I started
out by saying that the command line changes to the model will be
synchronized in the desktop if we used the *Export Model to the
workspace* functionality in the desktop. I will add the new estimates
as a new variant to the model by first creating a cell array containing
the parameter names and new values and then add this content to the
model:

```
for i=1:numel(finalEstimate)
varcontent{i} = {'parameter',estParameterNames{i},'Value',finalEstimate(i)};
end
varObj = addvariant(mdlObj, 'multiStartFit');
addcontent(varObj, varcontent);
```

The new variant will show up in the variant tab in the SimBiology Desktop and we can explore and analyze this solution.

*Any comments/questions? Send me an email.*