How to use Global Optimization Toolbox algorithms for SimBiology parameter estimation – in parallel (part II)
In the first part, I presented a workflow to export a SimBiology model and data from the SimBiology Desktop to the workspace, speed up its simulation, and prepare it for parallel computation. Here, I will set up the actual optimization problem.
Optimization solvers usually pass only a single parameter vector to the objective function. Three methods can be used to call the objective function with additional parameters: anonymous functions, nested functions, and global variables. I usually use nested functions for this purpose; however, since nested functions only work inside a function and I would like to end this blog post with a script, I will call an external objective function, objFunction, via an anonymous function. Here's what objFunction looks like:
function residual = objFunction(p, exportedMdlObj, data, mdlObsSpeciesName)
% Simulate the exported model with the candidate parameter values p
sd = simulate(exportedMdlObj, p);
% Resample the simulation results at the experimentally observed time points
sdn = resample(sd, data.t, 'pchip');
% Extract the simulated values for the measured species
[~, x] = selectbyname(sdn, mdlObsSpeciesName);
% Calculate the residuals and return them as a single column vector
residual = x - double(data(:,2:end));
residual = residual(:);
end
When we exported the model, we used some model parameters as a second argument. The simulate method of the exported model object accepts a vector of new parameter values that it uses for the simulation. The output of this method is a SimData object, which holds the simulation results and some metainformation. SimData has a resample method that allows resampling or interpolation of the simulation results, so that we can get them at the same time points as the experimental data. resample supports a number of interpolation methods such as nearest-neighbor, linear, or spline interpolation. I tend to use the 'pchip' option as it preserves the shape of the generating function. Cleve Moler, the original developer of MATLAB, wrote an interesting blog post about the differences between 'spline' and 'pchip' a while back (see here).
Last, we can extract the simulation results for those species that were measured by using the selectbyname method with a cell array of model species names. The final step in the objective function is to calculate the residuals by simply subtracting the experimental data from the simulation results; the double method casts the dataset array to a double array.
As a side note, this objective function is written for lsqnonlin, which expects the objective function to return a vector of residuals. Other solvers such as fmincon work on scalar objectives; in that case, you need to compute the sum of squared residuals yourself, as sketched below.
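For illustration, such a scalar objective could simply wrap the vector-valued function above; objFunctionScalar is a hypothetical name and not part of the original script:
function sse = objFunctionScalar(p, exportedMdlObj, data, mdlObsSpeciesName)
% Sum of squared residuals, suitable for scalar solvers such as fmincon
residual = objFunction(p, exportedMdlObj, data, mdlObsSpeciesName);
sse = sum(residual.^2);
end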
Going back to the script, we now set up the objective function needed for the optimization as an anonymous function:
objFcn = @(p)objFunction(p, exportedMdlObj, data, mdlObsSpeciesName);
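Before handing this function to a solver, it can be worth evaluating it once at the initial estimate as a quick, optional sanity check (not part of the original script); the result should be a column vector with one entry per observed time point and species:
testResidual = objFcn(initialEstimate);  % evaluate the objective once at the initial estimate
size(testResidual)                       % one residual per observed time point and species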
With the objective function in place, we can move on to the last part, setting up the MultiStart algorithm.
MultiStart runs a local solver from a large number of different starting points. You can either let MultiStart choose the starting points or specify them manually. Here, I will specify nruns = 50 starting points that are log-uniformly distributed up to an order of magnitude below and above the initial estimates that we specified:
nruns = 50;
% Keep the initial estimate as the first start point and scatter the remaining
% start points log-uniformly within a factor of 10 of the initial estimate
startPointMatrix = repmat(initialEstimate, [nruns, 1]);
startPointMatrix(2:end,:) = startPointMatrix(2:end,:) .* 10.^(2*rand(nruns-1, numel(initialEstimate)) - 1);
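If you want to verify the spread, a quick optional check (not part of the original script) is to look at the multipliers relative to the initial estimates:
multipliers = startPointMatrix ./ repmat(initialEstimate, [nruns, 1]);
[min(multipliers(:)) max(multipliers(:))]   % should lie within [0.1, 10]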
To pass these start points to MultiStart, we wrap the matrix in a custom start point set, which we can generate using the CustomStartPointSet constructor:
startPoints = CustomStartPointSet(startPointMatrix);
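Alternatively, as mentioned above, you can let MultiStart generate the start points for you. A minimal sketch (not used in this post) is a RandomStartPointSet, whose points are drawn uniformly within the problem bounds, with an artificial bound substituted wherever a bound is missing:
% Alternative (not used here): let MultiStart draw nruns random start points
altStartPoints = RandomStartPointSet('NumStartPoints', nruns);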
Next we create the optimization problem
optimProblem = createOptimProblem('lsqnonlin', 'lb', zeros(size(initialEstimate)), ...
    'objective', objFcn, 'x0', initialEstimate);
where we specify lsqnonlin as the solver to be used, zeros as the lower bounds, the objective function, and the initial estimate. The MultiStart object needs to be configured as well:
ms = MultiStart('Display','iter','UseParallel','always');
Here's where the parallel option comes into play. Because MultiStart runs independent optimizations from different starting points, it is a perfect fit for parallelization. If we had used sbiosimulate to simulate our model, the optimization would have errored, because sbiosimulate cannot run in parallel. This is circumvented by exporting the model first (see above).
Finally, we open a matlabpool of four workers (on my quad-core machine; if you have access to a cluster and MATLAB Distributed Computing Server, you can run it on many more workers):
matlabpool open 4
and start the fitting
[finalEstimate, fval] = run(ms, optimProblem,startPoints);
Depending on the number of runs and the complexity of the problem, this can take a while, but in the end the solution is likely to be much better than the initial one.
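As an optional check (not part of the original script), you can compare the sum of squared residuals before and after the fit; the value at the final estimate should agree with the fval returned by run:
initialSSE = sum(objFcn(initialEstimate).^2)   % sum of squared residuals at the initial estimate
finalSSE   = sum(objFcn(finalEstimate).^2)     % should agree with the fval returned above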
How do we get the new estimates back into the SimBiology Desktop? I started out by saying that changes made to the model at the command line will be synchronized with the desktop if we used the Export Model to the workspace functionality in the desktop. I will add the new estimates as a new variant to the model by first creating a cell array containing the parameter names and new values and then adding this content to the model:
varcontent = cell(1, numel(finalEstimate));  % one content entry per estimated parameter
for i = 1:numel(finalEstimate)
    varcontent{i} = {'parameter', estParameterNames{i}, 'Value', finalEstimate(i)};
end
varObj = addvariant(mdlObj, 'multiStartFit');
addcontent(varObj, varcontent);
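To take a quick look at the fitted behavior from the command line, you can also simulate the original model with the new variant applied (an optional step; sbiosimulate accepts a variant object as an additional input):
% Simulate the original model object with the fitted values applied via the variant
fittedSimData = sbiosimulate(mdlObj, varObj);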
The new variant will show up in the variant tab in the SimBiology Desktop and we can explore and analyze this solution.
Any comments/questions? Send me an email.