Sven Mesecke's blog

Posted Mo 18 Februar 2013

How to use Global Optimization Toolbox algorithms for SimBiology parameter estimation – in parallel (part II)

In the first part I presented a workflow to export a SimBiology model as well as data to the workspace from the SimBiology Desktop, speed up its simulation and prepare it for parallel computation. Here I will set up the actual optimization problem.
The Optimization solvers usually only accept a single parameter in the objective function. Three methods can be used to call the objective function with more parameters – anonymous functions, nested functions and global variables. I usually use nested functions for these purposes, however, as this only works inside a function and I would like to have a script at the end of this blog post, I will call an external objective function objFunctionvia an anonymous function. Here’s what objFunctionlooks like:

function residual = objFunction(p, exportedMdlObj, data, mdlObsSpeciesName)

sd = simulate(exportedMdlObj, p);
sdn = resample(sd,data.t,'pchip'); % resample simulation data to get the timepoints at the observed time points
[~,x,names] = selectbyname(sdn, mdlObsSpeciesName);

residual = x-double(data(:,2:end)); % calculate residual

When we exported the model we used some model parameters as a second argument. The simulatemethod of the exported model object accepts a vector of new parameter values that it uses for the simulation. The output of this method is a SimDataobject which holds the simulation results and some metainformation. SimDatahas a method which allows resampling or interpolation of the simulation results so that we can get these at the same time as the experimental data. resampleincludes a number of different interpolation methods such as nearest neighbour, linear or spline interpolation. I tend to use the 'pchip' option as it preserves the shape of the generating function. Cleve Moler, the original developer of MATLAB, wrote an interesting blog post about the differences between 'spline' and 'pchip' a while back (see here).

Last, we can extract the simulation results for those species that were measured by using the selectbynamemethod with a cell array of model species names. The last step in the objective function is to calculate the residuals by simply subtracting the experimental data from the simulation results. Use the doublemethod to cast a dataset array to a double array.
As a side note, this objective function is adapted to the lsqnonlinfunction which requires a vector of the residuals as the input. Other functions such as fminconwork on scalars; in this case you need to calculate the least squares term by yourself.
Going back to the script, we now set up the objective function needed for the optimization as an anonymous function

objFcn = @(p)objFunction(p, exportedMdlObj, data, mdlObsSpeciesName);

and can move on to the last part, setting up the MultiStart algorithm. The MultiStartfunction starts a local solver from a large number of different starting points. You can either let MultiStartchoose the starting points or specify starting points manually. Here, I will specify nruns = 50 starting points that are log-uniformly distributed an order of magnitude below and above the initial estimates that we specified:

nruns = 50;
startPointMatrix = repmat(initialEstimate,[nruns,1]);
startPointMatrix(2:end,:) = startPointMatrix(2:end,:).* 10.^(2*rand(nruns-1,numel(initialEstimate))-1);

The MultiStartfunction requires a custom start point set as an input which we can generate using the CustomStartPointSet constructor:

startPoints = CustomStartPointSet(startPointMatrix);

Next we create the optimization problem

optimProblem = createOptimProblem('lsqnonlin','lb',zeros(size(initialEstimate)),'objective', objFcn,'x0',initialEstimate);

where we specify lsqnonlinas the solver to be used, zeros as lower bounds, the objective function and provide an initial estimate. The MultiStartfunction needs to be configured as well

ms = MultiStart('Display','iter','UseParallel','always');

Here’s where the parallel option comes into place. As MultiStartruns independent optimizations from different starting points, it is a perfect fit for parallelization. If we would have used sbiosimulateto simulate our model, the function would have errored as it cannot run in parallel. This could be circumvented by exporting the model first (see above).
Finally, we open a matlabpoolof four workers (on my quadcore machine, if you have access to a cluster and the MATLAB Distributed Computing Server you can run it on many more workers):

matlabpool open 4

and start the fitting

[finalEstimate, fval] = run(ms, optimProblem,startPoints);

Depending on the number of runs and the complexity of the problem, this will take a while eventually, but in the end, the solution might be much better than the initial outcome.

How to get the new estimates back into the SimBiology Desktop? I started out by saying that the command line changes to the model will be synchronized in the desktop if we used the Export Model to the workspace functionality in the desktop. I will add the new estimates as a new variant to the model by first creating a cell array containing the parameter names and new values and then add this content to the model:

for i=1:numel(finalEstimate)
varcontent{i} = {'parameter',estParameterNames{i},'Value',finalEstimate(i)};
varObj = addvariant(mdlObj, 'multiStartFit');
addcontent(varObj, varcontent);

The new variant will show up in the variant tab in the SimBiology Desktop and we can explore and analyze this solution.

Any comments/questions? Send me an email.

Category: Scientific Computing
Tags: parameter estimation matlab simBiology tutorial