Sven Mesecke's blog

Posted Thu 19 September 2013

Julia for MATLAB users III

Previous articles explored Julia and its plotting facilities in the Winston package. The workflow used there, (i) data import, (ii) plotting and (iii) graphics export, is used in this article as well, this time with DataFrames and Gadfly in Julia and dataset arrays in MATLAB.

Dataset arrays are a fairly recent and little-known addition to MATLAB, mostly because they were only available in the Statistics Toolbox. Only with release R2013b was table added as a new data type to core MATLAB.

% create a dataset array with similar properties to 
% DataFrames in R or Python
data = dataset('File','rlbinding.csv',...
'Delimiter',',','ReadVarNames',false)
% rename the column headers
data.Properties.VarNames = {'time','R','L','RL'}

% unfortunately, dataset arrays are 
% not really an integrated part of matlab
% hence the columns need to be extracted into
% double arrays and plotting is done as before
t = data.time;
R = data.R;
L = data.L;
RL = data.RL;
figure('Position',[350 350 800 400])
plot(t,R,'-bd',t,L,'-bo',t,RL,'-bx','MarkerSize',8)
legend({'R','L','RL'})
xlabel('time')
ylabel('concentration')

export_fig('matlab_rlbinding.png',...
'-png', '-r72','-transparent')

This creates the same plot as before:

Yet another MATLAB plot

The only difference from plotting with standard arrays is the slightly different input arguments: you have to extract the data from the dataset array into double arrays before plotting them. See the previous article for how to do this.

Using Julia for this is quite illustrative (and a really nice experience). You will need to install the DataFrames and Gadfly packages, though the former (as well as a bunch of other dependencies) should be installed automatically when installing Gadfly. Gadfly follows a completely different philosophy from MATLAB and Winston and is strongly influenced by ggplot in R. I struggled to create a plot overlaying several matrix columns, but that was mostly due to my misconception of how Gadfly should work. In the end I read in the original data and rearranged the columns into a stacked data set, in which the time series are stacked on top of each other and told apart by a grouping variable.

using Gadfly
using DataFrames
# get the file names in the current folder
files = readdir()
# read the data into dataframe - named 
# function arguments, yeah!
data = readtable(files[2],separator=',',header=false)
# rename the column headers - ! at the 
# end of function names indicates that 
# the inputs will be modified by the function (though
# this is only based on mutual agreement, not an 
# in-built language feature) 
colnames!(data,["time","R","L","RL"])
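The `!` convention mentioned in the comment can be seen with two functions from Julia's standard library. This is only a small sketch using `sort` and `sort!` instead of the DataFrames functions above, and the variable names are purely illustrative.

```julia
# sort returns a sorted copy and leaves its argument untouched
v = [3, 1, 2]
w = sort(v)

# sort! sorts its argument in place; the ! is only a naming convention
u = [3, 1, 2]
sort!(u)

println(v)  # v keeps its original order
println(w)  # w is the sorted copy
println(u)  # u itself has been sorted
```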

Let's transform the data so that it conforms to Gadfly's way of thinking:

sdata = DataFrame() # empty data frame
t = data["time"] # one-dimensional vectors!
ntime = size(t)
sdata["time"] = [t, t, t] # stack time vectors
sdata["conc"] = [data["R"], data["L"], data["RL"]] # stack data
# create a grouping variable
sdata["state"] = [fill("R", ntime), fill("L", ntime), fill("RL", ntime)]
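The reshaping above goes from a "wide" layout (one column per species) to a "long" layout (one row per observation plus a grouping column). The following sketch shows the same idea with plain vectors, assuming a recent Julia version, where concatenation is written with `vcat` and `[t, t, t]` would instead create a vector of vectors. The numbers and the `*_long` names are made up for illustration and are not taken from rlbinding.csv.

```julia
# made-up example values, not the contents of rlbinding.csv
t  = [0.0, 1.0, 2.0]
R  = [1.0, 0.6, 0.4]
L  = [2.0, 1.6, 1.4]
RL = [0.0, 0.4, 0.6]

n = length(t)

# long layout: every column has 3n entries
time_long  = vcat(t, t, t)      # repeat the time axis once per species
conc_long  = vcat(R, L, RL)     # stack the concentrations
state_long = vcat(fill("R", n), fill("L", n), fill("RL", n))  # grouping variable

# print the stacked rows: time, concentration, state
for i in eachindex(time_long)
    println(time_long[i], "  ", conc_long[i], "  ", state_long[i])
end
```

Each row now carries its own label, which is what allows a single `color="state"` mapping to separate the three curves.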

Now the data is imported and easily accessible; the next step is to do the actual plotting using the Gadfly package. In contrast to plotting in MATLAB, which works on plain arrays, Gadfly uses DataFrames as its building blocks.

# Gadfly expects a data frame as the first input argument
# then a mapping of axes to columns and the type of plot to display
p = plot(sdata, x="time", y="conc", color="state",Geom.line)
fname = "rlbinding_gadfly.png" # the name of the final file
# draw to the PNG backend (or use SVG, D3,...)
draw(PNG(fname, 600px, 400px), p)

You can display this directly in the IJulia notebook interface when you are using the D3 drawing backend instead of PNG. However, if you are using the REPL that comes with Julia, you can only save the figure, not display it. Here the backtick notation for running shell programs, adopted from Perl and others, comes in handy: simply run open on the plot file

run(`open $fname`) # variables can be referenced using `$`

and the application associated with the file type will open on a Mac. Great stuff.
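Since `open` only exists on a Mac, here is a hedged sketch of how the command could be chosen per platform in a recent Julia version. `opener_command` is a hypothetical helper, not part of Julia or Gadfly; the `xdg-open` and `cmd /c start` choices are assumptions about what is available on Linux and Windows.

```julia
# hypothetical helper: build a command that opens a file
# with the default application of the current platform
function opener_command(path::AbstractString)
    if Sys.isapple()
        return `open $path`
    elseif Sys.iswindows()
        # assumption: the Windows command interpreter is reachable as cmd
        return `cmd /c start $path`
    else
        # assumption: xdg-open is installed, as on most Linux desktops
        return `xdg-open $path`
    end
end

cmd = opener_command("rlbinding_gadfly.png")
println(cmd)   # show the command without running it
# run(cmd)     # uncomment to actually open the file
```

The interpolated `$path` is passed to the program as a single argument without going through a shell, so file names containing spaces need no extra quoting.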

The final plot looks good, but customization options seem to be sorely lacking as of now.

The Gadfly plot

Let's try the subplot approach and plot the different states in horizontally stacked plots,

hfname="rlbinding_gadfly_subplot.png"
# use hstack for horizontally stacking plots,
# vstack for vertical stacks
hp = hstack(plot(data, x="time", y="R", Geom.line),
            plot(data, x="time", y="L", Geom.line),
            plot(data, x="time", y="RL", Geom.line))
draw(PNG(hfname, 600px, 300px), hp)
run(`open $hfname`)

which creates another great-looking plot

The Gadfly "sub"plot

Any comments/questions? Send me an email.

Category: Scientific Computing
Tags: matlab julia data analysis data visualization