Sven Mesecke's blog

Posted Wed 10 July 2013

Setting up a Mac for Scientific Computing and Software Development II

Part I covered the preparations for installing software development and scientific computing tools: editors, the Terminal, Mac OS X package managers and some bash basics. In Part II we will dive into setting up a Python environment and all the tools necessary for blogging and scientific computing. Part III will describe MATLAB, OpenCV and mexopencv.

Python

If you are considering using Python and/or MATLAB on a Mac for scientific computing (or data analysis and visualization, statistics, etc.), you have two options: use a Python distribution or install everything manually. A notable example of a good Python distribution is the Enthought Python Distribution, which comes in both free and non-free versions. Another one of interest is Sage, which combines a number of open source tools written in many different languages under a common Python-based interface. Finally, Continuum Analytics, a company contributing many open source tools to the Python stack, offers the Anaconda distribution, which I have not tested yet.

The second option is to build everything from scratch. This gives you much more control over how your packages are installed and organized, which is especially useful if you also want to use the scientific computing packages in other environments, such as website development. A Python stack for scientific computing typically consists of a number of packages that depend on each other. For example, scipy, a package containing functions for interpolation, optimization and statistics, among many others, builds on numpy, is usually combined with matplotlib for plotting, and needs a Fortran compiler on the machine to build. A starting stack of tools could contain:

  1. numpy, which provides arrays and other features commonly needed in scientific computing
  2. matplotlib, a 2D plotting library
  3. scipy, a library containing plenty of numerical solvers
  4. IPython, an interactive shell (a Read-Eval-Print Loop, REPL) and browser-based notebook
  5. pandas, a library providing, among other things, a DataFrame data type similar to R's data frames
  6. statsmodels, a library that provides statistical computing functionality
  7. scikit-learn, a machine learning library for Python
  8. sympy, a library for symbolic math in Python
  9. Spyder, a GUI providing functionality similar to the MATLAB Desktop
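To make the point about packages depending on each other concrete, here is a minimal sketch (Python 3.9+, standard library only) that derives an install order from a dependency map with graphlib.TopologicalSorter. The DEPENDENCIES map and the install_order helper are hypothetical and deliberately simplified; the real requirements come from each package's metadata, and pip resolves them for you.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified dependency map: package -> packages it needs first.
# Real requirements live in each package's metadata and change between versions.
DEPENDENCIES = {
    "numpy": set(),
    "scipy": {"numpy"},
    "matplotlib": {"numpy"},
    "pandas": {"numpy"},
    "statsmodels": {"numpy", "scipy", "pandas"},
    "scikit-learn": {"numpy", "scipy"},
    "sympy": set(),
    "ipython": set(),
}


def install_order(deps):
    """Return package names so that every package comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in install_order(DEPENDENCIES):
        print("pip install", name)
```

Running it prints one pip install line per package, with numpy ahead of everything that builds on it; the relative order of independent packages such as sympy carries no meaning.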

Installing this stack of libraries requires some prior work, however. The Python that ships with Mac OS X is outdated and not easy to update. Instead, we can use the previously installed Homebrew to get a separate, clean Python installation. When I installed Python via Homebrew, I ran into problems installing other Python packages with pip afterwards, because the OpenSSL version shipped with Mac OS X 10.8.3 was obsolete. It is good practice to build Python against Homebrew's OpenSSL instead (see here for more info):

brew install python --with-brewed-openssl

which installs the latest Python on the 2.x branch. You can do the same to install Python 3.x:

brew install python3 --with-brewed-openssl
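Once the Homebrew Python is installed, a short standard-library script can confirm which interpreter you are actually running and which OpenSSL its ssl module is linked against. This is a sketch; describe_python is a hypothetical helper name, and the code runs under both Python 2.7 and 3.x.

```python
import ssl
import sys


def describe_python():
    """Collect the interpreter path, Python version and linked OpenSSL version."""
    return {
        "executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        "openssl_version": ssl.OPENSSL_VERSION,
    }


if __name__ == "__main__":
    for key, value in describe_python().items():
        print("%s: %s" % (key, value))
```

If you start the script with plain `python` and `executable` still shows /usr/bin/python, the system Python comes before Homebrew's on your PATH.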

The advantage of using Homebrew is that you always get the latest Python versions and can use the pip that comes with Homebrew's Python to install other packages (such as the great virtualenv and virtualenvwrapper tools):

pip install virtualenv
pip install virtualenvwrapper

So what are these two packages for?

Python tools for static blog generation

virtualenv and virtualenvwrapper let you keep several Python package stacks side by side without them interfering with each other. Say you want to use scipy and the associated tools for your scientific work, and your scientific stack runs on Python 2.x. At the same time you might want to use Pelican as a static blog generator (highly recommended; this blog runs on Pelican), which also works with Python 3.x. To keep these two setups apart, create separate virtual environments, each with its own local Python and package installations. For example, I set up a Pelican environment using

mkproject Pelican

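Your prompt will then show the name of the active environment in front of it. Please read the virtualenvwrapper documentation to understand its commands; mkproject, for instance, expects the PROJECT_HOME variable to point to the directory where your projects should live. Then install the Pelican package using the environment's local pip: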

pip install Markdown Pelican

This environment and the scientific computing environment we set up next are completely separate, which lets you avoid dependency conflicts (for example, when one tool requires version x.1 of a library while another requires version x.4).
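To check from inside Python whether a virtual environment is active, you can compare the interpreter's prefixes. This minimal sketch assumes two mechanisms: older releases of the virtualenv tool set sys.real_prefix, while Python 3's built-in venv (and newer virtualenv releases) make sys.prefix differ from sys.base_prefix. in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older releases of the virtualenv tool set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv-style environments keep sys.base_prefix pointing at the original
    # installation while sys.prefix points at the environment itself.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active at " + sys.prefix)
    else:
        print("No virtual environment active; using " + sys.prefix)
```

After `workon Pelican`, the script should report the Pelican environment's directory rather than the Homebrew installation.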

A quick detour. Say you indeed want to use Pelican for your blog and write your posts in Markdown. At the same time you would like to offer a PDF download of these posts so that people can archive and reference them easily. This is where Pandoc comes into play: a Haskell-based tool that converts between markup languages and document formats (e.g., Markdown to PDF, Word or HTML). To install Pandoc, first get Haskell on your Mac:

brew install haskell-platform

Then refresh the package list of cabal, the Haskell package manager, and install Pandoc:

cabal update
cabal install pandoc

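After that, add the ~/.cabal/bin directory, where cabal puts the programs it installs, to the PATH environment variable in your ~/.bash_profile file:

export PATH="$PATH:$HOME/.cabal/bin"

Then reload the file with source ~/.bash_profile. Note that terminal tabs that were already open, as well as virtual environments that were active while you sourced the file, can end up without this PATH change; opening a new terminal tab fixes that.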
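To confirm that the PATH change took effect, you can ask Python whether ~/.cabal/bin is on the PATH and where it finds pandoc. This is a sketch for Python 3.3+ (shutil.which is not available in Python 2); cabal_bin_on_path is a hypothetical helper name, and it assumes bash has already expanded the tilde in PATH.

```python
import os
import shutil


def cabal_bin_on_path(path_value, home):
    """Return True if <home>/.cabal/bin is one of the entries in path_value."""
    target = os.path.normpath(os.path.join(home, ".cabal", "bin"))
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    home = os.path.expanduser("~")
    print("~/.cabal/bin on PATH:", cabal_bin_on_path(path_value, home))
    # shutil.which returns None when no pandoc executable is found on PATH.
    print("pandoc found at:", shutil.which("pandoc"))
```

Note that adding only ~/.cabal (without bin) is reported as False, since the executables are not in that directory itself.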

After this quick detour, let's move on to installing the Python stack for scientific computing.

Python in scientific computing

Next, we set up the actual scientific Python environment:

mkvirtualenv science

and install the whole stack of relevant Python libraries, starting with numpy, which most of the others build on:

pip install numpy
pip install matplotlib
pip install scipy
pip install ipython
pip install pandas
pip install sympy
pip install statsmodels
pip install scikit-learn

If any requirements of the tools above are missing, first try installing them with pip; if that does not succeed, try Homebrew. Very often it is just a matter of figuring out the actual package name.
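Part of figuring out the actual package name is that a package's pip name and its import name do not always match: scikit-learn is imported as sklearn, ipython as IPython. This sketch (Python 3.4+, standard library only) checks which parts of the stack can be found in the active environment without actually importing them; the IMPORT_NAMES mapping covers only the packages installed above, and missing_packages is a hypothetical helper name.

```python
import importlib.util

# pip package name -> name used in `import` statements
IMPORT_NAMES = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "ipython": "IPython",
    "pandas": "pandas",
    "sympy": "sympy",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
}


def missing_packages(import_names):
    """Return the pip names whose import name cannot be found in this environment."""
    return sorted(
        pip_name
        for pip_name, module in import_names.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = missing_packages(IMPORT_NAMES)
    if missing:
        print("Missing, try: pip install " + " ".join(missing))
    else:
        print("All packages of the stack are importable.")
```

Run it with the science environment active; a package that is missing only there will show up even if it is installed elsewhere on your machine.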

If you would like to use IPython with the Qt-based console, you will need to install SIP and PyQt:

brew install sip
brew install pyqt

If you prefer IPython's Notebook interface, also install tornado and pyzmq:

pip install tornado
pip install pyzmq

This setup lets you get started with Python, using IPython either from the terminal or in your browser. Installing the full-fledged Spyder GUI, however, proved difficult. I will update this article once I manage to install it.

In the third part of this series, I will discuss how to install and get started with MATLAB and OpenCV.

Any comments/questions? Send me an email.

Category: Scientific Computing
Tags: mac science development tutorial