Setting up a Mac for Scientific Computing and Software Development II
Part I covered the preparations for installing software development and scientific computing tools: editors, the Terminal, Mac OS X package managers and some bash basics. In Part II we dive into setting up a Python environment and the tools needed for blogging and scientific computing. Part III will describe MATLAB, OpenCV and mexopencv.
Python
If you are considering Python and/or MATLAB on a Mac for scientific computing (or data analysis and visualization, statistics etc.), you have two options: use a Python distribution or install everything manually. A notable example of a good Python distribution is the Enthought Python Distribution, which comes in both free and non-free versions. Another one of interest is Sage, which combines a number of open source tools written in many different languages behind a common Python-based interface. Last, Continuum Analytics, a company contributing a lot of open source tools to the Python stack, offers the Anaconda distribution, which I have not tested yet.
The second option is to build everything from scratch, which makes it much easier to control and organize your installation, especially if you also want to use the scientific computing packages in other environments such as website development. A Python stack useful for scientific computing typically consists of a number of packages that depend on each other. For example, scipy, a package containing functions for interpolation, optimization and statistics, among many others, depends on numpy, is usually paired with matplotlib for plotting, and requires a Fortran compiler on the machine when it is built from source. A starting stack of tools could contain
- numpy, which provides arrays and other features commonly needed in scientific computing
- matplotlib, a 2D plotting library
- scipy, a library containing plenty of numerical solvers
- ipython, an interactive shell and browser-based notebook for Read-Evaluate-Print Loops (REPL)
- pandas, a library providing a data type called DataFrame similar to R's Data Frames, among other things
- statsmodels, a library that provides statistical computing functionality
- scikit-learn, Python's machine learning library
- sympy, a library for symbolic math in Python
- Spyder, a GUI providing functionality similar to the MATLAB Desktop
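Because these packages build on each other, the order in which you install them matters. Here is a minimal sketch of how such an order can be derived. The dependency map is my own simplified illustration, not the packages' full requirement lists, and the script needs Python 3.9 or newer for the standard library's graphlib module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Illustrative map: package -> packages that must be installed first.
# Simplified for this sketch; the real packages declare more requirements.
DEPENDS_ON = {
    "numpy": set(),
    "scipy": {"numpy"},
    "matplotlib": {"numpy"},
    "pandas": {"numpy"},
    "statsmodels": {"numpy", "scipy", "pandas"},
    "scikit-learn": {"numpy", "scipy"},
    "sympy": set(),
    "ipython": set(),
}


def install_order(depends_on):
    """Return the packages in an order where dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for name in install_order(DEPENDS_ON):
        print("pip install " + name)
```

The exact order among unrelated packages is not fixed; the only guarantee is that a package never appears before something it depends on.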
Installing this stack of libraries requires some prior work, however. The Python that ships with Mac OS X is old and not straightforward to update. We can use the previously installed Homebrew to get a separate and clean Python installation. When I first installed Python via Homebrew, I had issues installing other Python packages afterwards via pip, because the OpenSSL version that comes with Mac OS X 10.8.3 is obsolete. It is good practice to build Python against the OpenSSL that comes with Homebrew (see here for more info):
brew install python --with-brewed-openssl
which installs the latest Python on the 2.x branch. You can do the same to install Python 3.x:
brew install python3 --with-brewed-openssl
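To check which interpreter you are actually running and which OpenSSL it uses, a few lines of Python are enough. This is a small sketch using only the standard library; the function name interpreter_report is made up for this example.

```python
import ssl
import sys


def interpreter_report():
    """Collect the facts that matter when pip has trouble with HTTPS."""
    return {
        # Path of the interpreter that runs this script
        "executable": sys.executable,
        "python_version": "%d.%d.%d" % sys.version_info[:3],
        # Version string of the OpenSSL library used by the ssl module
        "openssl_version": ssl.OPENSSL_VERSION,
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_report().items()):
        print("%s: %s" % (key, value))
```

If the executable path points into the system directories rather than into your Homebrew prefix, the shell is still picking up the Python that ships with Mac OS X.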
The advantage of using Homebrew is that you always get the latest Python versions and that you can use the pip that comes with Homebrew's Python to install other packages (such as the great virtualenv and virtualenvwrapper tools):
pip install virtualenv
pip install virtualenvwrapper
So what are these two packages for?
Python tools for static blog generation
virtualenv and virtualenvwrapper let you keep several Python package stacks side by side. Say you want to use scipy and the associated tools for your scientific work, and in my setup that stack runs on Python 2.x. At the same time you might want to use Pelican as a static blog generator (highly recommended, this blog is running on Pelican). Pelican works with Python 3.x. To keep the two apart, create separate virtual environments, each with its own local Python and package installations. For example, I have set up a Pelican environment using
mkproject Pelican
Please read the virtualenvwrapper documentation to understand its commands (mkproject, for example, requires the PROJECT_HOME variable to be set). Once the environment is active, its name appears in front of your prompt. Then install the Pelican package using the environment's local pip:
pip install Markdown Pelican
Both this environment and the next one, used for scientific computing, are completely separated, which lets you avoid dependency conflicts (where one tool requires version x.1 of another tool while a second tool requires version x.4, for example).
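If you are ever unsure which environment a script runs in, you can ask the interpreter itself. The following is a minimal sketch; the helper name in_virtual_environment is mine, and the check relies on attributes that virtualenv and the standard library's venv module set on the sys module.

```python
import sys


def in_virtual_environment():
    """Return True if the interpreter runs inside a virtual environment."""
    # Classic virtualenv releases set sys.real_prefix. The stdlib venv module
    # and newer virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("environment root:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```

Run it once with the Pelican environment active and once after deactivating it; the environment root should change accordingly.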
A quick detour. Say you indeed want to use Pelican for your blog and write your posts in Markdown. At the same time you would like to provide a PDF download of these posts so that people can archive and reference them easily. This is where Pandoc comes into play. Pandoc is a Haskell-based tool that converts markup languages into other markup languages and text formats (e.g., Markdown to PDF/doc/HTML, etc.). To install Pandoc, first get Haskell on your Mac:
brew install haskell-platform
then update the package list of cabal, the Haskell package manager, and install Pandoc:
cabal update
cabal install pandoc
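After that, add the ~/.cabal/bin directory, which is where cabal places the executables it installs, to the PATH environment variable in your .bash_profile file:
export PATH=$PATH:$HOME/.cabal/bin
Then source the .bash_profile file. Bash tabs that are already open and active virtual environments may not pick up this PATH change, though, until you source the file in them as well or reopen them.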
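To verify that the PATH change took effect, you can check it from Python. This is a rough sketch under a few assumptions: that cabal placed the pandoc executable in ~/.cabal/bin, and that post.md and post.pdf are placeholder file names. The helper names are made up for this example.

```python
import os
import shutil


def directory_on_path(directory, path_value):
    """Return True if `directory` is one of the entries of a PATH string."""
    wanted = os.path.normpath(os.path.expanduser(directory))
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(
        os.path.normpath(os.path.expanduser(entry)) == wanted
        for entry in entries
    )


def pandoc_command(source, target):
    """Build the argument list for converting `source` into `target`."""
    # pandoc derives the output format from the extension given to -o.
    return ["pandoc", source, "-o", target]


if __name__ == "__main__":
    cabal_bin = "~/.cabal/bin"  # assumed install location of cabal executables
    print("on PATH:", directory_on_path(cabal_bin, os.environ.get("PATH", "")))
    print("pandoc found at:", shutil.which("pandoc"))  # None if not found
    print("would run:", " ".join(pandoc_command("post.md", "post.pdf")))
```

The script only prints the conversion command instead of running it. Keep in mind that Pandoc needs a LaTeX installation to produce PDF output.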
After this quick detour, let's move on to installing the Python stack for scientific computing.
Python in scientific computing
Next, we set up the actual scientific Python environment:
mkvirtualenv science
and install the whole stack of relevant Python libraries:
pip install numpy
pip install matplotlib
pip install scipy
pip install ipython
pip install pandas
pip install sympy
pip install statsmodels
pip install scikit-learn
If any requirements for the tools mentioned above are missing, first try installing them with pip; if that does not succeed, try Homebrew. Very often it is just a matter of figuring out the actual package name.
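Note that the name you give to pip is not always the name you import. A quick way to see which parts of the stack made it into the active environment is to try importing each one. This is a minimal sketch; the mapping reflects the import names as I know them, and the helper name check_modules is made up for this example.

```python
import importlib

# pip package name -> module name used in an import statement
STACK = {
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "ipython": "IPython",
    "pandas": "pandas",
    "sympy": "sympy",
    "statsmodels": "statsmodels",
    "scikit-learn": "sklearn",
}


def check_modules(modules):
    """Map each package name to a version string, or None if the import fails."""
    found = {}
    for package, module_name in modules.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            found[package] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            found[package] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for package, version in sorted(check_modules(STACK).items()):
        print("%-14s %s" % (package, version or "MISSING"))
```

Run it with the science environment active; anything reported as MISSING either failed to install or ended up in a different environment.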
If you would like to use IPython with a Qt-based console, you will need to install SIP and PyQt:
brew install sip
brew install pyqt
In case you prefer IPython's Notebook interface, also install tornado and pyzmq:
pip install tornado
pip install pyzmq
This setup allows you to get started with Python, using IPython either from the terminal or in your browser. Installing the fully fledged Spyder GUI, however, proved difficult. I will update this article once I manage to install it.
In the third part of this series, I will discuss how to install and get started with MATLAB and OpenCV.
Any comments/questions? Send me an email.