Learning Python: Numba nopython context for extra speed

Update 2014/12/23: I should have pointed out long ago that this post has been superseded by my post “Numba nopython mode in versions 0.11 and 0.13 of Numba”.

Let’s say you are trying to accelerate a Python function whose inner loop calls a NumPy function; in my case that function was exp.  Using the @autojit decorator from Numba should give you good results.  It certainly made my function faster, but I felt that more speed was hiding somewhere.  This post explores how I got that speed back.  Today’s code is available as an IPython notebook here: 2014-02-01-LearningPython-NumbaNopythonContext.ipynb.  First, I tested my belief by timing three ways to calculate exp of each entry in a large NumPy array.  This is the code for the functions: Continue reading
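The three functions aren’t shown in this excerpt, so here is a minimal sketch of the kind of comparison described, assuming the three contenders are a single vectorised `np.exp` call, a plain Python loop, and a loop compiled by Numba. The function names are hypothetical. The post used `@autojit` from an older Numba; current Numba spells nopython compilation `@njit` (shorthand for `@jit(nopython=True)`). The sketch falls back to plain Python if Numba isn’t installed, so the "Numba" timing only means something when it is.

```python
import math
import time

import numpy as np

try:
    from numba import njit  # njit is shorthand for jit(nopython=True)
except ImportError:
    # Assumption: no Numba available, so run the loop as ordinary Python.
    def njit(func):
        return func


def exp_numpy(x):
    """Vectorised: one call to np.exp over the whole array."""
    return np.exp(x)


def exp_python_loop(x):
    """Plain Python loop calling np.exp on one element at a time (slow)."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = np.exp(x[i])
    return out


@njit
def exp_numba_loop(x):
    """Explicit loop, compiled in nopython mode when Numba is present."""
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = math.exp(x[i])
    return out


def best_time(func, x, repeats=3):
    """Best-of-N wall-clock time in seconds for func(x)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(x)
        best = min(best, time.perf_counter() - start)
    return best
```

Call each function once before timing it (for example on `x = np.random.rand(1_000_000)`), so that Numba’s one-off compilation cost isn’t counted by `best_time`.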

Corrections to “Learning Python: Eight ways to filter an image” – fixing my Numba speed problems

My post comparing different ways to implement the bilateral filter, “Learning Python: Eight ways to filter an image”, showed several versions attempting to use Numba.  However, Numba failed to produce the amazing speedups others have reported.  The fault is my own.  Here is the new version 2 of my Numba code: Continue reading

Learning Python: Eight ways to filter an image

Today’s post is going to look at fast ways to filter an image in Python, with an eye towards speed and memory efficiency.  My previous post put Numba to use, accelerating my code for generating the Burning Ship fractal by about 10x.  This got me thinking about other places where I could use Numba.  That, combined with reading some SciPy and scikit-learn documentation, got me onto the topic of filtering an image.  I’m going to focus on 2D gray-scale images for this post.  Oddly enough, there isn’t a bilateral filter implemented in scipy.ndimage, so I’m going to tackle that one.  My test image is the famous Lenna Continue reading
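For reference, here is a minimal NumPy sketch of a bilateral filter on a 2D gray-scale image; it is not the code from any of the eight versions in the post. Each output pixel is a weighted average of its neighbours, where the weight is a Gaussian in spatial distance (`sigma_spatial`) times a Gaussian in intensity difference (`sigma_range`). The parameter names, the edge padding and the assumption that intensities are scaled to [0, 1] are my own choices.

```python
import numpy as np


def bilateral_filter(image, radius=3, sigma_spatial=2.0, sigma_range=0.1):
    """Bilateral filter for a 2-D gray-scale image (assumed scaled to [0, 1]).

    Loops over the (2*radius+1)**2 window offsets rather than over pixels,
    so every step is a whole-array NumPy operation and the extra memory is
    a few image-sized buffers.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 2:
        raise ValueError("expected a 2-D gray-scale image")
    rows, cols = img.shape
    # Repeat edge pixels so every pixel has a full window of neighbours.
    padded = np.pad(img, radius, mode="edge")
    numerator = np.zeros_like(img)
    denominator = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour at offset (dy, dx) for every pixel at once.
            shifted = padded[radius + dy:radius + dy + rows,
                             radius + dx:radius + dx + cols]
            spatial_w = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_spatial ** 2))
            range_w = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_range ** 2))
            weight = spatial_w * range_w
            numerator += weight * shifted
            denominator += weight
    # The centre offset always contributes weight 1, so no division by zero.
    return numerator / denominator
```

Looping over offsets instead of pixels is what keeps this reasonably fast in pure NumPy: there are only (2·radius+1)² iterations, each touching the whole image, and a sharp edge survives because neighbours across it get a tiny range weight.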

Learning Python: Parallel processing with the Parallel Python module, with some Numba added in

Introduction: Parallel Python and the Burning Ship fractal

I have previously used MATLAB for a lot of my prototyping work, and its parfor (parallel for) loop construct has been a relatively easy way to get code to use all the cores available in my desktop.  Now that I am teaching myself Python, I decided to look for something similar.  My first stop is the Parallel Python module, a.k.a. PP.  One thing I like about PP is that it can also run on multiple computers. Continue reading

Making PyOpenCL handle NumPy arrays as images

PyOpenCL Image objects take a shape tuple that gives (width, height, depth), but NumPy arrays specify shape in the order (rows, columns, …) a.k.a. (height, width, …) where the ellipsis indicates higher dimensions.  What’s important is that the width and height dimensions have been swapped.  The PyOpenCL documentation suggests creating the NumPy arrays in Fortran (column-major) order instead of the default row-major order.  While that is fine for single-channel images, RGB and RGBA images still get messed up.  The solution turned out to be quite easy: swap the first and second dimensions.  I assume an OpenCL context (ctx) has already been created and that you have an image shape tuple specified as image_shape=(rows, columns, …).  This line creates the OpenCL 2D image and stores it in input_image_cl: Continue reading
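The post’s PyOpenCL line isn’t in this excerpt, so rather than guess at it, here is a NumPy-only sketch of why swapping the shape is enough, assuming a tightly packed 2D image with no row padding. In a default C-ordered array of shape (rows, columns, channels), the channel index varies fastest, then the column, then the row, which matches an OpenCL image with width = columns and height = rows. The helper names are hypothetical; the (width, height) tuple is what you would pass as the `shape` argument when creating a `pyopencl.Image`.

```python
import numpy as np


def cl_image_shape(image_shape):
    """(rows, columns, ...) in NumPy order -> (width, height) for a 2-D image.

    The channel axis is left out on purpose: for a 2-D OpenCL image the
    channel count comes from the image format, not from the shape.
    """
    rows, columns = image_shape[0], image_shape[1]
    return (columns, rows)


def texel_offset(x, y, width, channels):
    """Element offset of texel (x, y) in a tightly packed OpenCL image:
    x (the column) varies fastest, then y (the row); channels interleave."""
    return (y * width + x) * channels


def layouts_match(image_shape):
    """Check that a C-ordered array with this shape already stores every
    pixel where OpenCL expects it, so only the shape tuple needs swapping."""
    channels = image_shape[2] if len(image_shape) > 2 else 1
    width, height = cl_image_shape(image_shape)
    for y in range(height):
        for x in range(width):
            index = (y, x, 0) if len(image_shape) > 2 else (y, x)
            # ravel_multi_index uses C (row-major) order by default.
            numpy_offset = np.ravel_multi_index(index, image_shape)
            if numpy_offset != texel_offset(x, y, width, channels):
                return False
    return True
```

Since the data already lines up, no transpose or copy of the pixels is needed for RGB or RGBA; only the first two entries of the shape tuple change places.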

My computer can’t add (part 2)

In “My computer can’t add (part 1)” I introduced Kahan summation and showed how it helps to improve the accuracy of summing a large number of floating-point values.  Kahan’s algorithm is fine for this task, but what happens when you try to combine the results of two sums?  What if you need to multiply the compensated sum by some value?  This led to a number of searches on Google and online publishers that turned up relatively little.  What I did eventually find led me to the double-double precision and double-single precision libraries written in Fortran.  Currently, they reside at http://crd-legacy.lbl.gov/~dhbailey/mpdist/.  I am going to focus on the addition operators again, but the Fortran code shows how to handle multiplication, transcendental functions and pretty much everything else you could want. Continue reading
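Here is a minimal Python sketch of double-double addition, built from the standard TwoSum and quick-two-sum error-free transformations. It has the same overall shape as the double-double add in libraries of this kind, but it is my own illustration rather than a translation of the Fortran code at that URL. A double-double value is an unevaluated sum `hi + lo` of two Python floats (IEEE doubles), which also answers the question of combining two sums: add the two (hi, lo) pairs with `dd_add`.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) plus its exact rounding error e,
    so that a + b == s + e exactly (no requirement on |a| versus |b|)."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Faster renormalisation; assumes |a| >= |b| (or a == 0)."""
    s = a + b
    e = b - (s - a)
    return s, e


def dd_add(a, b):
    """Add two double-double numbers, each given as a (hi, lo) pair."""
    s, e = two_sum(a[0], b[0])  # exact sum of the high parts
    e += a[1] + b[1]            # fold in both low parts
    return quick_two_sum(s, e)  # renormalise so |lo| is tiny next to hi


def dd_sum(values):
    """Accumulate ordinary floats into a double-double (hi, lo) total."""
    total = (0.0, 0.0)
    for v in values:
        total = dd_add(total, (v, 0.0))
    return total
```

For example, `dd_add((1.0, 1e-17), (-1.0, 0.0))` gives `(1e-17, 0.0)`: the 1e-17 that plain doubles would lose when added to 1.0 is kept in the low word and survives the cancellation.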

My computer can’t add (part 1)

I recently asked some of my colleagues a question relating to the Kahan compensated summation algorithm and was greeted with blank stares and the question “What’s that?”  This caught me off guard, as I had known about it for a few years.  I can’t remember how I found out about it, but I think that journey started with a single tutorial question in 2nd year applied maths and continued when I read “What every computer scientist should know about floating-point arithmetic” by David Goldberg.  I’ll start with why this algorithm matters, then explain what it actually is. Continue reading
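As a preview, here is a minimal Python sketch of Kahan’s compensated summation next to a plain left-to-right loop. The function names are mine. The naive loop is written out explicitly because recent Python versions (3.12 and later) make the built-in `sum()` more accurate for floats, which would hide the effect.

```python
def naive_sum(values):
    """Plain left-to-right floating-point sum, one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carry the lost low-order bits in c."""
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for v in values:
        y = v - c            # correct the next term by the previous error
        t = total + y        # low-order bits of y may be lost here
        c = (t - total) - y  # recover what was lost (algebraically zero)
        total = t
    return total
```

For example, with `values = [1.0] + [1e-16] * 10`, `naive_sum` returns exactly 1.0 because each 1e-16 is smaller than half the spacing between doubles near 1.0, while `kahan_sum` carries those lost bits in `c` and ends up close to 1.000000000000001.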