Update 2014/12/23: I should have pointed out long ago that this post has been superseded by my post “Numba nopython mode in versions 0.11 and 0.13 of Numba”.
Let’s say you are trying to accelerate a Python function whose inner loop calls a NumPy function; in my case that function was exp. Using the @autojit decorator from Numba should give you good results. It certainly made my function faster, but I felt that more speed was hiding somewhere. This post explores how I got back that speed. Today’s code is available as an IPython notebook here: 2014-02-01-LearningPython-NumbaNopythonContext.ipynb. First, I tested my belief by timing three ways to calculate exp of each entry in a large NumPy array. This is the code for the functions:
import numpy as np
from numba import autojit
import math

def exp_test_1(x):
    r = np.zeros(x.shape)
    for row in xrange(x.shape[0]):
        for col in xrange(x.shape[1]):
            r[row, col] = np.exp(x[row, col])
    return r

jit_exp_test_1 = autojit(exp_test_1)
Now let’s time this function with and without Numba and compare it to NumPy’s exp using the following code. My test data is a random square array of 1,000 rows and columns.
x = np.random.rand(1000, 1000)
%timeit exp_test_1(x)
%timeit jit_exp_test_1(x)
%timeit np.exp(x)
The results from timeit are as follows:
1 loops, best of 3: 2.34 s per loop
10 loops, best of 3: 165 ms per loop
100 loops, best of 3: 14.9 ms per loop
Using Numba here has resulted in a 14x speedup, but NumPy is still 11x faster than the Numba-accelerated function. The results do change if np.exp is replaced with math.exp as follows:
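As a quick sanity check (mine, not part of the original notebook), the 14x and 11x figures follow directly from the timings above:

```python
# Best-of-3 timings from the run above, converted to seconds.
pure_python = 2.34
numba_jit = 0.165
numpy_exp = 0.0149

# Numba vs. pure Python, and NumPy vs. Numba.
print(round(pure_python / numba_jit, 1))  # ~14x
print(round(numba_jit / numpy_exp, 1))    # ~11x
```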
def exp_test_2(x):
    r = np.zeros(x.shape)
    for row in xrange(x.shape[0]):
        for col in xrange(x.shape[1]):
            r[row, col] = math.exp(x[row, col])
    return r

jit_exp_test_2 = autojit(exp_test_2)
The new timing code is:
%timeit exp_test_2(x)
%timeit jit_exp_test_2(x)
%timeit np.exp(x)
The results are:
1 loops, best of 3: 529 ms per loop
10 loops, best of 3: 164 ms per loop
100 loops, best of 3: 15.2 ms per loop
The only interesting result here is that math.exp is faster than NumPy’s exp when not using Numba. The moral of the story is probably that NumPy is geared towards arrays and vectorised expressions, so calling it one scalar at a time carries extra overhead.
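A rough way to see this (my own check, not from the original notebook): np.exp on a single Python float dispatches through NumPy’s ufunc machinery and returns a NumPy scalar, while math.exp is a thin wrapper around the C library call, so its per-call overhead is much lower even though both compute the same value.

```python
import math
import numpy as np

value = 0.5

# Both compute the same number...
assert abs(math.exp(value) - float(np.exp(value))) < 1e-12

# ...but they return different types: math.exp gives a plain
# Python float, while np.exp goes through the ufunc machinery
# and returns a NumPy scalar, which costs more per call.
print(type(math.exp(value)))  # <class 'float'>
print(type(np.exp(value)))
```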
Now for the solution to this dilemma. One of the options accepted by autojit is nopython, which defaults to False. Setting nopython=True stops Numba from using the Python C API inside the compiled function, as far as I can tell. One side effect is that NumPy arrays can no longer be created inside the function; they must be allocated outside and passed in as parameters. Here’s the code:
def exp_test_3(x, r):
    # r = np.zeros(x.shape)  -- allocation moved outside the function
    for row in xrange(x.shape[0]):
        for col in xrange(x.shape[1]):
            r[row, col] = np.exp(x[row, col])

jit_exp_test_3 = autojit(exp_test_3, nopython=True)
jit_exp_test_3_withpython = autojit(exp_test_3)
Note that instead of creating the “r” array inside the function and returning it as a result using “return r”, I now pass the array as a parameter that gets updated in place. Here’s the timing code and running times:
%timeit exp_test_3(x, np.zeros(x.shape))
%timeit jit_exp_test_3(x, np.zeros(x.shape))
%timeit jit_exp_test_3_withpython(x, np.zeros(x.shape))
1 loops, best of 3: 2.2 s per loop
100 loops, best of 3: 15.3 ms per loop
100 loops, best of 3: 14.7 ms per loop
I’ve achieved my objective: the Numba-accelerated function is now just as fast as calling NumPy’s exp on the whole array. Interestingly, whether I use nopython=True or not, the running times are now similar, and both are equivalent to np.exp. I don’t know why these changes to the function have made such a major improvement. One benefit of using nopython=True is that it forced me to rewrite the function in this faster form; without it, I would not have made the necessary changes.
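As an aside (my own observation, not from the original post), the same allocate-outside pattern is available in NumPy itself: ufuncs such as np.exp accept an out parameter, so a preallocated result array can be reused across calls instead of a fresh output array being allocated each time.

```python
import numpy as np

x = np.random.rand(1000, 1000)
r = np.empty_like(x)  # allocate the output once, up front

# Write the result into the preallocated array r instead of
# allocating a new output array on every call.
np.exp(x, out=r)

assert np.allclose(r, np.exp(x))
```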