**Update 2016-01-16: Numba 0.23 released and tested – results added at the end of this post**

A while back I was using Numba to accelerate some image processing and noticed a difference in speed depending on whether I used functions from NumPy or their equivalents from the standard Python math package inside the function I was accelerating. If memory serves, I was using the exp function for something and noticed that replacing numpy.exp with math.exp in the function I had decorated with @jit made a noticeable difference in running time. I didn’t investigate this any further at the time, but now, several versions of Numba and NumPy later, I wanted to find out what was causing this difference and which of the two is currently faster to use.

My test examines 26 functions from the math package that all take a single input and produce a single output and which have an analogous function in NumPy. Rather than writing 26 functions, each following the same template of looping through an array calling the function on each element of that array, I decided to make use of Python’s compile and exec functions. First, I defined a template string that defined what my function was going to look like:

    code = '''
    import {module:s}
    from numba import jit

    def f(x):
        sum = 0.0
        for idx in range(x.size):
            sum += {module:s}.{function:s}(x[idx])
        return sum

    jit_f = jit(f)
    array_fn = {module:s}.{function:s}
    '''

The problem with using a template is that it assumes the function names are identical in NumPy and the standard math package; where they differ, the generated code throws an exception. I got around that by defining the following look-up function that maps function names in the math package to function names in NumPy:

    def map_to_numpy_name(fnname):
        npnames = {'acos': 'arccos', 'asin': 'arcsin', 'atan': 'arctan',
                   'acosh': 'arccosh', 'asinh': 'arcsinh', 'atanh': 'arctanh'}
        if fnname in npnames:
            return npnames[fnname]
        else:
            return fnname
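As an aside, the same lookup can be written more compactly with `dict.get`, which falls back to the original name when there is no entry. This is only a sketch: `NUMPY_NAMES` and `to_numpy_name` are illustrative names, not part of the test script.

```python
# Names that differ between math and NumPy; all other tested names are identical.
NUMPY_NAMES = {
    'acos': 'arccos', 'asin': 'arcsin', 'atan': 'arctan',
    'acosh': 'arccosh', 'asinh': 'arcsinh', 'atanh': 'arctanh',
}

def to_numpy_name(fnname):
    """Return the NumPy name for a math function name (hypothetical helper)."""
    return NUMPY_NAMES.get(fnname, fnname)

print(to_numpy_name('acos'))  # arccos
print(to_numpy_name('exp'))   # exp
```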

The module place-holder gets replaced with either ‘numpy’ or ‘math’, while the function place-holder gets replaced with one of the tested function names. This code, when executed, defines three functions: ‘f’, ‘jit_f’ and ‘array_fn’. ‘f’ is just the plain Python loop, so it will be the slowest version. ‘jit_f’ is the compiled function courtesy of Numba. ‘array_fn’ gives my timing code an easy way of calling the tested function directly on a whole array, which is used to compare NumPy array operations with looping through the array manually. This is how to compile the code and execute it, passing it its own namespace in which to keep its variables and functions:

    compiled_code = compile(code.format(module=module, function=fnname), '<string>', 'exec')
    namespace = {}
    exec(compiled_code, namespace)
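To see the mechanism in isolation, here is a stripped-down sketch that uses only the standard library. It assumes a simplified template without the Numba import and the `jit_f` line, and `build_namespace` is a hypothetical helper introduced just for this example.

```python
import math

# Simplified template (assumption: no Numba, so only 'f' and 'array_fn' are defined).
template = '''
import {module:s}

def f(x):
    total = 0.0
    for value in x:
        total += {module:s}.{function:s}(value)
    return total

array_fn = {module:s}.{function:s}
'''

def build_namespace(module, function):
    """Format the template, compile it, and run it in a fresh namespace."""
    source = template.format(module=module, function=function)
    compiled = compile(source, '<string>', 'exec')
    namespace = {}
    exec(compiled, namespace)
    return namespace

ns = build_namespace('math', 'sqrt')
print(ns['f']([1.0, 4.0, 9.0]))     # 6.0
print(ns['array_fn'] is math.sqrt)  # True
```

Because each call gets its own dictionary, the `f` generated for one function never overwrites the `f` generated for another.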

Once that’s done, I can time one of the three functions using the following code:

    def time_individual_function(type_name, function_ptr, num_repeats, num_loops, verbose, test_data):
        durations = np.zeros((num_repeats,))
        for repetition in range(num_repeats):
            tstart = clock()
            for loop in range(num_loops):
                function_ptr(test_data)
            tend = clock()
            durations[repetition] = (tend - tstart) / num_loops
        return durations

    running_times[('numba', fnname, module)] = time_individual_function(
        'Numba JITted', namespace['jit_f'], num_repeats, num_loops, verbose, test_data)
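If `clock` here is `time.clock` (an assumption, since the import isn’t shown), note that it was removed in Python 3.8, so a current interpreter needs a different timer. A minimal sketch of the same pattern, assuming `time.perf_counter` as the replacement and a plain list for the results; `time_function` and `sum_of_exp` are illustrative names:

```python
import math
from time import perf_counter

def time_function(func, test_data, num_repeats=3, num_loops=10):
    """Return the mean time per call, in seconds, for each repetition."""
    durations = []
    for _ in range(num_repeats):
        tstart = perf_counter()
        for _ in range(num_loops):
            func(test_data)
        tend = perf_counter()
        durations.append((tend - tstart) / num_loops)
    return durations

def sum_of_exp(values):
    # Stand-in for one of the generated functions.
    return sum(math.exp(v) for v in values)

timings = time_function(sum_of_exp, [0.001 * i for i in range(1000)])
print(min(timings))
```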

The full code is available online here, with some additional debugging messages and a command-line interface. Let’s get to some results:

| Function | NumPy (Python) | Math (Python) | NumPy (Numba) | Math (Numba) | NumPy array operation |
| --- | --- | --- | --- | --- | --- |
| ceil | 1.23475341088 | 0.217901148218 | 0.0083829793621 | 0.000920285178236 | 0.00982844277674 |
| fabs | 1.31707992495 | 0.224147452158 | 0.00166955347092 | 0.00087252532833 | 0.00334940337711 |
| floor | 1.22943912946 | 0.218479819887 | 0.00824501313321 | 0.000875317073172 | 0.00973097185741 |
| isinf | 3.13314332458 | 0.193701463415 | 0.00193104690431 | 0.00157070168856 | 0.00253241275797 |
| isnan | 3.16061943715 | 0.193019467167 | 0.000899212007505 | 0.000886003752345 | 0.000497831144463 |
| trunc | 1.23497512946 | 0.401735564728 | 0.0084893358349 | 0.00117097185741 | 0.00998174859287 |
| exp | 1.3437880075 | 0.22469382364 | 0.0156664315197 | 0.0156368930582 | 0.0173403377111 |
| expm1 | 1.41952489306 | 0.229173613508 | 0.0332130281426 | 0.0177992945591 | 0.0345634521576 |
| log | 1.35790436023 | 0.259746851782 | 0.00917202251408 | 0.00911096435271 | 0.0103853208255 |
| log1p | 1.41073257786 | 0.221454078799 | 0.015283902439 | 0.0115659587242 | 0.0163751144465 |
| log10 | 1.38020214634 | 0.222261703565 | 0.0093406979362 | 0.00929446904315 | 0.0105526754221 |
| sqrt | 1.29980181614 | 0.215333043152 | 0.0056800900563 | 0.00554746716698 | 0.00419920450281 |
| acos | 1.38762875797 | 0.230810866792 | 0.0171560825516 | 0.0171594446529 | 0.0178593921201 |
| asin | 1.38347082927 | 0.235335714822 | 0.0182445928705 | 0.0185464615385 | 0.0191342589118 |
| atan | 1.38371124953 | 0.22688558349 | 0.01454084803 | 0.0145301013133 | 0.0154805853659 |
| cos | 1.35799483677 | 0.232051091932 | 0.0200636998124 | 0.0204364727955 | 0.0214075797373 |
| sin | 1.40375426642 | 0.235093163227 | 0.0213015534709 | 0.0211516998124 | 0.0225604502814 |
| tan | 1.41166892308 | 0.240718168856 | 0.0274034971857 | 0.0276429268293 | 0.0287924953096 |
| degrees | 1.31739245028 | 0.182552345216 | 0.00166868292683 | 0.00608276172609 | 0.00329684052533 |
| radians | 1.31977080675 | 0.18133163227 | 0.00166409005628 | 0.00607432645402 | 0.00330575609755 |
| acosh | 1.38693283302 | 0.235513666041 | 0.0254836472795 | 0.0299879324578 | 0.0268675722326 |
| asinh | 1.43626140338 | 0.249595106942 | 0.0391198198874 | 0.0405993245779 | 0.040370521576 |
| atanh | 1.3980028818 | 0.248623459662 | 0.0226747917448 | 0.0363789868668 | 0.0238825966229 |
| cosh | 1.35819866417 | 0.230515302064 | 0.0213204652908 | 0.0212887954972 | 0.0228190318949 |
| sinh | 1.36441389869 | 0.23058641651 | 0.0210535384615 | 0.0210634446529 | 0.0223715722326 |
| tanh | 1.26429484428 | 0.220724592871 | 0.00989607504691 | 0.00988806003752 | 0.0114026866792 |

That long table isn’t much fun to look at, so here’s a bar plot showing the speedup achieved by replacing each NumPy function with its equivalent from the math package:

You can see that replacing the NumPy function with its equivalent from the math package sometimes makes a huge improvement (7x to 9x faster) and that most of the time it is as fast as NumPy. The exceptions are the degrees, radians and atanh functions, which are all faster in NumPy.
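The speedup here is presumably the ratio of the two Numba columns in the table above. As a sketch under that assumption, this computes it for a few rows, with the timings copied from the table:

```python
# (NumPy (Numba), Math (Numba)) timings copied from the table above.
timings = {
    'ceil':    (0.0083829793621, 0.000920285178236),
    'floor':   (0.00824501313321, 0.000875317073172),
    'trunc':   (0.0084893358349, 0.00117097185741),
    'exp':     (0.0156664315197, 0.0156368930582),
    'degrees': (0.00166868292683, 0.00608276172609),
}

# Values above 1 mean the math version is faster inside the jitted loop.
speedup = {name: t_numpy / t_math for name, (t_numpy, t_math) in timings.items()}

for name, ratio in speedup.items():
    print(f'{name:8s} {ratio:5.2f}x')
```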

Taking a look at the assembler code generated for each of these tests shows that the math.ceil, math.floor and math.fabs functions get inlined, while functions such as acos, exp and log remain as function calls. In the case of math.* functions, the LLVM IR generated by Numba uses LLVM intrinsics, which the compiler target can then turn into function calls or replace with inline code. The NumPy functions never get inlined; they remain as calls to numba.npymath.* functions. Thank you to Stanley Seibert for explaining this to me on the Numba mailing list over here.

The degrees and radians functions are an anomaly. The math.degrees and math.radians versions get inlined as explained above. However, the inlined versions are much slower than just calling numpy.degrees or numpy.radians. Since the conversion is just a few multiplications, this definitely wasn’t expected.
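For reference, the conversion amounts to multiplying by a constant, as this standard-library sketch shows. `to_degrees` and `DEG_PER_RAD` are hypothetical names, not something from the test script.

```python
import math

DEG_PER_RAD = 180.0 / math.pi

def to_degrees(x):
    """Convert radians to degrees with a single multiplication."""
    return x * DEG_PER_RAD

# Agrees with math.degrees to within floating-point rounding.
for angle in (0.0, 0.5, math.pi / 2, math.pi):
    assert math.isclose(to_degrees(angle), math.degrees(angle))
print(to_degrees(math.pi))
```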

All of this has been reported on the Numba GitHub repository in the following two issues:

- Speed difference between using math.* and numpy.* functions inside a jitted function
- Using math.degrees much slower than numpy.degrees inside jitted function

The Numba team has been really quick – there’s already progress on fixing the speed difference. When Numba 0.23 comes out, I’ll update this post.

My final take-home point with this is: it’s worth testing whether frequently used functions (NumPy, math, any other package) can be replaced with code that can be compiled and hence inlined courtesy of Numba. Perhaps you won’t see any benefit, but you could see quite a speedup. Importantly though, check that the substitution doesn’t affect the correctness of your algorithm. As I found when updating this post for Numba 0.23, there can be differences between the interface definitions of the math.* and numpy.* functions (trunc for example) which could actually give you different results, so be careful.

## Update 2016-01-16: Numba 0.23

Numba 0.23 has just come out and as promised I’ve re-run this test. The updated results are available here. Here’s the revised plot of speed differences between using math.* and numpy.* functions inside a Numba jitted function.

Immediately you can see that a lot of the big speed differences have gone. Some remain with the transcendental functions, but the Numba developers will look into those cases. One speed difference that isn’t going away is that numpy.trunc is now faster than math.trunc inside jitted functions. Both functions get inlined now, resulting in the roundsd assembler instruction being used. The reason for the difference is that numpy.trunc returns a float while math.trunc returns an integer type. Numba respects this behaviour, so it introduces additional assembler code for a type cast if you use math.trunc. This is also why the Numba developers have to be a little careful: sometimes the NumPy definition of a function and the math package definition differ, and Numba has to respect those differences.
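The difference in return types is easy to check in plain Python, outside of Numba. A small sketch, which also shows one case (NaN input) where the two functions behave differently rather than just returning different types:

```python
import math
import numpy as np

value = 2.7
print(type(math.trunc(value)))  # <class 'int'>
print(type(np.trunc(value)))    # <class 'numpy.float64'>

# math.trunc has to produce an integer, so it cannot represent NaN.
try:
    math.trunc(float('nan'))
except ValueError as err:
    print('math.trunc:', err)

# numpy.trunc stays in floating point and passes NaN through.
print('numpy.trunc:', np.trunc(float('nan')))
```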