CLTune is a C++ library for automatically tuning OpenCL kernels to extract the maximum speed from your device. I’m going to try building and using it on Windows 7 with MinGW-w64 (GCC 4.9.1) to see what I can achieve with it. While properly written OpenCL code should work on any conformant device and platform, there’s no guarantee it will be fast. What’s fast on an Nvidia GTX 560 Ti isn’t going to get maximum speed out of an Intel CPU. A kernel that squeezes the maximum throughput out of an Intel CPU when using the Intel OpenCL runtime probably won’t do so well on the AMD CPU runtime. This problem even exists between different versions of Nvidia GPUs – each new compute capability requires different tuning. Continue reading

# Monthly Archives: January 2016

# What’s faster in Numba @jit functions, NumPy or the math package?

**Update 2016-01-16: Numba 0.23 released and tested – results added at the end of this post**

A while back I was using Numba to accelerate some image processing I was doing and noticed that there was a difference in speed whether I used functions from NumPy or their equivalent from the standard Python math package within the function I was accelerating using Numba. If memory serves, I was using the exp function for something and noticed that replacing numpy.exp with math.exp in the function I had decorated with @jit made a noticeable difference in running time. I didn’t investigate this any further at the time, but now, several versions of Numba and NumPy later, I wanted to find out what was causing this difference and what the current status was in terms of which is faster to use. Continue reading