Autotuning OpenCL kernels – CLTune on Windows 7

CLTune is a C++ library for automatically tuning OpenCL kernels to extract the maximum speed from your device. I’m going to try building and using it on Windows 7 with MinGW-w64 (GCC 4.9.1) to see what I can achieve with it. While properly written OpenCL code should work on any conformant device and platform, there’s no guarantee it will be fast. What’s fast on an Nvidia GTX 560 Ti isn’t going to get maximum speed out of an Intel CPU. A kernel that squeezes the maximum throughput out of an Intel CPU when using the Intel OpenCL runtime probably won’t do so well on the AMD CPU runtime. This problem even exists between different versions of Nvidia GPUs – each new compute capability requires different tuning. Continue reading
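The basic idea behind autotuning can be sketched without CLTune itself. The snippet below is a minimal illustration in plain Python, not CLTune's API: the `autotune` function, the `blocked_sum` workload and the `block_size` parameter are all hypothetical names chosen for this example. It times every configuration in a small search space and keeps the fastest one, which is roughly what a tuner does for kernel parameters such as work-group sizes.

```python
import itertools
import time


def autotune(run, search_space, repeats=3):
    """Time run(**config) for every configuration; return (best_config, best_time)."""
    names = sorted(search_space)
    best_config, best_time = None, float("inf")
    # Exhaustive search: try every combination of parameter values.
    for values in itertools.product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(**config)
            # Keep the fastest of several runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time


def blocked_sum(block_size):
    """Hypothetical stand-in for a kernel with one tunable parameter."""
    data = list(range(20000))
    total = 0
    for start in range(0, len(data), block_size):
        total += sum(data[start:start + block_size])
    return total


if __name__ == "__main__":
    config, seconds = autotune(blocked_sum, {"block_size": [16, 256, 4096]})
    print(config, seconds)
```

The winning configuration depends on the machine the search runs on, which is exactly why the tuning has to be repeated for each device.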

What’s faster in Numba @jit functions, NumPy or the math package?

Update 2016-01-16: Numba 0.23 released and tested – results added at the end of this post

A while back I was using Numba to accelerate some image processing and noticed a difference in speed depending on whether the function I was accelerating used functions from NumPy or their equivalents from the standard Python math package. If memory serves, I was using the exp function for something and noticed that replacing numpy.exp with math.exp in the function I had decorated with @jit made a noticeable difference in running time. I didn’t investigate this any further at the time, but now, several versions of Numba and NumPy later, I wanted to find out what was causing this difference and which of the two is currently faster to use. Continue reading

Analysing data from Stats SA

Statistics South Africa (Stats SA) is South Africa’s government-run statistics agency. They publish a lot of statistics about SA, which you can find here: http://www.statssa.gov.za/. I’ve decided to start analysing the data they make available for the public to download. My first step is writing code to load the data, as I’ve chosen to work with the text files they provide. I’ve posted my IPython notebook showing how to load these files here: http://nbviewer.ipython.org/gist/williamjshipman/bb23babe6ffd04a8cb8a

The repository containing this notebook and the data I’ve used is here: https://bitbucket.org/williamjshipman/statssa-blog. I’ll be uploading additional notebooks (and the data they use) to that repo. I hope you find them interesting.

Customising exponents in siunitx

Typesetting SI units in LaTeX can be done using the siunitx package. One disadvantage is that siunitx doesn’t like math-mode inside the numbers it has to print. This makes it a little tricky to add unusual exponents or do other formatting that is easy in math-mode. To compensate, siunitx includes a parser that works out, for example, that \SI{5 x 2}{\metre} should appear as follows: Continue reading

Testing and profiling Python code simultaneously using pytest and cProfile

I’ve been using pytest for a few months now to help me test individual functions and algorithms as I develop them. So far I’ve been impressed by how easy it is to set up my tests. Reading some of my previous posts would tell you that I like optimising code for speed (perhaps a little too much for my own good). One particular algorithm was taking several minutes for each test case, so of course I had to fix this. Continue reading
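One way to combine the two tools is to run the function under test inside cProfile from within the test itself. This is a minimal sketch using only the standard library and is not necessarily the approach taken in the full post: `profile_call` and `slow_sum_of_squares` are hypothetical names, with the latter standing in for the slow algorithm.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical stand-in for the algorithm under test."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def test_slow_sum_of_squares():
    # pytest collects this function; the assertion checks correctness
    # while the profile report shows where the time went.
    result, report = profile_call(slow_sum_of_squares, 1000)
    assert result == 332833500
    print(report)
```

pytest captures printed output by default, so run it with `pytest -s` to see the profile report for passing tests.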

Correctly enabling cl_khr_fp64 in both OpenCL 1.1 and 1.2

I started most of my OpenCL development on Nvidia GPUs, which still only support OpenCL 1.1.  When I started testing code that used double precision arithmetic on AMD Radeon GPUs, I kept running into a warning about the cl_khr_fp64 extension.  The reason for this is, of course, that in OpenCL 1.2 cl_khr_fp64 moved from an optional extension to an optional core feature.  Therefore, the #pragma OPENCL EXTENSION cl_khr_fp64 : enable was no longer needed. Continue reading
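A common way to handle both versions is to guard the pragma with the `__OPENCL_VERSION__` preprocessor macro, which OpenCL C compilers define (110 for OpenCL 1.1, 120 for OpenCL 1.2). The sketch below is an assumption about one reasonable fix, not necessarily the one described in the full post: it prepends the guarded pragma to the kernel source on the host side, and `with_fp64_preamble` is a hypothetical helper name.

```python
# Preamble in OpenCL C: only request the extension on compilers older than
# OpenCL 1.2, where cl_khr_fp64 still has to be enabled explicitly.
FP64_PREAMBLE = """\
#if __OPENCL_VERSION__ < 120
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
"""


def with_fp64_preamble(kernel_source):
    """Return kernel_source with the guarded cl_khr_fp64 pragma prepended."""
    return FP64_PREAMBLE + kernel_source


if __name__ == "__main__":
    kernel = "__kernel void scale(__global double *x) { x[get_global_id(0)] *= 2.0; }"
    print(with_fp64_preamble(kernel))
```

The resulting string can be handed to whatever host API builds the program. The device still has to support double precision in either case; the guard only avoids issuing the pragma where it is no longer needed.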

Numba nopython mode in versions 0.11 and 0.13 of Numba

My previous posts regarding the Numba package for Python used version 0.11. Recently, Numba has gone through some major changes in versions 0.12.1, 0.12.2 and 0.13. My last post explained how I had used the nopython keyword argument to speed up my code. Importantly, it showed that removing array allocation steps (i.e. np.zeros(…)) allowed Numba 0.11 to automatically generate code that did not use the Python C API. I decided to see what the current state of affairs is with version 0.13. The release notes for Numba are available here. Continue reading