Tag Archives: OpenCL

Autotuning OpenCL kernels – CLTune on Windows 7

CLTune is a C++ library for automatically tuning OpenCL kernels to extract the maximum speed from your device. I’m going to try building and using it on Windows 7 with MinGW-w64 (GCC 4.9.1) to see what I can achieve with it. While properly written OpenCL code should work on any conformant device and platform, there’s no guarantee it will be fast. A kernel that is fast on an Nvidia GTX 560 Ti isn’t going to get maximum speed out of an Intel CPU, and a kernel that squeezes the maximum throughput out of an Intel CPU with the Intel OpenCL runtime probably won’t do as well on the AMD CPU runtime. The problem even exists between different generations of Nvidia GPUs: each new compute capability requires different tuning.

Correctly enabling cl_khr_fp64 in both OpenCL 1.1 and 1.2

I started most of my OpenCL development on Nvidia GPUs, which still only support OpenCL 1.1. When I started testing code that used double-precision arithmetic on AMD Radeon GPUs, I kept running into a warning about the cl_khr_fp64 extension. The reason is that in OpenCL 1.2, cl_khr_fp64 moved from an optional extension to an optional core feature, so the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive is no longer needed there.
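One way to keep a single kernel source working on both versions is to guard the pragma with the `__OPENCL_VERSION__` preprocessor macro. The sketch below is only an illustration, not necessarily the approach taken in the full post: it assumes the kernel source is held in a Python string (as it would be for PyOpenCL), and the kernel `scale` and the helper `build_fp64_source` are hypothetical names.

```python
# Guard that enables cl_khr_fp64 only when the device reports an OpenCL
# version older than 1.2 (__OPENCL_VERSION__ is 110 for 1.1, 120 for 1.2).
FP64_GUARD = """\
#if __OPENCL_VERSION__ < 120
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
"""

# Hypothetical double-precision kernel, used only for illustration.
KERNEL_BODY = """\
__kernel void scale(__global double *data, const double factor)
{
    size_t i = get_global_id(0);
    data[i] *= factor;
}
"""


def build_fp64_source(body):
    """Prepend the version-guarded pragma to an OpenCL C kernel body."""
    return FP64_GUARD + "\n" + body


KERNEL_SOURCE = build_fp64_source(KERNEL_BODY)
```

The resulting string could then be handed to the OpenCL compiler as usual. The device still has to support double precision for the kernel to build.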

Making PyOpenCL handle NumPy arrays as images

PyOpenCL Image objects take a shape tuple of (width, height, depth), but NumPy arrays specify shape in the order (rows, columns, …), that is, (height, width, …), where the ellipsis stands for higher dimensions. The important point is that the width and height dimensions are swapped. The PyOpenCL documentation suggests creating the NumPy arrays in Fortran (column-major) order instead of the default row-major order. While that is fine for single-channel images, RGB and RGBA images still get messed up. The solution turned out to be quite easy: swap the first and second dimensions. I assume an OpenCL context (ctx) has already been created and that you have an image shape tuple specified as image_shape=(rows, columns, …). With those in place, a single line creates the OpenCL 2D image and stores it in input_image_cl.
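The dimension swap itself can be sketched in plain Python. The helper name `cl_image_shape` is hypothetical, and the sketch assumes a 2D image in which any axes after the first two hold channel data that the image format describes rather than the shape. It is not the line from the full post.

```python
def cl_image_shape(numpy_shape):
    """Convert a NumPy-style (rows, columns, ...) shape to (width, height).

    Assumes a 2D image: axes beyond the first two are treated as channel
    data and are not part of the returned shape.
    """
    if len(numpy_shape) < 2:
        raise ValueError("expected at least (rows, columns)")
    rows, columns = numpy_shape[0], numpy_shape[1]
    # NumPy puts height first; the OpenCL image shape puts width first.
    return (columns, rows)


# A 480-row by 640-column RGBA array maps to a 640-wide, 480-high image.
image_shape = (480, 640, 4)
print(cl_image_shape(image_shape))  # (640, 480)
```

The returned tuple is what would be passed as the `shape` argument when constructing a `pyopencl.Image`.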