Category Archives: OpenCL

Autotuning OpenCL kernels – CLTune on Windows 7

CLTune is a C++ library for automatically tuning OpenCL kernels to extract the maximum speed from your device. I'm going to try building and using it on Windows 7 with MinGW-w64 (GCC 4.9.1) to see what I can achieve with it. While properly written OpenCL code should work on any conformant device and platform, there's no guarantee it will be fast. A kernel that is fast on an Nvidia GTX 560 Ti is unlikely to get maximum speed out of an Intel CPU, and a kernel that squeezes the maximum throughput out of an Intel CPU with the Intel OpenCL runtime probably won't do as well on the AMD CPU runtime. The problem even exists between different generations of Nvidia GPUs: each new compute capability requires different tuning.


Correctly enabling cl_khr_fp64 in both OpenCL 1.1 and 1.2

I started most of my OpenCL development on Nvidia GPUs, which still only support OpenCL 1.1. When I started testing code that used double precision arithmetic on AMD Radeon GPUs, I kept running into a warning about the cl_khr_fp64 extension. The reason is that in OpenCL 1.2, cl_khr_fp64 moved from an optional extension to an optional core feature, so the #pragma OPENCL EXTENSION cl_khr_fp64 : enable directive is no longer needed there.