I started most of my OpenCL development on Nvidia GPUs, which still only support OpenCL 1.1. When I started testing code that used double precision arithmetic on AMD Radeon GPUs, I kept running into a warning about the cl_khr_fp64 extension. The reason for this is, of course, that in OpenCL 1.2 cl_khr_fp64 moved from an optional extension to an optional core feature. Therefore, the #pragma OPENCL EXTENSION cl_khr_fp64 : enable was no longer needed.
As my code evolved, I inadvertently handled this in two different ways. In some kernels, I used

    #if __OPENCL_VERSION__ <= CL_VERSION_1_1
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    #endif
While in others I used

    #if __OPENCL_C_VERSION__ <= CL_VERSION_1_1
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    #endif
TL;DR: the first method is the proper one. Both of these worked for me because, on AMD GPUs, I did not use the build option -cl-std (which selects the OpenCL C language version), so __OPENCL_C_VERSION__ defaulted to the OpenCL C version the device supports, and on Nvidia GPUs the OpenCL C language version would always be 1.1 or lower. This left me with the question: which of the above methods is right?
The OpenCL 1.2 specification hints at the second method, since the change of cl_khr_fp64 to a core feature is discussed under the OpenCL C changes, not under the platform layer and runtime changes. I decided to test this with some dummy code to see how the AMD and Intel OpenCL C compilers handle cl_khr_fp64. Here are the results:
- Intel’s OpenCL C compiler for their CPU runtime just ignores the #pragma, producing no warning or error messages at all, even if the #pragma is missing and -cl-std=CL1.1 is used.
- AMD’s compilers for their CPU and GPU runtimes generate a warning if the #pragma is included. This behaviour occurs with both -cl-std=CL1.1 and -cl-std=CL1.2.
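The comparison above can be reproduced with a small script. What follows is a minimal sketch rather than the notebook itself: it assumes PyOpenCL is installed, and the names PRAGMA, KERNEL, BUILD_OPTIONS, variants and build_logs are hypothetical, made up for illustration. It builds a dummy double precision kernel with and without the #pragma under each -cl-std setting and collects whatever the compiler prints.

```python
import itertools

PRAGMA = "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"

# Dummy kernel whose only purpose is to use the double type.
KERNEL = """
__kernel void scale(__global double *x, const double a)
{
    const size_t i = get_global_id(0);
    x[i] = a * x[i];
}
"""

# No -cl-std at all, then the two language versions compared in the post.
BUILD_OPTIONS = ([], ["-cl-std=CL1.1"], ["-cl-std=CL1.2"])


def variants():
    """Yield (with_pragma, options, source) for every combination to build."""
    for with_pragma, options in itertools.product((True, False), BUILD_OPTIONS):
        source = (PRAGMA if with_pragma else "") + KERNEL
        yield with_pragma, options, source


def build_logs(device):
    """Build every variant for one device and return the compiler output."""
    import pyopencl as cl  # imported here so variants() works without PyOpenCL

    ctx = cl.Context([device])
    results = []
    for with_pragma, options, source in variants():
        program = cl.Program(ctx, source)
        try:
            program.build(options=options)
            log = program.get_build_info(device, cl.program_build_info.LOG)
        except cl.Error as exc:
            # A failed build is reported as an exception; keep its message.
            log = str(exc)
        results.append((with_pragma, options, log.strip()))
    return results
```

Calling build_logs(device) for each device of each platform returned by pyopencl.get_platforms() gives one log per combination; an empty log means the compiler accepted the source silently. PyOpenCL caches built binaries by default, so a repeated run may come back with empty logs; setting the environment variable PYOPENCL_NO_CACHE=1 should avoid that.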
Since neither the Intel nor the AMD compilers change their behaviour in the presence of -cl-std=CL1.1, I deduce that the OpenCL C version does not affect whether you should enable cl_khr_fp64 when using an OpenCL 1.2 device. In conclusion, the first method above is the right one: you should check __OPENCL_VERSION__ to determine whether cl_khr_fp64 needs to be explicitly enabled. Looking at some other OpenCL code, such as this kernel from MilkyWay@Home, backs up my use of __OPENCL_VERSION__. They also show a more complete way of enabling double precision, as they try to enable the cl_amd_fp64 extension if cl_khr_fp64 is not available. I don’t, because all the hardware I target supports cl_khr_fp64.
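For reference, the fallback idea can be combined with the __OPENCL_VERSION__ check in a preamble that the host prepends to every kernel source. This is a sketch of the general pattern, not MilkyWay@Home's actual code; FP64_PREAMBLE and with_fp64 are hypothetical names, and it relies on the compiler defining a macro with the extension's name for each extension the device supports.

```python
# OpenCL C preamble: only devices older than OpenCL 1.2 need the #pragma.
FP64_PREAMBLE = """\
#if __OPENCL_VERSION__ <= CL_VERSION_1_1
  #if defined(cl_khr_fp64)
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
  #elif defined(cl_amd_fp64)
    #pragma OPENCL EXTENSION cl_amd_fp64 : enable
  #else
    #error "Double precision is not supported on this device"
  #endif
#endif
"""


def with_fp64(kernel_source: str) -> str:
    """Prepend the fp64 preamble to an OpenCL C kernel source string."""
    return FP64_PREAMBLE + kernel_source
```

The resulting string can be handed to the OpenCL compiler in place of the bare kernel source, which keeps the version check in one place instead of repeating it in every kernel file.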
I’ve corrected some typos, and uploaded a copy of the IPython notebook I used to test the compilers. The notebook can be found here.