Today’s post is going to look at fast ways to filter an image in Python, with an eye towards speed and memory efficiency. My previous post put Numba to use, accelerating my code for generating the Burning Ship fractal by about 10x. This got me thinking about other places where I could use Numba. That, combined with reading some SciPy and scikit-learn documentation got me onto the topic of filtering an image. I’m going to focus on 2D gray-scale images for this post. Oddly enough, there isn’t a bilateral filter implemented in scipy.ndimage so I’m going to tackle that one. My test image is the famous Lenna

All my tests are conducted after converting Lenna to gray-scale.

There are three ways I could implement this filter in pure Python. Obviously, I could write out the double for loops myself for iterating over each pixel, that’s going be indicated by the label ‘naive_2D_filter’ in my code and results. There’s also scipy.ndimage.filters.generic_filter, which I’ll label ‘scipy_gen_filter’. Finally, there is sklearn.feature_extraction.image.extract_patches_2d from scikit-learn, which I’ll label ‘sklearn_2d_filter’. This method converts the image into a larger array, creating a copy of the window about each pixel in the image, except for border pixels. Since this method produces such a large intermediate data structure, I’m only including it to see if the other approaches are much slower or faster. For those familiar with Matlab, this is very similar to im2col (except that this is all free). That’s three ways to filter the image, another three ways come from incorporating Numba into each of these three methods. The next method puts OpenCV to the test. I haven’t mixed OpenCV and Numba as there’s no point. Finally, I tried scikit-image as it has a bilateral filter implementation too. After blurring using a 5×5 window and spatial and range sigmas of 20.0, Lenna looks like this:

Please note that the code snippets I’m showing here are extracts from the complete code on Bitbucket. I have encapsulated all these functions in two classes, hence the “self” parameters passed to each function. The main reason for doing this is that the spatial blur weights are calculated once in the __init__ method of the class and re-used in each subsequent method for calculating the blur.

# @autojit vs. @jit in Numba

My previous post used @autojit quite extensively, with a caveat I hadn’t realised. A function using the @autojit decorator is only compiled on the first time it is called. Therefore, my benchmarks should have included a warm-up call to make sure the functions had been compiled before measuring running time. From my experience with this code, the overhead seems to be less than 0.5 seconds.

Instead of using this warm-up approach for today’s code, I decided to experiment with @jit and some of the other decorators. This approach dictates what types to use for each function argument and the return type, so the function can be compiled when the code loads, rather than doing it on the first call. This taught me two interesting things.

- You have to be very careful if trying to mix float32 and float64 code. float64 is the default in Python, float32 is only available via NumPy. Numba seems smart enough to work around this, but the type-casts slow down the code more than any speed-up gained from using float32. Also, floating point constants are always 64-bit in Python, so more type-casts can appear where constants are used.
- If using @jit on a method in a class, the first parameter must be numba.object_, otherwise the error “numba.error.NumbaError: Incorrect number of types specified in @jit() for function ‘<your function name here>'” gets produced.

# Naive 2D filtering

The basic idea is to have a double for loop to iterate over all the pixels, with an inner double for loop at each pixel to iterate over the pixel neighbourhood. Here’s the code:

def naive_2D_filter_v1(self, img): output = np.zeros(img.shape, dtype=img.dtype) for row in xrange(img.shape[0]): for col in xrange(img.shape[1]): centre = img[row, col] idx = 0 total_weight = 0.0 sum_neighbours = 0.0 for r in xrange(row-2, row+3): for c in xrange(col-2, col+3): if ((r >= 0) & (r < img.shape[0]) & (c >= 0) & (c < img.shape[1])): pixel = img[r,c] else: pixel = 0.0 dr = centre - pixel filter_weight = range_coefficient * np.exp(-1.0 * dr * dr / range_divisor) * self.spatial_gaussian_weights[idx] sum_neighbours += (filter_weight * pixel) total_weight += filter_weight idx += 1 output[row, col] = sum_neighbours / total_weight return output

This is the slowest version, taking 35.997 seconds using a 5×5 window. Next, the Numba version. Unfortunately, just adding @autojit or @jit to the above code produces the “llvm.LLVMException: Instruction does not dominate all uses!” when the function gets built. The problem is related to total_weight and sum_neighbours as initialising them with numpy.float64(0.0) instead of just 0.0 seems to fix the problem. I still need to look into this problem.

@numba.jit(numba.float64[:,:](numba.object_, numba.float64[:,:])) def naive_2D_filter_v1(self, img): output = np.zeros(img.shape, dtype=img.dtype) scalar_zero = np.float64(0.0) for row in xrange(img.shape[0]): for col in xrange(img.shape[1]): centre = img[row, col] idx = 0 total_weight = scalar_zero sum_neighbours = scalar_zero for r in xrange(row-2, row+3): for c in xrange(col-2, col+3): if ((r >= 0) & (r < img.shape[0]) & (c >= 0) & (c < img.shape[1])): pixel = img[r,c] else: pixel = 0.0 dr = centre - pixel filter_weight = range_coefficient * np.exp(-1.0 * dr * dr / range_divisor) * self.spatial_gaussian_weights[idx] sum_neighbours += (filter_weight * pixel) total_weight += filter_weight idx += 1 output[row, col] = sum_neighbours / total_weight return output

This takes 14.281 seconds, a little over two times faster. I didn’t stop here though, I decided to split this function into two by placing the per-pixel operation into a separate method as follows:

@numba.double(numba.double[:,:], numba.int_, numba.int_) def naive_2D_filter_v2_kernel(self, img, row, col): centre = img[row, col] idx = 0 total_weight = 0.0 sum_neighbours = 0.0 for r in xrange(row-2, row+3): for c in xrange(col-2, col+3): if ((r >= 0) & (r < img.shape[0]) & (c >= 0) & (c < img.shape[1])): pixel = img[r,c] else: pixel = 0.0 dr = centre - pixel filter_weight = range_coefficient * np.exp(-1.0 * dr * dr / range_divisor) * self.spatial_gaussian_weights[idx] sum_neighbours += (filter_weight * pixel) total_weight += filter_weight idx += 1 return sum_neighbours / total_weight @numba.jit(numba.float64[:,:](numba.object_, numba.float64[:,:])) def naive_2D_filter_v2(self, img): output = np.zeros(img.shape, dtype=np.float64) for row in xrange(img.shape[0]): for col in xrange(img.shape[1]): output[row, col] = self.naive_2D_filter_v2_kernel(img, row, col) return output

Firstly, note that this version no longer needs to use numpy.float64(0.0) to initialise total_weight and sum_neighbours. Now Numba is happy to work with 0.0 again. These two little changes bring the running time to 9.951 seconds.

# Using SciPy ndimage

This method was actually the second easiest to write. First I need to import the filters.

import scipy.ndimage.filters as ndfilt

The function I’m interested in is ndfilt.generic_filter, which takes a function that takes a 1D array as input and returns a scalar. The 1D array is the flattened version of the N-dimensional window (2D in this example) around each pixel. First lets define this kernel:

def bilateral_kernel(self, input): centre = input[12] range = input - centre range_contrib = range_coefficient * np.exp(-1.0 * range * range / range_divisor) total_weights = range_contrib * self.spatial_gaussian_weights total_weights /= total_weights.sum() return np.sum(input * total_weights)

Using the kernel is extremely simple:

def scipy_gen_filter(self, img): output = np.zeros(img.shape, dtype=img.dtype) ndfilt.generic_filter(img, self.bilateral_kernel, [5, 5], output=output, mode='constant', cval=0.0) return output

Without using Numba, this method is able to apply the 5×5 bilateral filter in just 7.584 seconds, making it substantially faster than version 1 or 2 of the naive implementation using Numba.

Adding @numba.double(numba.double[:]) before of bilateral_kernel and @numba.jit(numba.float64[:,:](numba.object_, numba.float64[:,:])) before scipy_gen_filter decreases the running time to 4.690 seconds. Note that numba.float64 and numba.double are the same thing.

# Revisiting the naive methods

After writing the kernel used in the above section, I decided to see if I could rewrite my naive versions to use it and to see what speed I’d get. First up, an overly complex version.

def naive_2D_filter_v3(self, img): output = np.zeros(img.shape, dtype=img.dtype) for row in xrange(img.shape[0]): for col in xrange(img.shape[1]): idx = np.array([[row+r for r in xrange(-2,3) for t in xrange(-2,3)], [col+c for t in xrange(-2,3) for c in xrange(-2,3)]]) idx_valid = (idx[0,:] >= 0) & (idx[0,:] < img.shape[0]) & (idx[1,:] >= 0) & (idx[1,:] < img.shape[1]) idx = idx[:,idx_valid] window = np.zeros((25,)) window[idx_valid] = img[idx[0,:], idx[1,:]] output[row, col] = self.bilateral_kernel(window) return output

Line 5 enumerates all the 2D coordinates within the window about the current pixel, while line 6 creates a mask for those coordinates that actually lie inside the image. Pixels outside the image are assumed to be 0.0. Once the 5×5 window has been copied into a temporary buffer, the bilateral filter kernel gets called. This mess of code takes 21.341 seconds.

Unfortunately, Numba did not work with this function as it didn’t like my shenanigans on lines 5 and 6. After trying many different approaches that still kept the basic scheme alive of having coordinate indices and boolean indices, I was forced to abandon this version as Numba just would not accept it. Version 4 below goes back to having four nested loops, but this time the inner double for loops are responsible for copying the appropriate window from the image into a temporary array. Thereafter, the filter kernel from the previous section is called. Here is the code.

def naive_2D_filter_v4(self, img): output = np.zeros(img.shape, dtype=img.dtype) for row in xrange(img.shape[0]): for col in xrange(img.shape[1]): idx = 0 window = np.zeros((25,)) for r in xrange(row-2, row+3): for c in xrange(col-2, col+3): if (r >= 0) & (r < img.shape[0]) & (c >= 0) & (c < img.shape[1]): window[idx] = img[r,c] else: window[idx] = 0.0 idx += 1 output[row, col] = self.bilateral_kernel(window) return output

Quite surprisingly, this method only takes 13.568 seconds, making it faster than using Numba to accelerate naive_2D_filter_v1. My guess is that using vectorisation (i.e. NumPy array operations) in bilateral_kernel gives this speed-up. Now lets see what Numba can achieve with version 4. Here’s the code:

@numba.jit(numba.float64[:,:](numba.object_, numba.float64[:,:])) def naive_2D_filter_v4(self, img): output = np.zeros(img.shape, dtype=np.float64) for row in xrange(img.shape[0]): for col in xrange(img.shape[1]): idx = 0 window = np.zeros((25,), dtype=np.float64) for r in xrange(row-2, row+3): for c in xrange(col-2, col+3): if (r >= 0) & (r < img.shape[0]) & (c >= 0) & (c < img.shape[1]): window[idx] = img[r,c] else: window[idx] = 0.0 idx += 1 output[row, col] = self.bilateral_kernel(window) return output

This time, the running time is just 5.175 seconds, making it nearly as fast as scipy_gen_filter with Numba. This is very encouraging for times when scipy.ndimage.filters.generic_filter may not be applicable, e.g. some sort of reduction-type scenario.

# Using scikit-learn

This was the third easiest method to filter the image. Unfortunately, extract_patches_2d only works for 2D images, while everything else here can work for N-D images. extract_patches_2d creates a new array with dimensions N x h x w x c, where w and h are the width and height (a.k.a. columns and rows) of the filter window. N is the number of pixels in the image, minus edge pixels for which a proper window cannot be constructed. This differs from the earlier methods in that everything else in this blog post assumes pixels outside of the image equal 0.0. If working with colour images then c is the number of colour channels. Right, onto some code.

def sklearn_2d_filter(self, img): all_windows = sklrnimg.extract_patches_2d(img, (5, 5)) output_patches = np.zeros((all_windows.shape[0], 1, 1)) sum_of_weights = np.zeros((all_windows.shape[0], 1, 1)) idx = 0 for r in xrange(5): for c in xrange(5): dr = all_windows[:,3,3] - all_windows[:,r,c] weight = (range_coefficient * np.exp(-1.0 * dr * dr / range_divisor) * self.spatial_gaussian_weights[idx]).reshape(output_patches.shape) idx+=1 sum_of_weights += weight output_patches += all_windows[:,r,c].reshape(weight.shape) * weight output_patches /= sum_of_weights output = sklrnimg.reconstruct_from_patches_2d(output_patches, (img.shape[0] - 4, img.shape[1] - 4)) return output

This function takes just 2.232 seconds to run. Accelerating it with Numba doesn’t help much but I did it anyway. I added @numba.jit(numba.float64[:,:](numba.object_, numba.float64[:,:])). This reduced the running time to 2.264 seconds, such a small change that I’m going to disregard it. What makes this way of filtering the image so fast is that I can now use NumPy efficiently.

# OpenCV in Python

This one was a bit of a late-comer to the tests, admittedly I forgot about OpenCV’s Python bindings. OpenCV has a bilateral filter function cv2.bilateralFilter. Its parameters are the input image, filter width, range or colour sigma and finally the spatial sigma. The code is quite simple:

import cv2 def opencv_test(test_image): return cv2.bilateralFilter(np.float32(test_image), 5, 20.0, 20.0)

The only complication is that this function only works on 8-bit images or else 32-bit float images, hence the np.float32 cast. Otherwise, this is by far the easiest method to use. The running time is absolutely amazing, just 0.006 seconds. One thing to note is that the first time I ran the program the running time was something like 0.17 seconds for this test. My guess is that this is the overhead incurred in loading the OpenCV DLL.

# Scikit-image

Using scikit-image is just as easy as OpenCV, just import and call. Here’s the code:

from skimage.filter import denoise_bilateral as skimg_bilateral def skimg_test(test_image): return skimg_bilateral(test_image, 5, 20.0, 20.0, mode='constant', cval=0.0)

This takes 0.519 seconds to run. Sorry scikit-image, OpenCV wins here.

# Conclusion

Lets start with a summary of the running times:

- naive_2D_filter_v1: 35.997000 seconds.
- naive_2D_filter_v1 (Numba): 14.281000 seconds.
- naive_2D_filter_v2 (Numba): 9.951000 seconds.
- naive_2D_filter_v3: 21.341000 seconds.
- naive_2D_filter_v4: 13.568000 seconds.
- naive_2D_filter_v4 (Numba): 5.175000 seconds.
- scipy_gen_filter: 7.584000 seconds.
- scipy_gen_filter (Numba): 4.690000 seconds.
- sklearn_2d_filter: 2.232000 seconds.
- sklearn_2d_filter (Numba): 2.264000 seconds.
- OpenCV: 0.006000 seconds.
- scikit-image: 0.519000 seconds.

The other important factor is memory consumption. Unfortunately, I don’t have a good way to measure that except staring at Windows Task Manager while the code runs. These results list approximately how much more RAM gets used in excess of the 512*512*8 bytes or 2MB for the output image. I commented out all other tests so there could be re-use of memory between tests. Please remember that these figures are very approximate

- naive_2D_filter_v1: 0MB.
- naive_2D_filter_v1 (Numba): 0MB.
- naive_2D_filter_v2 (Numba): 0MB.
- naive_2D_filter_v3: 0MB.
- naive_2D_filter_v4: 0MB.
- naive_2D_filter_v4 (Numba): 0MB.
- scipy_gen_filter: 0MB.
- scipy_gen_filter (Numba): 0MB.
- sklearn_2d_filter: 135MB.
- sklearn_2d_filter (Numba): 153MB.

The OpenCV and scikit-image tests are so fast that I can’t measure the memory consumption.

These memory usage figures are not that surprising. The ‘naive’ methods don’t allocate any intermediate arrays. Apparently, SciPy’s ndimage.filters.generic_filter doesn’t use intermediate arrays either, making it a really compelling choice. In contrast, extract_patches_2d from scikit-learn bloats out the original image into something 25 times larger and probably also causes NumPy to generate several intermediate arrays. I hadn’t expected it to be quite so bad using over 100MB just to process a 2MB image though.

There you have it, OpenCV beats everything by a very long way for bilateral filtering. SciPy’s ndimage.filters.generic_filter is also encouraging as it is a memory efficient way to process N-dimensional images, something OpenCV, scikit-learn and scikit-image cannot do. Now, if only I could close the gap between OpenCV and SciPy. I am disappointed in Numba since my fastest naive version only saw a 3x speed up by using Numba and my Numba accelerated kernel for SciPy produced only a small improvement.

Pingback: Corrections to “Learning Python: Eight ways to filter an image” – fixing my Numba speed problems | William J Shipman

anonyI’ve been searching all over for how to use @jit with a class… I have a class called GBC and a method within it called “train”, it takes in as input a 2D float array, a 1D int array, and an int variable, and it returns nothing.

I’m doing this:

class GBC(object):

# init method and other stuff here

# this is the method I want optimized:

@jit (void(object_, float_[:,:],int_[:],int_))

def train(self, X, y, H):

Is this correct? I get an error talking about an unexpected indent on the line saying “def train(self,X,y,H)”. There are no indentation problems, I’ve re-written and indented my entire code.

LikeLike

William ShipmanPost authorHi anony. After copying your code into an IPython console and correctly indenting it, I don’t get the error you describe (I do get others, but they relate to Numba). The only time I had trouble with indentation was when I had mixed tabs and spaces in the same .py file. If your editor has an option to display tabs and spaces then use this and make sure that you are consistent throughout your code in using either tabs or spaces, but not both, to indent your code.

Regarding your use of Numba, what version are you using? If you are using 0.13.x or 0.12 then you can’t properly use @jit on classes and methods in classes. There was a major refactoring of the Numba code between versions 0.11 and 0.12 that broke some stuff unfortunately.

LikeLike

m04Hi William

This is the only source i could find on jit for methods. However I could not find object_ in the numba package any more, I assume there’s another syntax now, can you elaborate?

jitclass doesn’t do the trick for me as it doesn’t support a lot of things I can’t do without and I only want to jit a couple of methods which it seems like I need to specify the exact types for.

Thank you!

LikeLike

William ShipmanPost authorHi m04. I did a quick, very simple, test class. Using @jit on a single method of a class, without any signature being specified, works correctly. The object_ syntax has been removed and there doesn’t seem to be a replacement. Also, it isn’t supported in nopython mode, so @njit generates an error message. If you need nopython mode, you’ll have to extract the compute intensive parts of your algorithm into functions outside of your class. First benchmark using @jit as Numba does have logic to compile parts of functions in nopython mode, even if the whole function can’t be.

LikeLike