Accelerated Image Processing: NVPP Performance Benchmarks

Friday, 26 March 2010

NVPP Performance Benchmarks

In my last post, I cast some doubt on the performance and utility of GPU's for small image processing functions. Today I had a look at how NVidias own Image processing library - NVPP - stacked up against the latest Intel Performance Primitives (IPPI v6.0) for some basic arithmetic on one of my Dev machines. This development PC has a mid-performance Quad-core Intel Q8400@2.66GHz and a mid-performance NVidia GTX260 with 216cores@1.1GHz.

The results are interesting and pretty much what I expected. As an example, here are the results for a simple image addition of two images to produce one output image (average 1000 iterations):

512x512 Pixels:
GPU-Transfer and Processing = 0.72 milliseconds
CPU = 0.16 milliseconds

2048x2048 Pixels:
GPU-Transfer and Processing = 6.78 milliseconds
CPU = 2.81 milliseconds

The CPU wins easily - so whats happening here? The transfer overheads to-and-from the GPU over a PCIex16 bus are by far the dominant factor, taking approx 2ms per image transfer for the 2048x2048 images (two input images, one image output = approx 6ms). Whilst transfer times can be significantly improved (perhaps halved) if the input and output images were put into page-locked memory, the conclusion would not change; performing individual simple image operations on the GPU does not significantly accelerate image processing.

So what happens if we emulate a compute-intensive algorithms on the GPU? When we perform only one transfer but then replace the single addition with 1000 compounded additions, the total time for the GPU operation becomes:

2048x2048 Pixels:
GPU-1xTransfer and 1000xImAdds = 0.29 milliseconds

So for a compute intensive operations which transfers the data once, then re-uses the image data multiple times, the GPU can easily be 10x faster than the CPU.

This means that algorithms such as deconvolution, optic flow, deformable registration, FFTs, iterative segmentation etc are all good candidates for GPU acceleration. Now, if you look at the NVidia community showcase then these are the sorts of algorithms that you will see making use of the GPU for imaging. When the new Fermi architecture hits the shelves, with its larger L1 cache and new L2 cache, then the GPU imaging performance should make a real jump.

Its worth mentioning a minor technical problem with NVPP and Visual Studio 2008 - NVPP1.0 doesn't link properly in MSVC2008 unless you disable whole program optimisation (option /GL). Its also worth noting that the NVPP is built on the runtime API, which is not suitable for real-time multi-threaded applications. If you really need some of the NVPP functionality for a real-world application, then we would suggest you get a custom library developed using the driver API.

Vision Experts

No comments:

Post a Comment

Welcome

Practical software & algorithm development in the machine vision industry.

Often a pretty technical blog, often just observations on being a developer, leader and human being in the Machine Vision Industry.

I'm professional software engineer, tech lead and company director at Vision Experts and Red Engine.

I spend most of my time managing a small team of world class engineers, inventing IP and consulting for industry in the computer vision space.

Websites can be found at www.flightclubdarts.com and www.visionexperts.co.uk

The Parallel Revolution

“We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
- Paul Otellini, President, Intel (2005)

“Multicore: This is the one which will have the biggest impact on us. We have never had a problem to solve like this. A breakthrough is needed in how applications are done on multicore devices.”
- Bill Gates, Microsoft

“When we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. … I would be panicked if I were in industry.”
- John Hennessy, President of Stanford

Accelerated Image Processing

Friday, 26 March 2010

NVPP Performance Benchmarks

No comments:

Post a Comment

Welcome

Blog Archive

The Parallel Revolution

Keywords

About Me

Other Vision Blogs