Thursday 8 July 2010

Debunking the x100 GPU Myth - Intel Fights Back

Intel recently published this paper titled 'Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU', which attempts to compare a number of GPU kernels against algorithms that are highly optimised for Intel architectures. The authors concluded that, for the right problems, the GPU was up to 14x faster than an equivalent optimised CPU implementation; on average, a x2.5 increase in speed was seen.

I am all in favour of using GPUs to accelerate image processing when it is appropriate, but the hype has got out of control over the last year, so I am very pleased to see Intel try to put their case forward and bring some balance to the argument.

What I liked about the paper was that, for once, significant effort was expended to optimise BOTH the CPU and the GPU implementations.  Too many biased comparisons are made between highly optimised GPU implementations and naive, plain-vanilla single-threaded 'C' versions.  When a x100 increase in speed is cited, I always suspect that the author was either being highly selective about which parts of the overall system were timed, or that the algorithm was unrealistically well mapped to GPU hardware and not representative of a real problem, or even that the CPU implementation was simply not optimised at all.  The NVidia showcase website has made publishing an impressive acceleration factor in the author's best interest.

I certainly have not come across any imaging systems that have achieved anything like a x100 increase in throughput by employing GPU technology.  There may be some algorithms that map superbly well to GPUs and can achieve a x100 performance increase in a single algorithm stage, but the numbers published by Intel are much more in line with the total throughput increase I have seen when using GPUs for image processing in real-world applications, compared against the optimised CPU algorithms that are readily available.

An example of disingenuous performance metrics is the image-processing blur demo in the NVidia SDK: the image is loaded from file, pre-processed and converted into a 512x512 floating-point greyscale image, transferred to the GPU once, and THEN processed repeatedly at high speed to show how fast the GPU is.  The CPU conversion to floating-point format, and the transfer itself, are omitted from the GPU compute time.
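
To make that concrete, below is a minimal timing sketch in CUDA. It is hypothetical rather than lifted from the SDK: the do-nothing placeholder kernel, the 512x512 buffer and the 100-iteration loop are illustrative assumptions, but the structure shows how a 'kernel-only' figure can look far more flattering than an end-to-end figure that also charges the CPU float conversion and the host-to-device transfer to each frame.

#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the SDK blur kernel; the real filter is omitted
// because only the timing structure matters here.
__global__ void blurKernel(const float* in, float* out, int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[y * w + x] = in[y * w + x];
}

int main()
{
    const int w = 512, h = 512, n = w * h;
    unsigned char* src8 = (unsigned char*)std::malloc(n);           // 8-bit source image
    float*         srcF = (float*)std::malloc(n * sizeof(float));   // float working copy
    std::memset(src8, 128, n);

    float *dIn, *dOut;
    cudaMalloc(&dIn,  n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));

    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);

    // "Headline" number: convert and upload once, time only the kernel loop.
    for (int i = 0; i < n; ++i) srcF[i] = (float)src8[i];               // not timed
    cudaMemcpy(dIn, srcF, n * sizeof(float), cudaMemcpyHostToDevice);   // not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int iter = 0; iter < 100; ++iter)
        blurKernel<<<grid, block>>>(dIn, dOut, w, h);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float msKernelOnly = 0.0f;
    cudaEventElapsedTime(&msKernelOnly, start, stop);

    // Fairer number: charge the conversion and the transfer to every frame.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) srcF[i] = (float)src8[i];               // CPU float conversion
    cudaMemcpy(dIn, srcF, n * sizeof(float), cudaMemcpyHostToDevice);   // PCIe upload
    blurKernel<<<grid, block>>>(dIn, dOut, w, h);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();
    double msEndToEnd = std::chrono::duration<double, std::milli>(t1 - t0).count();

    std::printf("kernel-only time per frame: %.3f ms\n", msKernelOnly / 100.0f);
    std::printf("end-to-end  time per frame: %.3f ms\n", msEndToEnd);

    cudaFree(dIn); cudaFree(dOut);
    std::free(src8); std::free(srcF);
    return 0;
}

For a simple filter on a 512x512 image, the end-to-end figure is often dominated by the conversion and the PCIe transfer rather than the kernel itself, which is precisely the part the demo leaves out of its headline number.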



I would also agree with Intel that, most often in practice, optimising an algorithm to use multiple cores, maximise cache usage and exploit SSE instructions is easier, faster and ultimately more portable than developing a CUDA replacement algorithm.  I would also agree with the GPU evangelists that the hardware cost of upgrading to a top-end Intel-based PC system is significantly higher than the investment in a GTX280.  With the tools improving all the time, it is becoming easier to code and deploy GPU-enhanced algorithms.
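
As a rough sketch of what that kind of CPU optimisation can look like, here is a hypothetical per-pixel gain written with an OpenMP parallel loop and SSE intrinsics. The function name and the operation are my own illustrations, and the code assumes the buffer is 16-byte aligned with a pixel count that is a multiple of four.

#include <xmmintrin.h>   // SSE intrinsics

// Apply a gain to a float image: OpenMP spreads the loop across cores and
// SSE processes four pixels per instruction. Compile with e.g. gcc -O2 -fopenmp.
void scaleImage(float* img, int numPixels, float gain)
{
    const __m128 g = _mm_set1_ps(gain);

    #pragma omp parallel for
    for (int i = 0; i < numPixels; i += 4)
    {
        __m128 px = _mm_load_ps(img + i);   // load 4 aligned pixels
        px = _mm_mul_ps(px, g);             // scale all four at once
        _mm_store_ps(img + i, px);          // write the results back
    }
}

The same source compiles on any x86 machine with an OpenMP-capable compiler, which is the portability argument in a nutshell.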


The conclusion is that, for the time being, we must take a balanced view of the technology available and choose the right processing method to suit the application.  And be realistic.

Vision Experts

2 comments:

  1. I totally agree. There are a whole bunch of considerations when implementing on a GPU, such as data segmentation to fit into shared memory, streaming large-scale data in and out of GPU RAM (considerably smaller than what's available to CPUs nowadays), optimizing memory access patterns (coalescing), making algorithms SIMD-friendly, etc. It's been fun but I'd prefer 100-core systems (if they become cheap enough) even if the theoretical performance gain may be slightly less than 1000-core GPUs.

    Optimizing for multi-core is no picnic either, but it almost seems like a walk in the park compared to the effort of implementing things on a GPU.

  2. Interesting blog. It would be great if you could provide more details about it. Thank you

    Image Processing Company in Chennai
