Accelerated Image Processing: July 2009

Saturday, 4 July 2009

CUDA function overheads

Whilst working on my CUDA-accelerated JPEG algorithm I found a problem with my design: it demanded launching a large number of small kernels, followed by many thousands of small memory copy operations. I was launching kernels to compress a fixed number of image blocks, many hundreds in all. The result was a set of compressed image blocks whose output size was only known at runtime, after the algorithm had finished, so retrieving them required many thousands of memory copy operations. The design was bad, but I was trying things out to see what would happen.

On a CPU, a function call will typically take a few nanoseconds to push parameters onto the stack and jump the instruction pointer to the function address. On the GPU, however, much more work has to be performed via the driver. So kernel launches and CUDA memory copy operations take at least three orders of magnitude longer to set up than a CPU call: several microseconds in all.

This means that if you perform many hundreds or thousands of calls, the per-call overhead adds up far more quickly than it would for the equivalent CPU calls. The effect can become significant, so make your kernels big!
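To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It is only a cost model, not a measurement: the two overhead constants are assumptions picked to match the "few nanoseconds" and "several microseconds" figures above, and the function names and block count are hypothetical. It counts only the fixed per-call setup cost and ignores the useful work done by each call.

```python
# Assumed per-call setup costs, loosely matching the figures quoted in the post.
CPU_CALL_OVERHEAD_S = 5e-9  # assumption: "a few nanoseconds" per CPU function call
GPU_CALL_OVERHEAD_S = 5e-6  # assumption: "several microseconds" per launch or copy


def total_overhead(num_calls: int, per_call_overhead_s: float) -> float:
    """Fixed setup cost paid across num_calls calls, ignoring the useful work."""
    return num_calls * per_call_overhead_s


def batched_calls(num_items: int, items_per_call: int) -> int:
    """Number of calls needed when each call handles items_per_call items."""
    return -(-num_items // items_per_call)  # ceiling division


if __name__ == "__main__":
    num_blocks = 10_000  # hypothetical number of small copies
    for items_per_call in (1, 100, num_blocks):
        calls = batched_calls(num_blocks, items_per_call)
        gpu_ms = total_overhead(calls, GPU_CALL_OVERHEAD_S) * 1e3
        cpu_ms = total_overhead(calls, CPU_CALL_OVERHEAD_S) * 1e3
        print(f"{items_per_call:>6} blocks per call: {calls:>6} calls, "
              f"GPU overhead {gpu_ms:.3f} ms, CPU overhead {cpu_ms:.6f} ms")
```

Under these assumed figures, the setup cost shrinks in direct proportion to the number of calls, which is the whole argument for doing more work per kernel launch and per copy.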

Welcome

Practical software & algorithm development in the machine vision industry.

Often a pretty technical blog, often just observations on being a developer, leader and human being in the Machine Vision Industry.

I'm a professional software engineer, tech lead and company director at Vision Experts and Red Engine.

I spend most of my time managing a small team of world-class engineers, inventing IP and consulting for industry in the computer vision space.

Websites can be found at www.flightclubdarts.com and www.visionexperts.co.uk

The Parallel Revolution

“We are dedicating all of our future product development to multicore designs. … This is a sea change in computing”
- Paul Otellini, President, Intel (2005)

“Multicore: This is the one which will have the biggest impact on us. We have never had a problem to solve like this. A breakthrough is needed in how applications are done on multicore devices.”
- Bill Gates, Microsoft

“When we start talking about parallelism and ease of use of truly parallel computers, we're talking about a problem that's as hard as any that computer science has faced. … I would be panicked if I were in industry.”
- John Hennessy, President of Stanford
