Sunday 16 May 2010

GPU Accelerated Laser Profiling

Laser profiling extracts a dense set of 3D coordinates from a target object by measuring the deviation of a straight laser line as it is swept across the target.  Many of these systems use custom hardware (e.g. Sick IVC3D) with an FPGA to reach high line profile rates, often several thousand profiles per second.

It is also possible to assemble a laser profiling system from any high-speed camera and a laser line.  Partial-scan cameras are useful for getting high frame rates, but fast software is also required to find the laser line in every image and measure its position to sub-pixel accuracy for every profile.  These positions are then converted to world coordinates using a calibrated projection and lens distortion correction, which requires some floating point operations.  The hardware solutions typically manage several thousand profiles/sec; software is normally slower.
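To make the "find and measure" step concrete, here is a minimal CPU sketch of a sub-pixel measurement for one profile.  It assumes a centre-of-gravity estimate in a small window around the brightest pixel of each column; that is a common choice but is only an assumption here, not necessarily the estimator used in my implementation.  The names `subpixel_peak`, `profile` and `half_window` are made up for illustration.

```python
def subpixel_peak(column, half_window=2):
    """Return the sub-pixel row of the laser line in one image column.

    `column` is a sequence of 8-bit intensities, ordered top to bottom.
    Returns None if the column is completely dark (no laser line found).
    """
    # Integer position of the brightest pixel (first one wins on ties).
    peak = max(range(len(column)), key=lambda r: column[r])
    if column[peak] == 0:
        return None

    # Intensity-weighted mean row in a small window around the peak.
    lo = max(0, peak - half_window)
    hi = min(len(column), peak + half_window + 1)
    total = sum(column[lo:hi])
    weighted = sum(r * column[r] for r in range(lo, hi))
    return weighted / total


def profile(image):
    """Compute one laser profile: a sub-pixel row for every column.

    `image` is a list of rows, each row a list of pixel intensities.
    """
    width = len(image[0])
    return [subpixel_peak([row[c] for row in image]) for c in range(width)]
```

Each column is independent of its neighbours, which is what makes this step such a natural fit for parallel hardware.  The conversion to world coordinates would follow as a separate step and is not shown.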

Recently, I've been experimenting with GPU accelerated line profiling, and it's looking fast.  The GPU turns out to be well suited to measuring the laser lines in parallel, since we can launch a single thread per column of the input image.  In fact, for memory access efficiency, it is better for each thread to read a 32-bit int that packs four 8-bit pixels, so that each thread handles four adjacent columns.  A block of 16 threads therefore computes the laser positions for 64 columns in parallel.  With multiple blocks of 64 columns running concurrently (Figure 1), the processing rate is pretty much limited only by GPU-host transfers.  On my test rig, the GTX260 GPU has 216 cores and so can execute 3,456 threads in parallel, far more than are actually needed, so many sit idle in my current implementation.
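The packed-read idea can be emulated on the CPU.  The sketch below is not the GPU kernel itself; it is a plain Python illustration in which each "thread" reads one 32-bit word per image row, unpacks its four pixels, and tracks the brightest row for each of its four columns.  It assumes little-endian byte order and an image width that is a multiple of 4, and it only finds the integer peak row (sub-pixel refinement is left out).  The function names are hypothetical.

```python
import struct


def pack_rows(image):
    """Pack each row of 8-bit pixels into little-endian 32-bit words.

    The image width is assumed to be a multiple of 4.
    """
    return [list(struct.unpack("<%dI" % (len(row) // 4), bytes(row)))
            for row in image]


def thread_scan(packed, thread_id):
    """Emulate one thread: find the brightest row in each of its 4 columns."""
    best_val = [-1] * 4
    best_row = [0] * 4
    for r, words in enumerate(packed):
        word = words[thread_id]           # one 32-bit read covers 4 pixels
        for i in range(4):
            v = (word >> (8 * i)) & 0xFF  # pixel for column 4*thread_id + i
            if v > best_val[i]:
                best_val[i], best_row[i] = v, r
    return best_row


def scan_image(image):
    """Run every emulated thread and return the peak row for each column."""
    packed = pack_rows(image)
    peaks = []
    for thread_id in range(len(packed[0])):
        peaks.extend(thread_scan(packed, thread_id))
    return peaks
```

On the GPU the outer loop over `thread_id` disappears, because every thread runs concurrently; 16 such threads cover 64 columns.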


Figure 1.  Each thread scans four columns in order to compute the position of the laser line in each of them.  With each GPU core executing 64 threads in parallel, this can be very fast.


Figure 2. The C# test application (using our own OpenGL 3D display library) was able to achieve over 200MPix/sec throughput using Common Vision Blox images.  The lower-level C interface was around three times that speed when using raw image data.

My initial results show that the C# interface is able to achieve about 200MPix/sec throughput (Figure 2), but that uses Common Vision Blox images, which must be unwrapped and marshaled to the 'C' dll, and this slows things down.
The low-level 'C' dll library was achieving >600MPix/sec throughput (Figure 3); that's many kHz for a range of resolutions.  It may be that this GPU accelerated algorithm is able to provide line rates that previously only hardware could achieve.

Figure 3. The low-level DOS test application with the 'C' dll interface was able to achieve over 600MPix/sec throughput using pre-loaded raw images.  That was a 2.5kHz profile rate on 1280x200 laser images, or 390fps for 1280x1280 scans.
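The relationship between pixel throughput and profile rate is simple arithmetic, and the quoted figures can be cross-checked with a couple of helper functions (the names are made up for this example):

```python
def throughput_mpix(rate_hz, width, height):
    """Pixel throughput in MPix/sec for a given profile (frame) rate."""
    return rate_hz * width * height / 1e6


def profile_rate_hz(mpix_per_sec, width, height):
    """Profile (frame) rate in Hz for a given throughput in MPix/sec."""
    return mpix_per_sec * 1e6 / (width * height)


# 2.5kHz on 1280x200 images
print(throughput_mpix(2500, 1280, 200))
# 390fps on 1280x1280 images
print(throughput_mpix(390, 1280, 1280))
```

Both cases work out to roughly 640MPix/sec, consistent with the "over 600MPix/sec" figure.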


Vision Experts
