Recap: Using FPGA for High Speed Image Processing Part 1: Introduction

Parallelism in FPGA

Image processing is inherently parallel, and the ability to exploit this parallelism has significant implications for building embedded vision systems. The parallelism takes several forms, including temporal parallelism and spatial parallelism.

Temporal Parallelism

Most image processing algorithms are a sequence of operations, and each operation can be assigned to a separate processor. Temporal parallelism arranges these processors as a pipeline that works like an assembly line, in which every station performs one particular operation on the item being manufactured. Because each processor still has to wait for the previous one to finish with a given image, pipelining does not reduce the response time for a single image. Throughput increases, however, because processor 1 can begin processing the next image while processor 2 works on the output of operation 1.
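The latency versus throughput trade-off can be illustrated with a small timing model. This is a minimal sketch, not a measurement: the function names and the stage times (2, 3 and 1 ms) are hypothetical, and the model assumes every image takes the same time in each stage and that stages hand images over without extra delay.

```python
def sequential_time(stage_times, num_images):
    """Total time when each image passes through every stage
    before the next image is allowed to start."""
    return num_images * sum(stage_times)


def pipelined_time(stage_times, num_images):
    """Total time when the stages overlap, assembly-line style.

    The first image still needs the sum of all stage times (latency
    is unchanged). After that, one image completes every time the
    slowest stage finishes, so the slowest stage sets the throughput.
    """
    if num_images == 0:
        return 0
    return sum(stage_times) + (num_images - 1) * max(stage_times)


# Hypothetical stage times in milliseconds, chosen only for illustration.
stages_ms = [2, 3, 1]
print(sequential_time(stages_ms, 100))  # 100 images, one at a time
print(pipelined_time(stages_ms, 100))   # 100 images, stages overlapped
```

Under these assumptions the first result still appears after the same 6 ms, but the total time for a long sequence of images approaches one image per slowest-stage time.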

Spatial Parallelism

To exploit spatial parallelism, the image is partitioned and a separate processor is assigned to each partition. The image can be split into blocks of rows, blocks of columns, or rectangular tiles. The challenge with spatial parallelism is the overhead of distributing the image partitions to the processors.
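As a software analogy for row-wise partitioning, the sketch below splits an image into row blocks and hands each block to its own worker. The helper names (`split_rows`, `threshold_block`, `process_partitioned`) are hypothetical, and Python threads only stand in for the separate hardware processors an FPGA design would instantiate. Thresholding is used because it is a point operation, so no partition needs pixels from its neighbours.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(image, num_parts):
    """Split an image (a list of rows) into contiguous row blocks
    whose sizes differ by at most one row."""
    base, extra = divmod(len(image), num_parts)
    blocks, start = [], 0
    for i in range(num_parts):
        size = base + (1 if i < extra else 0)
        blocks.append(image[start:start + size])
        start += size
    return blocks


def threshold_block(block, level=128):
    """Binarise one block: pixels at or above `level` become 255, others 0."""
    return [[255 if p >= level else 0 for p in row] for row in block]


def process_partitioned(image, num_parts=4):
    """Threshold each row block in its own worker, then stitch the
    results back together in the original row order."""
    blocks = split_rows(image, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        results = list(pool.map(threshold_block, blocks))
    return [row for block in results for row in block]
```

The split and stitch steps are the distribution overhead mentioned above. A neighbourhood operation such as a filter would add to it, since each block would also need a border of rows from the adjacent blocks.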

Streaming

A streamed, pipelined system implemented on an FPGA can operate at the pixel input (or output) clock frequency, which can be two or more orders of magnitude lower than the clock frequency a serial processor would need for the same task. From a development perspective, implementing the entire algorithm as a single stream is an effective approach. With stream processing, pipelining is usually needed to achieve the required throughput.
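Stream processing can be modelled in software with chained generators, where each stage consumes one pixel and emits one pixel, much as a hardware stage would on every pixel clock. The sketch below is illustrative only: the stage names `smooth3` and `threshold` are hypothetical, the smoothing stage is a simple three-pixel average chosen for brevity, and row boundaries are ignored.

```python
from collections import deque


def smooth3(pixels):
    """For each incoming pixel, emit the integer mean of the last
    three pixels seen. The deque plays the role of a 3-stage shift
    register, initialised to zero."""
    window = deque([0, 0, 0], maxlen=3)
    for p in pixels:
        window.append(p)  # the oldest pixel drops out automatically
        yield sum(window) // 3


def threshold(pixels, level=128):
    """Emit 255 for pixels at or above `level`, otherwise 0."""
    for p in pixels:
        yield 255 if p >= level else 0


def stream_pipeline(pixels):
    """Chain the stages so each pixel flows through both of them
    without the whole image ever being stored."""
    return threshold(smooth3(pixels))


# One short row of pixels, streamed through both stages.
print(list(stream_pipeline([0, 0, 255, 255, 255, 0])))
```

Only three pixels are held at any time, which mirrors how a streamed design can avoid buffering a full frame.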

Case Study: Our experience using an FPGA processor instead of a microcontroller

In a project for a client in the high-speed sorting OEM industry, the available cycle time from the sorting decision to actuation of the pneumatic ejectors was only 3 ms. The original design used microcontrollers that could perform only simple thresholding within this cycle time, which limited the quality of inspection of the end products. We implemented the image processing algorithm on an FPGA processor, exploiting parallelism to achieve more complex image processing within the same cycle time. The full case study describes the project in detail.

Case Study: High Speed Image Processing Using FPGA