Speeding up 3-D video for computers
7/21/2016 10:12:00 AM
Most mammals have binocular vision. It helps the squirrels in the trees on campus judge the distance between branches and make the leap. It helps basketball players toss buzzer-beaters from half court.
For computers, the same stereoscopic vision, built from images captured by two or more offset cameras, can provide equally valuable 3-D information. Even static image pairs can aid object recognition or, as in Google’s aerial maps, help create 3-D cityscapes complete with topography and correctly proportioned and shaded trees.
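The depth information comes from triangulation. For two rectified cameras mounted side by side, a point that appears shifted by d pixels between the left and right images (its disparity) lies at depth Z = f·B/d, where f is the focal length in pixels and B is the distance between the cameras. The short Python sketch below evaluates that textbook relation; the function name and the calibration numbers (a 700-pixel focal length, a 12 cm baseline) are hypothetical and do not come from the article.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres of a point seen by two rectified, horizontally offset cameras.

    disparity_px: horizontal shift of the point between left and right images, in pixels
    focal_px:     focal length expressed in pixels (hypothetical calibration value)
    baseline_m:   distance between the two camera centres, in metres
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical rig: 700-pixel focal length, cameras 12 cm apart.
    for d in (84, 42, 21):
        print(f"disparity {d:>3} px -> depth {depth_from_disparity(d, 700.0, 0.12):.2f} m")
```

Halving the disparity doubles the depth, which is why distant objects, with their small disparities, are the hardest to range precisely.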
Now, the rate at which computers can extract that 3-D information is speeding up. ECE graduate student Jungwook Choi and CS @ ILLINOIS Department Head Rob A. Rutenbar have demonstrated one of the fastest video-rate implementations of this kind of 3-D computer vision. Last fall, their design won top honors for the best accuracy-adjusted performance at the MEMOCODE design competition, held in Portland by IEEE and the Association for Computing Machinery (ACM).
With video-rate stereo matching, computers could recognize gestures more readily, and the technology could play an important role in the move toward driverless vehicles. Automakers such as Mercedes-Benz and Volvo have already added pedestrian detection to some models: the system combines stereo images with radar to warn the driver of approaching pedestrians and, if necessary, apply the brakes.
“In such a case, speed of stereo matching is critical,” Choi said. “The faster stereo matching is done, the more chance the car can avoid the collision.”
More broadly, though, Choi said that video-rate stereo matching, while important, is just one piece of a larger puzzle. The bigger picture, and the focus of his overall research, is customizable hardware that lets computers interpret observations more quickly.
To do this, Choi and Rutenbar used a type of algorithm known as belief propagation, which, in stereo matching, estimates how likely each possible depth is for every pixel in an image. Belief propagation is also widely used in artificial intelligence. Speech recognition, for example, often uses some form of it when choosing between homophones, interpreting accents, and so forth.
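In belief propagation, each pixel repeatedly sends its neighbours "messages" summarizing which depths (disparities) it finds plausible, and each pixel's belief combines its own matching evidence with those messages. The Python sketch below is a deliberately simplified illustration, not the TRW-S hardware design described in this article: it runs min-sum belief propagation along a single image row, where the pixels form a chain and the method is exact. The absolute-difference matching cost, the truncated-linear smoothness penalty, and every function and parameter name are our own assumptions.

```python
from typing import List

BIG = 1e6  # hypothetical penalty for matches that fall outside the right image


def data_cost(left: List[float], right: List[float], x: int, d: int) -> float:
    """Cost of giving left-image pixel x the disparity d: absolute intensity difference."""
    xr = x - d  # matching pixel in the right image
    if xr < 0:
        return BIG
    return abs(left[x] - right[xr])


def smoothness(d1: int, d2: int, weight: float, trunc: int) -> float:
    """Truncated-linear penalty that discourages neighbouring pixels from disagreeing."""
    return weight * min(abs(d1 - d2), trunc)


def scanline_bp(left: List[float], right: List[float], max_disp: int,
                weight: float = 1.0, trunc: int = 2) -> List[int]:
    """Min-sum belief propagation along one image row (a chain, where BP is exact)."""
    n = len(left)
    labels = range(max_disp + 1)
    D = [[data_cost(left, right, x, d) for d in labels] for x in range(n)]
    # fwd[x][d]: message passed rightward into pixel x from pixel x-1
    fwd = [[0.0] * len(labels) for _ in range(n)]
    # bwd[x][d]: message passed leftward into pixel x from pixel x+1
    bwd = [[0.0] * len(labels) for _ in range(n)]
    for x in range(1, n):
        for d in labels:
            fwd[x][d] = min(D[x - 1][p] + fwd[x - 1][p] + smoothness(p, d, weight, trunc)
                            for p in labels)
    for x in range(n - 2, -1, -1):
        for d in labels:
            bwd[x][d] = min(D[x + 1][q] + bwd[x + 1][q] + smoothness(d, q, weight, trunc)
                            for q in labels)
    disparities = []
    for x in range(n):
        # Belief = local matching cost plus the messages arriving from both directions.
        belief = [D[x][d] + fwd[x][d] + bwd[x][d] for d in labels]
        disparities.append(min(labels, key=belief.__getitem__))
    return disparities


if __name__ == "__main__":
    right = [10, 20, 30, 40, 50, 60, 70, 80]
    # Left row: the right row shifted two pixels, with the first two pixels padded.
    left = [10, 10, 10, 20, 30, 40, 50, 60]
    print(scanline_bp(left, right, max_disp=3))  # -> [0, 1, 2, 2, 2, 2, 2, 2]
```

Running this row by row gives a rough depth map, but rows are treated independently. Practical stereo systems pass messages across the full 2-D pixel grid, where the graph contains loops and the messages must be updated iteratively.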
“Belief propagation methods have been researched intensively [over the past decade] and achieved huge success in practice,” Choi said. “But still, there has been a missing step between algorithmic solutions and their realization in the real world applications…mainly due to slow speed.”
There’s often a trade-off between speed and accuracy, but Choi and Rutenbar achieved both. They used a variant of belief propagation known as sequential tree-reweighted message passing (TRW-S), which reportedly had never been demonstrated at video rates. As the name implies, such algorithms traditionally begin in one section of an image and move sequentially, pixel by pixel, through the rest. It’s a reliable but inherently slow process.
To achieve video rates, the team turned to customizable hardware.
“Jungwook devised some very clever architectural tricks to expose lots of useful parallelism,” said Rutenbar, an Abel Bliss Professor. “We can be doing lots of work on different parts of the image concurrently.”
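The article does not describe Choi’s specific architectural tricks. One generic way to find concurrency in a sequential raster sweep is wavefront (anti-diagonal) scheduling: if each pixel’s update waits only on its upper and left neighbours, then all pixels on the same anti-diagonal are independent of each other. The Python sketch below builds such a schedule and checks it against that dependency pattern. It is our illustration of the general idea, not the team’s design; the dependency assumption, the function names, and the 640 x 480 frame size are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def wavefront_schedule(height: int, width: int) -> List[List[Pixel]]:
    """Group the pixels of a height x width grid into anti-diagonal steps.

    Step k holds every pixel (r, c) with r + c == k. Under a raster-order sweep where
    pixel (r, c) waits on (r - 1, c) and (r, c - 1), pixels within one step do not
    depend on each other, so they could in principle be processed concurrently.
    """
    return [[(r, k - r) for r in range(height) if 0 <= k - r < width]
            for k in range(height + width - 1)]


def respects_dependencies(schedule: List[List[Pixel]]) -> bool:
    """Check that each pixel's upper and left neighbours run in an earlier step."""
    step_of: Dict[Pixel, int] = {p: i for i, step in enumerate(schedule) for p in step}
    for (r, c), i in step_of.items():
        for dep in ((r - 1, c), (r, c - 1)):
            if dep in step_of and step_of[dep] >= i:
                return False
    return True


if __name__ == "__main__":
    h, w = 480, 640  # hypothetical frame size
    steps = wavefront_schedule(h, w)
    print("sequential steps:", h * w)
    print("wavefront steps: ", len(steps))
    print("widest step:     ", max(len(s) for s in steps), "pixels")
    print("dependencies OK: ", respects_dependencies(steps))
```

For a 640 x 480 frame, the sweep shrinks from 307,200 one-pixel steps to 1,119 steps of up to 480 pixels each, which hardware with enough parallel units could exploit.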
Their experimental results show a rate of 12 frames per second, significantly faster than other recently demonstrated belief-propagation approaches.
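To put 12 frames per second in perspective, the sketch below converts a frame rate into a time budget per frame and per pixel. The 12 fps figure is from the article; the 640 x 480 resolution is a hypothetical choice, since the article does not state one, and the function names are our own.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


def pixel_budget_ns(fps: float, width: int, height: int) -> float:
    """Average time per pixel if pixels were handled strictly one at a time, in nanoseconds."""
    return 1e9 / (fps * width * height)


if __name__ == "__main__":
    fps = 12  # rate reported in the article
    width, height = 640, 480  # hypothetical resolution
    print(f"{frame_budget_ms(fps):.1f} ms per frame")
    print(f"{pixel_budget_ns(fps, width, height):.0f} ns per pixel")
```

Under these assumptions, each frame must be finished in about 83 ms, leaving roughly 271 ns per pixel for every message update that pixel needs if the work were done serially.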
The team used a Convey HC-1 computer system, which includes customizable integrated circuits known as field-programmable gate arrays (FPGAs). “Stereo matching requires a huge amount of computation [and] memory bandwidth,” Choi explained. “That’s why people have tried to implement stereo matching algorithms on multi-cores…or graphic processors for real time execution, but they are fundamentally restricted in the way of allocating computing power and memory bandwidth.”
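A rough calculation shows why memory bandwidth matters. In grid belief propagation, each pixel keeps one message per neighbour, and each message holds one value per candidate disparity. The sketch below estimates message storage and traffic under stated, hypothetical assumptions: 640 x 480 pixels, 64 disparity levels, 4 neighbours, 4-byte values, 5 iterations per frame, and each message read once and written once per iteration. None of these numbers come from the article, and real designs can cut the traffic substantially with on-chip buffering, compact message formats, or schedules that reuse data.

```python
def message_bytes(width: int, height: int, labels: int,
                  neighbours: int = 4, bytes_per_value: int = 4) -> int:
    """Storage for one full set of pixel-to-neighbour messages, in bytes."""
    return width * height * neighbours * labels * bytes_per_value


def traffic_bytes_per_second(width: int, height: int, labels: int, iterations: int,
                             fps: int, neighbours: int = 4, bytes_per_value: int = 4) -> int:
    """Bytes moved per second if every message is read once and written once per iteration."""
    per_iteration = 2 * message_bytes(width, height, labels, neighbours, bytes_per_value)
    return per_iteration * iterations * fps


if __name__ == "__main__":
    # All parameters below are hypothetical.
    stored = message_bytes(640, 480, 64)
    traffic = traffic_bytes_per_second(640, 480, 64, iterations=5, fps=12)
    print(f"message storage: {stored / 1e6:.1f} MB")
    print(f"memory traffic:  {traffic / 1e9:.1f} GB/s")
```

Under these assumptions the messages alone occupy about 315 MB and generate roughly 38 GB/s of memory traffic, before counting the images themselves.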