10/15/2018 4:23:35 PM
The NSF has awarded a $1.2 million, four-year grant to Professors Josep Torrellas, Laxmikant V. Kale, and David Padua to work on ways to make existing systems consistently more efficient, which they believe could exponentially improve performance of large servers and data centers. Torrellas is the principal investigator for the project, for which the three will collaborate with chip-maker Advanced Micro Devices.
The bottom line, Torrellas said, is that CMOS-based computers use energy very inefficiently, for three main reasons.
First, data has to be gathered from memory, other chips, or the disk, which requires far more energy than would be used if all operations were performed locally on the processor, he said.
Second, many operations rely on an approach called speculation, which consumes a lot of energy and in many instances is just plain useless, according to Torrellas.
“Let’s say a program executes, and hits a branch. The program needs to check a condition to decide which way to go. But processors do not wait to figure out the outcome of the condition before moving on because it’s going to take some time,” he said. “Instead the processor follows one path. Then, if that’s wrong, the hardware scraps everything, and goes the other way.”
Finally, pointing to his own office laptop as it idled nearby, Torrellas offered a small-scale example: “This guy is consuming a lot of energy not doing anything.”
“So the combination of these three things make a computer orders of magnitude less efficient than it should be.”
Torrellas is the Saburo Muroga Professor of Computer Science and his research focuses on architecture, compilers, and parallel computing.
He and his collaborators are taking a four-pronged approach to the project:
- Adapting control theory, which is used across engineering to analyze systems and build the feedback loops that control them, to computer architecture.
- Developing hardware controllers.
- Extending runtime systems to actuate on the hardware.
- Using compilers to generate code that is optimal for a highly controlled cluster or data center.
Control theory, Torrellas explained, presents an early unknown in the project. It is often used in aeronautical engineering and mechanical engineering.
“How can it be applied to computers? It is an open question,” he said. “Computers are very complicated; building a model of a computer is very hard.”
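As a rough illustration of the kind of feedback loop control theory deals with, the sketch below holds a processor's power draw near a cap by repeatedly measuring power and nudging the clock frequency. Everything in it is an assumption made for illustration: the power model `toy_power_model`, the gain, and the frequency limits are invented, and nothing here reflects the controllers the project intends to build.

```python
def toy_power_model(freq_ghz: float) -> float:
    """Hypothetical plant: power in watts as a made-up function of clock frequency."""
    return 10.0 + 8.0 * freq_ghz ** 2


def run_power_cap(target_watts: float, steps: int = 100, gain: float = 0.01) -> tuple[float, float]:
    """Simple integral-style feedback loop (illustrative gain, not tuned for real hardware).

    Each step measures power, computes the error against the target,
    and moves the frequency a small amount in the direction that shrinks the error.
    """
    freq = 3.0  # assumed starting frequency in GHz
    for _ in range(steps):
        measured = toy_power_model(freq)      # sensor reading
        error = target_watts - measured       # positive means we have headroom
        freq = freq + gain * error            # actuator: adjust the clock
        freq = min(3.5, max(0.5, freq))       # assumed hardware frequency limits
    return freq, toy_power_model(freq)


if __name__ == "__main__":
    final_freq, final_power = run_power_cap(target_watts=50.0)
    print(f"settled at {final_freq:.3f} GHz, {final_power:.2f} W")
```

The loop never needs to know why power is high; it only reacts to the measured error. The hard part Torrellas describes is that a real computer is far harder to model than this one-line power function.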
The key, he said, will be to take what he calls an integrated, cross-layer approach to the work.
“There’s hardware, the operating system, the compiler, the application,” Torrellas said. “All these layers, if you optimize them together, you get a higher impact.”
Torrellas will focus on the hardware.
“You’re going to have a large machine built out of nodes, each node will have multiple chips, each chip will have multiple cores,” he said. “You have to build in a hierarchical manner. You control the core, then you control the chip that has multiple cores. Then you have a node that controls multiple chips, and so on.”
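One simple way to picture that hierarchy is a power budget handed down level by level: the node divides its budget among its chips, and each chip divides its share among its cores. The sketch below assumes a proportional split based on a hypothetical per-core "demand" number; the function names and the splitting rule are illustrative only and are not taken from the project.

```python
def split_budget(budget: float, demands: list[float]) -> list[float]:
    """Divide a budget among children in proportion to their demand.

    Falls back to an equal split when no child reports any demand.
    """
    total = sum(demands)
    if total == 0:
        return [budget / len(demands)] * len(demands)
    return [budget * d / total for d in demands]


def allocate_node(node_budget: float, chip_core_demands: list[list[float]]) -> list[list[float]]:
    """Two-level allocation: node -> chips -> cores.

    chip_core_demands[i][j] is the (hypothetical) demand of core j on chip i.
    Returns per-core budgets with the same nested shape.
    """
    # A chip's demand is the sum of its cores' demands.
    chip_demands = [sum(cores) for cores in chip_core_demands]
    chip_budgets = split_budget(node_budget, chip_demands)
    # Each chip then applies the same rule to its own cores.
    return [
        split_budget(chip_budget, cores)
        for chip_budget, cores in zip(chip_budgets, chip_core_demands)
    ]


if __name__ == "__main__":
    # One node with 100 W: chip 0 has two cores, chip 1 has one busier core.
    print(allocate_node(100.0, [[1.0, 1.0], [2.0]]))
```

Because each level applies the same rule to its children, the scheme extends to racks or whole clusters by adding further levels on top.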
Kale, meanwhile, will work on the runtime system. Kale is the Paul and Cynthia Saylor Professor of Computer Science and leads the Parallel Programming Laboratory. His past work includes pioneering research to develop adaptive runtime systems in parallel computing.
“Power, energy, and temperature are becoming very important issues in extreme-scale computing, even as they are important on cellphones and laptops,” Kale said. “It is an exciting prospect to combine our work on adaptive runtime systems with controllable adaptive hardware at the lower level and compiler support at higher level, via a common framework provided by a control system.”
Padua is a Donald Biggar Willett Professor in Engineering and his research focus includes optimization strategies. He will focus on designing compilers.
The three professors also plan to compare the use of control theory to improve system efficiency with the use of machine learning.
“Some people are trying to make the systems more efficient by using machine learning -- an algorithm that’s telling me ‘shut this hardware down now’ or ‘move this program here, change the frequency,’ as opposed to control theory, which is based on a more reactive approach,” Torrellas said.
Torrellas, Kale, and Padua also believe the project could serve as a catalyst for interdisciplinary research and education on cluster and data-center technology at Illinois.