6/30/2023 1:58:32 PM
As cloud computing becomes increasingly popular, concerns like energy efficiency are coming to the forefront. Illinois Computer Science Professor Josep Torrellas wants to make distributed computing as efficient as possible.
Torrellas’ research group has contributed three papers proposing more efficient computing architecture and hardware to the 2023 International Symposium on Computer Architecture (ISCA). The symposium is one of the most prestigious and competitive computer science conferences: just over 20% of the 372 submitted papers were accepted this year.
Torrellas noted that his group’s papers represent work completed in the ACE Center for Evolvable Computing.
“The center is working with industry partners to come up with more energy efficient distributed computing systems,” he said. “Not just processors, but also memories and networks. It’s been very rewarding to work on something that will make such an impact in industry.”
The first paper, “μManycore: A Cloud-Native CPU for Tail at Scale,” introduces a multicore processor design that will speed up microservice workloads. Such workloads are common in cloud computing and website access, according to Torrellas.
“When you go to book a hotel on the web, for example, the interactions you have are based on small tasks called microservices,” he said. “You pick the location where you want to find a hotel. You set the price range. Then you make the booking, confirm it, and make the payment. Each of these tasks is a microservice.”
In this environment, even if most of the microservices are fast, a small number of slow ones can hold back the entire application. Torrellas’ group proposed the μManycore processor architecture to mitigate this phenomenon, called “tail latency.” Unlike standard processors, which are designed to minimize the average processing time across all tasks, the new architecture targets what Torrellas called the “hotspots”: the points where microservices can slow down and increase tail latency.
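To make the gap between average and tail latency concrete, here is a minimal Python sketch. The latency numbers, the 2% slow-call rate, and the `percentile` helper are illustrative assumptions, not measurements or methods from the μManycore paper.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p * n / 100)
    return ordered[rank - 1]

# Hypothetical latencies: 98 fast microservice calls and 2 slow ones.
latencies_ms = [10] * 98 + [500] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

# Assuming independent calls, the chance that a request touching k
# microservices hits at least one slow call grows quickly with k.
slow_rate = 0.02
chance_of_slow_request = {k: 1 - (1 - slow_rate) ** k for k in (1, 5, 20)}

print(f"mean={mean_ms} ms, median={p50_ms} ms, p99={p99_ms} ms")
print(chance_of_slow_request)
```

In this made-up data set the mean and median look healthy while the 99th percentile is dominated by the two slow calls, which is why a design tuned only for average time can miss the problem.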
The second paper, “MXFaaS: Resource Sharing in Serverless Environments for Parallelism and Efficiency,” presents a framework for efficiently implementing the serverless environments commonly provided by cloud computing platforms, according to Torrellas.
“Serverless means that the cloud gives you everything you need to execute your program, and you do not have to worry about providing libraries and other support code,” he said. “In this environment, programmers invoke the same program many times to exploit parallelism. With each invocation, new resources need to be allocated.”
With MXFaaS, the researchers showed it is possible to securely combine the resources needed for different invocations of the same program. This results in a highly energy-efficient execution.
The final paper, “SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM,” presents a design for specialized computing hardware, or an “accelerator,” tailored to sparse matrix multiplications. These computations are common in machine learning applications, but they are inefficient on standard hardware because they require many memory accesses. These accesses cause the computer to waste time and energy.
SPADE is a hardware accelerator that is designed to execute these types of challenging operations efficiently. To prove their concept, the group built a small chip for such an accelerator.
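The two operations named in the paper title can be sketched in a few lines of Python. The sketch assumes the common meanings of the abbreviations (SpMM as a sparse matrix times a dense matrix, SDDMM as a dense-dense product computed only at the nonzero positions of a sparse matrix); the dictionary storage format and function names are illustrative and say nothing about how SPADE itself is built.

```python
# A sparse matrix is stored as {(row, col): value}; zeros are simply omitted.

def spmm(sparse, dense, n_rows):
    """Sparse (n_rows x k) times dense (k x n), returning a dense result."""
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for (i, k), a in sparse.items():
        row_k = dense[k]  # which dense row is read depends on the sparse data
        for j in range(n_cols):
            out[i][j] += a * row_k[j]
    return out

def sddmm(sample, left, right):
    """out[i, j] = sample[i, j] * dot(left[i], right[j]) at nonzeros only."""
    return {
        (i, j): s * sum(x * y for x, y in zip(left[i], right[j]))
        for (i, j), s in sample.items()
    }

S = {(0, 1): 2.0, (2, 0): 3.0}
B = [[1.0, 2.0], [3.0, 4.0]]
left = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

spmm_result = spmm(S, B, n_rows=3)
sddmm_result = sddmm(S, left, B)
print(spmm_result)   # [[6.0, 8.0], [0.0, 0.0], [3.0, 6.0]]
print(sddmm_result)  # {(0, 1): 6.0, (2, 0): 9.0}
```

In both functions the rows that get read are chosen by the positions of the nonzeros, so the memory accesses jump around rather than streaming in order. That irregularity is the source of the inefficiency described above.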
This year, the ISCA conference celebrates its 50th anniversary. As part of the celebrations, a panel selected the highest-impact papers of the last 25 years of the conference and asked the authors of each to write a retrospective. A paper that the Torrellas group presented at the 2006 ISCA, “Bulk Disambiguation of Speculative Threads in Multiprocessors,” has been selected for the collection “Retrospective of ISCA 1996-2020.”
Their selected paper presents a scheme to improve speculative processing in parallel computing. “The idea of speculation is to do more work than is needed right now,” according to Torrellas. “You may have to throw that work away later, but it may also turn out to be quite useful and advance the computation.
“If you perform this speculative work in parallel, then there is an added layer of complexity. As each processor executes part of the program, they need to check that they did not step on each other’s toes: namely, that they did not access the same memory locations. This is a time-consuming process if done manually. Our solution was to have each processor generate, in hardware, a summary of the memory locations it accessed. This information is stored in a ‘signature.’ Then, instead of manually comparing the locations accessed by the different processors to ensure they are different, we can simply check whether their signatures overlap.”
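The signature idea can be illustrated with a small software analogy in Python. This is a sketch under stated assumptions, not the paper's hardware encoding: the 256-bit signature size, the 64-byte line size, the single modulo hash, and the names `signature` and `may_conflict` are all hypothetical choices made for the example.

```python
SIG_BITS = 256    # assumed signature width
LINE_BYTES = 64   # assumed cache-line size

def signature(addresses):
    """Fold a set of addresses into a fixed-size bit mask."""
    sig = 0
    for addr in addresses:
        sig |= 1 << ((addr // LINE_BYTES) % SIG_BITS)
    return sig

def may_conflict(sig_a, sig_b):
    """One bitwise AND replaces an address-by-address comparison."""
    return (sig_a & sig_b) != 0

thread0 = {0x1000, 0x1040}
thread1 = {0x2080, 0x20C0}   # touches different lines
thread2 = {0x1040, 0x3000}   # shares 0x1040 with thread0
thread3 = {0x5000}           # different address, same bit as 0x1000

sig0, sig1, sig2, sig3 = (signature(t) for t in (thread0, thread1, thread2, thread3))

print(may_conflict(sig0, sig1))  # False: the threads are certainly disjoint
print(may_conflict(sig0, sig2))  # True: a real overlap
print(may_conflict(sig0, sig3))  # True: a false positive caused by aliasing
```

Because many addresses map to the same bit, a compact summary like this can report an overlap that does not exist, but in this sketch it never misses a real one. The cost of a false positive is only that some speculative work is discarded unnecessarily.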
Torrellas noted that their technique was influential on multiprocessing architectures at the time and formed the basis of some patents filed by prominent manufacturers.