The Best Things Constantly Change

5/1/2009

Prof. Marc Snir Discusses the Blue Waters Project with NCSA's Access Magazine

With the National Science Foundation's funding of a sustained-petascale computer system at the University of Illinois, called Blue Waters, the high-performance computing community takes on new challenges. NCSA Access' Barbara Jewett discussed some of the hardware and software issues with the University of Illinois at Urbana-Champaign's Wen-mei Hwu, professor of electrical and computer engineering, and Marc Snir, co-director of the Universal Parallel Computing Research Center and professor and former head of computer science.

The complete interview can be found on the NCSA website at http://www.ncsa.uiuc.edu/News/Stories/HwuSnir/.

Q: Dr. Snir, a report that you co-edited a few years ago on the future of supercomputing in the United States [Getting Up to Speed: The Future of Supercomputing, National Academies Press] indicated the country was falling behind in supercomputing. With the Blue Waters award, do you feel like we are now getting back to where we need to be?

SNIR: It certainly is an improvement. The part that is still weak is that there has been no significant investment in research on supercomputing technologies, and that is really the main thing we emphasized: you get continuous improvement in computer technology when you have continuous research.

Q: Let's talk about the software for a petascale machine. What is the biggest challenge?

SNIR: Scalability is first and foremost. You want to be able to run on, and leverage, hundreds of thousands of processors. Wen-mei can explain it even better than I can. Technology is now evolving in a direction where we can get an increasing number of cores on a chip, and therefore an increasing number of parallel processors in the machine. To be able to increase performance over the coming years, the answer has to be that we increase the level of parallelism we use. And that really affects everything: the applications, which have to find algorithms that can use a higher level of parallelism, the runtimes, the operating systems, the services, the file systems. Everything has to run not on thousands, not on tens of thousands of processors, but on hundreds of thousands of cores. I expect to see millions before I retire. It's a problem.
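
A quick way to see why serial bottlenecks anywhere in the stack matter at that scale is Amdahl's law, which bounds speedup by the fraction of work that stays serial. The sketch below is an illustration, not part of the interview; the parallel fraction and core counts are assumed for the example. It shows that even 0.1 percent serial work caps a 100,000-core run below a 1,000x speedup:

    # Illustrative sketch of Amdahl's law: S(N) = 1 / ((1 - p) + p / N),
    # where p is the fraction of work that parallelizes and N is the core count.
    def amdahl_speedup(parallel_fraction, cores):
        serial_fraction = 1.0 - parallel_fraction
        return 1.0 / (serial_fraction + parallel_fraction / cores)

    # With 99.9% of the work parallel (p = 0.999, an assumed figure),
    # speedup saturates far below the core count.
    for cores in (1000, 10000, 100000):
        print(cores, "cores:", round(amdahl_speedup(0.999, cores), 1), "x speedup")

This is the force behind Snir's list: at hundreds of thousands of cores, any serial component, whether in the application, the runtime, the operating system, or the file system, becomes the limit on performance.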

Q: What other expertise will your respective departments contribute to this project?

HWU: One aspect of this machine is that we are going to build a massive interconnect. Marc actually has a lot of experience building this kind of machine, although probably on a smaller scale, from when he was working at IBM. And the people who make up the electrical and computer engineering (ECE) department have a lot of experience building this kind of machine. Another aspect is the reliability facet we talked about: Ravi Iyer in ECE has more than 20 years' experience working with IBM, measuring their mainframe failure rates and their component-versus-system reliability. I personally focus much more on the microprocessors. I have worked with numerous companies on various microprocessors, and one of the things I specialize in is how you actually build these microprocessors so that compilers can use the parallel execution resources on the chip.

SNIR: We have a lot of experience at Illinois in developing parallel runtimes, programming languages, and software for high performance. The computer science department has been involved in parallel applications and large-scale applications, assisting in developing the NAMD code just a few years ago. [Editor's note: NAMD is a molecular dynamics simulation code designed for high-performance simulation of large biomolecular systems (millions of atoms). It was developed through a collaboration of Illinois' Theoretical and Computational Biophysics Group and Parallel Programming Laboratory.] We've done a lot of work on multicore systems. We certainly have a strong applications team on our campus whose efforts I think we can use, as well as professors and graduate students of all sorts, since we are one of the few places that teach scientific computing and high-performance computing. So we have the breadth.

Q: A project like this changes the state of affairs for everybody. What are some of the likely candidates for the disruptive moments we are going to encounter?

SNIR: They are likely to come while working on new programs, new programming languages, and new programming models. The big impediment to these changes is: "Will my program run everywhere? Am I willing to invest in writing my program in languages that will not be supported everywhere?" But if a language is supported on several of the top machines in academia, people will probably make that investment, so we'll need to work with the Track 2 teams.

