By Richard Goering
SAN JOSE, Calif. – In a spirited debate at the International Conference on Computer-Aided Design (ICCAD) here Nov. 6, panelists clashed over whether the move to multicore processors and parallel processing is a looming disaster or a great opportunity for the next era of computing. The debate followed a presentation by computer pioneer Gene Amdahl, whose “Amdahl’s law” predicts the performance capability of parallel computing systems.
“The focus on multicore is all about parallelism,” said Arvind, a professor of computer science and engineering at the Massachusetts Institute of Technology. “Almost all important applications today have tons and tons of parallelism in them. There is no going back. All computers in the future will be parallel.”
A very different point of view came from Patrick Madden, professor of computer science at the State University of New York (SUNY) at Binghamton. “I’m terrified,” he said. “Multicore is the end of the world as far as I’m concerned.” Asking how his mother could possibly use a processor with 100 cores, Madden argued that 75 percent of the computer industry’s revenues come from customers who will see almost no benefit from the use of multiple cores.
Looming heavily over the discussion was Amdahl’s Law, a formula that predicts theoretical performance gains based on how much of an application can be parallelized. If you could parallelize 90 percent of an application to the point where it takes zero time, Arvind noted, you would still only get a 10-fold speedup, because you’re limited by the code that can’t be parallelized.
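In code form, the formula is a one-liner. A minimal Python sketch (the function name `amdahl_speedup` is ours, not Amdahl’s) illustrates the 10-fold ceiling Arvind described:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: overall speedup when a fraction p of the work is
    parallelized across n processors and the remaining (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# Parallelizing 90 percent of the work, even across a billion processors,
# caps the speedup just below 10x -- the serial 10 percent dominates.
print(amdahl_speedup(0.9, 10**9))
```

The limit as `n` grows is `1 / (1 - p)`, which is why no amount of extra hardware can push a 90-percent-parallel application past a 10-fold gain.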
In his own presentation, Amdahl made it clear that he hadn’t intended to develop a “law” that would be used to predict computer performance over the next 40 years. He simply developed a formula to demonstrate the performance limits of a particular parallel computing architecture during a 1967 debate. “I didn’t consider it a law, just a formula for performance,” he said. “Somebody else called it a law. Maybe you can get them to rescind it.”
In introducing Amdahl, Madden noted that it’s the 40th anniversary of Amdahl’s law. “I think most people would like Amdahl’s law to not be true,” Madden said. “It’s thrown a wrench into many attempts to build parallel processing machines. A handful of things can be parallelized easily, and the rest run smack into Amdahl’s law.”
The Amdahl lecture and the debate that followed were sponsored by the Association for Computing Machinery (ACM) Special Interest Group on Design Automation (SIGDA). At the event, Amdahl received ACM SIGDA’s 2007 Pioneer Achievement Award.
Early days of computing
In his talk, Amdahl described a personal history that had a tremendous impact on the development of computing. He spoke of growing up on a farm in South Dakota with no electricity, starting college in 1941, and receiving a bachelor of science degree in engineering physics in 1948. At the University of Wisconsin (UW) in 1950, he and a colleague worked for 30 days with a desk calculator and a slide rule to solve a problem in theoretical physics. “I decided there had to be a better way to do this computation,” he said.
Amdahl came up with the design for a computer architecture and presented it to the UW electrical engineering department. He was asked to write a detailed description of his new computer, called the Wisconsin Integrally Synchronized Computer (WISC), which was completed in 1955. It had floating-point computation, pipelining, and concurrent and independent input and output, all industry firsts, Amdahl said.
After designing the WISC computer, Amdahl joined IBM, where he designed some of that company’s early computer systems. He introduced features such as fixed-point calculations, indexing, and interactive table lookups. In 1970 he left IBM and formed Amdahl Corporation, building some of the world’s first “large scale integrated” circuits with around 100 gates per chip.
Amdahl Corporation developed some of the largest-capacity general-purpose business computers in the world. In 1981, Amdahl founded Trilogy, a company that attempted to pioneer wafer-scale integration. The technology, which uses an entire silicon wafer to produce a single chip, was not commercially successful.
The formula that became Amdahl’s law came out of a debate at a computer conference in 1967, where Amdahl argued that super uniprocessors, not parallel processors, were the wave of the future. He developed the formula to predict the performance of the Illiac IV, an early parallel computer. The formula compared the Illiac IV’s performance to that of a computer that could execute the same calculations only in a totally sequential manner.
Amdahl’s law effectively set an upper limit to the performance of a computer with one instruction unit and multiple execution units under operating systems available at that time. Later on, Amdahl said, “I found that many groups were using the formula to analyze their computer structure.”
Amdahl is no foe of parallel processing, however. Although retired, he’s on the advisory board of supercomputing firm Massively Parallel Technologies, a company whose literature claims to “absolutely destroy” the limitations of Amdahl’s law by providing up to a 1,000-fold speedup. “I believe the future of supercomputing will be advanced by parallel complexes of inexpensive computers,” Amdahl said.
The multicore debate
In the debate that followed Amdahl’s presentation, several panelists suggested Amdahl’s law might not be as restrictive as it seems. If you can parallelize 100 percent of a piece of code, Arvind said, and the parallel parts run twice as fast, you’ll get a two-fold speedup. It’s not so daunting to believe one could parallelize 100 percent of code, he said.
“You have only to find two or more independent threads in a large computation,” he said. “If you look at any important calculation, there certainly must be two threads.” That’s the case, Arvind suggested, with Google searches, server applications, multimedia, and games. “My belief is that applications requiring high performance always have parallelism, and there is always money available for rewriting them,” he said.
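Arvind’s “two independent threads” point can be sketched in a few lines of Python (an illustrative toy of ours, not anything presented at the panel). Note that CPython’s global interpreter lock means this shows the decomposition, not an actual speedup, for pure-Python work:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data):
    """Split one large computation into two independent halves --
    the minimal parallelism Arvind argues any large calculation contains."""
    mid = len(data) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(sum, data[:mid])    # thread 1: first half
        right = pool.submit(sum, data[mid:])   # thread 2: second half
    return left.result() + right.result()

print(parallel_sum(list(range(1000))))
```

The two halves share no data, so either thread can run first; the result matches a plain sequential `sum` over the whole list.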
Power has become the number one design problem, and the industry is solving it by going to multiple processors, said Gary Smith, chief analyst at Gary Smith EDA. “We’ll keep going with multicore, and we have to look at heterogeneous multicore more efficiently than we’re doing it now. We need to develop a concurrent software architecture infrastructure by 2013 or Moore’s Law goes off track.”
But that infrastructure won’t be based on multi-threading, Smith said. “We’re walking away from threads and looking at library-based parallelism,” he said. “Threads are dead.” What this all means, Smith said, is that “we have to walk away from C and look at some other concurrent language.”
“Essentially, we’ve come to the end of single-threaded performance,” said Kunle Olukotun, professor of electrical engineering and computer science at Stanford University. The industry is moving to multicore platforms, he said, to keep power under control, develop modular architectures, and exploit parallelism.
The challenge, Olukotun said, is making parallel programming “the common case.” While most students are successful at sequential programming, Olukotun said that “the number of students who can write a correct and high-performance parallel program, even after taking a parallel programming course, is vanishingly small.”
The solution, he said, lies in moving to higher-level programming models. Olukotun suggested that it’s time to move away from sequential languages to domain-specific languages such as SQL or Matlab.
Olukotun also called for thread-level speculation, which enables parallelization without regard for data dependencies. This “speculative parallelism” approach, he said, uses hardware support to “take sequential regions and try to get parallelism that isn’t obviously there, but can be found speculatively.” However, it isn’t scalable; thread-level speculation would allow only a two-fold speedup with eight CPUs, he said.
Madden countered the other presentations with a skeptical view of parallel programming and multicore architectures. “The landscape is littered with companies who tried to make a go of parallel programming, and crashed and burned,” he said. “The success rate is terrible.” But this time, he said, we have no choice – we must go parallel. Quoting Stanford University president John Hennessy, Madden said, “If I were an industry, I’d be panicked.”
The fundamental problem with Amdahl’s law, Madden said, is that even if only 10 percent of an application continues to run in serial, performance gains are greatly diminished. He noted, however, that parallel programming provides some speedup and is useful in some applications, such as 3D graphics.
But 75 percent of computer industry revenues come from desktop and laptop PCs, Madden noted. “We’re talking about my mother and people like her,” he said. What this means, Madden said, is that three-quarters of computer revenues come from customers who will see little advantage from multiple cores. None of the software written for this market is parallel, and even if it’s rewritten, the speedup might be only two- or three-fold, he said.
“When Moore’s Law stops, it will be because of money, not physics,” Madden said. “If we can’t sell PCs and laptops we’ll have a cash crunch. That’s what I’m terrified about.”
Not all computers
After the panelists finished their presentations, Amdahl said that not all computers will be parallel in the future. “I’m quite sure that if you’re an individual, you won’t want multicore,” he said. “If you’re a company, maybe you can use it.”
John Gustafson, CTO of high-performance computing firm ClearSpeed, said it’s really not hard to train new programmers to use parallel programming. “In my teaching of grad students and undergrads, I found it adds only five percent to the effort of programming. Fresh students can be taught to program and think in parallel if we get to them before they’re corrupted and brain damaged by serial thinking.”
“I think parallel programming is fundamentally hard,” countered Olukotun. “I think we have to move to new languages that hide the parallelism, and the parallelism has to be wrapped in domain-specific constructs where it’s more implicit than explicit.”
“I absolutely think the day will come when MIT freshmen are taught parallel programming and nothing else,” said Arvind. “Sequential programming will be a special case where you can squeeze something out of the machine if you understand the underlying hardware.”
One audience member asked about debugging parallel programs. “The solution is deterministic replay, where you deterministically create the same set of bugs no matter what time you run the program,” said Olukotun. Arvind agreed that’s one solution, but he said the best approach is synthesizing programs so that functionality and performance can be predicted.
Another audience member asked how it will be possible to convert the huge legacy volume of sequential code to parallelism. “The solution has to be speculative thread techniques that can work on binaries, but these techniques are not scalable,” Olukotun said. “If you really want to get the performance and power the hardware is capable of, you have to rewrite your program.”
Smith observed that this very process is underway in EDA, where vendors are being “forced” to rewrite applications for parallel processing. Rewriting a large CAD application is a three-year process, he said. And threading isn’t the answer, he said, because it only scales up to about four processors.
Madden noted talk of the “thousand-core” IC and asked if his mother was going to watch 1,000 videos simultaneously. Such viewpoints are “limited by imagination,” Arvind said. “Talking about the number of videos your mother can watch in parallel isn’t the point,” he said. “She couldn’t watch even one if we didn’t do some hardware assist. If we do any high definition stuff, there’s no going back. The moment there is performance at the right place, everybody wants it.”