Designers highlight challenges of high-speed I/O

By Bill Murray

11/14/07

SAN JOSE, Calif. – Today’s computing systems are migrating rapidly from parallel bus I/O to serial I/O interconnects, and such interconnects will be key enablers of the terascale computers that will process trillions of instructions per second, according to speakers at a “Designers’ Perspective” session at the International Conference on Computer Aided Design (ICCAD) here on Nov. 7. Speakers discussed the complexity of high-speed serial I/O design, along with the demands that it imposes on EDA tools and models.

Nasser Kurd, senior principal engineer and technical lead for clock and I/O design at Intel, began the session with an overview of why high speed serial interfaces are necessary. Referring to the coming era of terascale computing, he said that such high-performance systems will enable advanced recognition, mining and synthesis (RMS) applications, going well beyond the 3D and video processing capability of current multicore systems.

Clearly, Kurd said, I/O bandwidth must support this trend. Serial I/O interconnect technology can exceed ten gigatransfers per second (GT/s) compared to the (approximately) 2.5 GT/s limitation of the traditional single-ended multi-drop bus. Kurd reported that Intel demonstrated 20 GT/s at the International Solid State Circuits Conference (ISSCC) 2006.

How does the serial I/O approach achieve this performance? Essentially, serial I/O architectures can overcome the signal integrity issues associated with the single-ended bus. The absence of stubs alone reduces reflections, while differential signaling can reduce cross-talk and improve common mode rejection.

Moreover, serialization can reduce pin and line count. Navraj Nandra, Director of Marketing for Mixed-Signal IP at Synopsys, gave some examples. “If you compare 64-bit PCI or PCI-X, it’s about 108 pins if you want to run about 8 Gb/s. With four lanes of PCI Express, you go down to 19 pins.” Nandra demonstrated the reduced line count benefit using parallel and serial Advanced Technology Attachment (ATA/SATA) connectors as an example (see figure 1).

Figure 1: Parallel and Serial Advanced Technology Attachment (ATA/SATA) Connectors (Source: Synopsys Inc.)

High speed serial I/O can also reduce power consumption. Nandra compared the 1,800 mW power consumption of a 10 Gigabit Media Independent Interface (XGMII) configuration with the 600 mW consumed by a 10 Gigabit Attachment Unit Interface (XAUI) at a similar speed.

Of course, serial communications architectures have their own design requirements. According to Kurd, the high bandwidth of serial I/O requires enhanced clock and data recovery schemes, and accurate per-pin alignment, as well as equalization to compensate for frequency-dependent losses. Moreover, the communication channel must have fewer discontinuities and less attenuation, he said. Also, the elevated operating frequency narrows the bit cell period and consequently tightens the jitter budget. And, although differential signaling and equalization improve the signal-to-noise ratio (SNR), the SNR at higher frequencies is nonetheless small enough to mandate the use of high sensitivity input receivers.
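To put rough numbers on the shrinking bit cell, the sketch below computes the bit cell period (unit interval) for the transfer rates mentioned in the session. The jitter allowance of 0.3 of a unit interval is a hypothetical figure chosen only for illustration; it was not quoted by the speakers, and real budgets come from the relevant interface specification.

```python
def unit_interval_ps(rate_gtps: float) -> float:
    """Bit cell period in picoseconds for a rate given in GT/s."""
    # 1 GT/s corresponds to a 1 ns (1000 ps) bit cell.
    return 1000.0 / rate_gtps


def jitter_budget_ps(rate_gtps: float, ui_fraction: float = 0.3) -> float:
    """Total jitter allowance in ps, as a fraction of the unit interval.

    ui_fraction is an assumed, illustrative value, not a spec number.
    """
    return unit_interval_ps(rate_gtps) * ui_fraction


if __name__ == "__main__":
    for rate in (2.5, 10.0, 20.0):
        print(f"{rate:5.1f} GT/s: UI = {unit_interval_ps(rate):6.1f} ps, "
              f"assumed jitter budget = {jitter_budget_ps(rate):5.1f} ps")
```

Under that assumption, moving from 2.5 GT/s to 20 GT/s cuts the bit cell from 400 ps to 50 ps, and the jitter allowance shrinks in the same proportion.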

Many of these design challenges are analog. According to Kurd, “paying attention to analog transistor characteristics is critical in designing serial I/O.”

Richard Ward, a designer of serializer/deserializer (SerDes) ASICs at Texas Instruments, focused on the seriousness of the channel crosstalk challenge.

On the channel side, said Ward, the issue is the problem of “hardware from vendor A talking to hardware from vendor B, and how we do those simulations with a number of different companies and systems.” Applications can cover the range from a 7 mm serial communications link on a multichip module (MCM) to a 15 meter link across a cable.

Using an example of a backplane Ethernet (IEEE 802.3ap) application delivering 1 Gbit/s and 10 Gbit/s over a printed circuit board, Ward described two crosstalk situations – far-end (Fext) and near-end (Next) crosstalk (see figure 2).

Figure 2: Far-end and near-end crosstalk (Source: Texas Instruments Inc.)

Far-end crosstalk occurs when a transmitter affects the signal received by the receiver at the far end of an adjacent channel. Near-end crosstalk occurs when a transmitter affects the signal received by the receiver at the near end of an adjacent channel. Under some circumstances, the near-end crosstalk signal can be stronger than the transmitted channel (Thru) signal (see figure 3). The differential gain, Sdd, of the Thru and Fext signals declines at higher frequencies, while that of the Next signals increases.

Ward introduced a rough metric for measuring the extent of such crosstalk – the Insertion Loss to Crosstalk Ratio (ICR), where the insertion loss is the decrease in transmitted signal power along the channel under test, and the crosstalk is the sum of all crosstalk power in the channel under test. He offered a practical tip: “Keep it to more than 12 dB separation.”
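As a minimal sketch of how such a ratio might be evaluated at a single frequency point, the code below sums the crosstalk contributions as powers and subtracts the total from the through-channel level, all in dB. The function names and the sample levels are hypothetical; Ward did not present a specific calculation, and a full analysis would repeat this across frequency.

```python
import math


def power_sum_db(levels_db):
    """Combine several crosstalk levels (dB) by adding their linear powers."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


def icr_db(thru_db, crosstalk_db):
    """Separation in dB between the through signal and total crosstalk."""
    return thru_db - power_sum_db(crosstalk_db)


def meets_guideline(thru_db, crosstalk_db, min_separation_db=12.0):
    """Check the 'more than 12 dB' rule of thumb quoted in the session."""
    return icr_db(thru_db, crosstalk_db) > min_separation_db


if __name__ == "__main__":
    # Hypothetical levels at one frequency: through gain and two aggressors.
    thru = -20.0
    aggressors = [-40.0, -40.0]
    print(f"ICR = {icr_db(thru, aggressors):.2f} dB, "
          f"ok = {meets_guideline(thru, aggressors)}")
```

Two equal aggressors add about 3 dB to the total crosstalk, so in this made-up case the separation is roughly 17 dB rather than 20 dB.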

One challenge, according to Ward, is that “the data that we receive [from customers] is really a subset of what we really need. We’re getting a lot of single-ended port data, leaving many holes in the S-parameter (scattering parameter) matrix.” In other words, there is insufficient data to estimate crosstalk accurately.
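The sketch below illustrates why such holes matter. It derives a differential through term (Sdd21) from four single-ended S-parameters using the standard mixed-mode conversion, and it refuses to produce a result when any of the four entries is missing. The port numbering (ports 1 and 3 forming the input pair, ports 2 and 4 the output pair) and the sample values are assumptions made for illustration, not data from the talk.

```python
def sdd21(s, pos_in=1, neg_in=3, pos_out=2, neg_out=4):
    """Differential through gain from single-ended S-parameters.

    s maps (output_port, input_port) to a complex value, or None for a hole.
    The port assignment is an assumed convention for this example.
    """
    needed = [(pos_out, pos_in), (pos_out, neg_in),
              (neg_out, pos_in), (neg_out, neg_in)]
    missing = [key for key in needed if s.get(key) is None]
    if missing:
        raise ValueError(f"missing S-parameter entries: {missing}")
    return 0.5 * (s[(pos_out, pos_in)] - s[(pos_out, neg_in)]
                  - s[(neg_out, pos_in)] + s[(neg_out, neg_in)])


if __name__ == "__main__":
    # Hypothetical single-frequency data with some coupling between lines.
    complete = {(2, 1): 0.5 + 0j, (2, 3): 0.1 + 0j,
                (4, 1): 0.1 + 0j, (4, 3): 0.5 + 0j}
    print("Sdd21 =", sdd21(complete))

    # Same data with one cross term not supplied by the customer.
    partial = dict(complete)
    partial[(4, 1)] = None
    try:
        sdd21(partial)
    except ValueError as err:
        print("cannot compute:", err)
```

A single missing cross term is enough to block the differential result, which is the practical consequence of the incomplete matrices Ward described.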

Synopsys’ Nandra outlined a “pre-emphasis” approach to improving SNR (see Figure 4). The extreme left side binary eyes show 2.5 Gb/s and 5 Gb/s transmitted signals, using PCI Express. The adjacent binary eyes show the condition of the signal received over 26 inches of FR4 board material. They have clearly degraded, with the loss increasing with increasing frequency. Thus 1-0-1-0 patterns (AC signals) suffer increasing distortion, while a pattern of all-ones or all-zeros (DC signals) suffers much less distortion. Pre-emphasis increases the amplitude of the higher frequency signals with respect to the lower frequency signals, and thus improves the SNR. Alternatively, Nandra said, the lower frequency signals may be de-emphasized.

Figure 4: Pre-Emphasis Improves Signal-to-Noise Ratio (Source: Synopsys Inc.)
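The idea can be sketched with a simple two-tap filter that boosts a bit whenever it differs from the previous one. This is an illustrative model only, not Synopsys' implementation; the boost value of 0.25 and the function names are hypothetical.

```python
def pre_emphasize(bits, boost=0.25):
    """Return drive levels with transitions boosted relative to repeats.

    Each bit maps to +1 or -1, and y[n] = s[n] + boost * (s[n] - s[n-1]).
    The line is assumed to sit at the first bit's level beforehand.
    """
    if not bits:
        return []
    symbols = [1.0 if bit else -1.0 for bit in bits]
    levels = []
    previous = symbols[0]
    for symbol in symbols:
        levels.append(symbol + boost * (symbol - previous))
        previous = symbol
    return levels


def de_emphasize(bits, boost=0.25):
    """Same shaping, scaled so the peak stays at 1 and repeats are reduced."""
    peak = 1.0 + 2.0 * boost
    return [level / peak for level in pre_emphasize(bits, boost)]


if __name__ == "__main__":
    pattern = [1, 1, 0, 0, 1]
    print(pre_emphasize(pattern))  # transition bits are driven harder
    print(de_emphasize(pattern))   # repeated bits are driven more softly
```

In the first form the transition bits are driven at 1.5 times the amplitude of repeated bits; in the second the ratio is the same but the peak swing is unchanged, which corresponds to the de-emphasis alternative Nandra mentioned.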

Clemenz Portmann, a design consultant, highlighted another challenge – the completeness of the specification. For example, the XAUI transmitter (Tx) and receiver (Rx) specification occupies six pages of the 2,700-page XAUI specification.

“It’s about three tables, two figures and two pages of text,” said Portmann, “and what really counts is another section of the spec that talks about post-production measurements. And this is what your customers are actually going to beat you up about when they evaluate your part or compare it to someone else’s.” According to Portmann, the key measurements on the XAUI side are jitter tolerance, input sensitivity and S11.

CAD Challenges and Standards

Intel’s Kurd highlighted the accuracy requirements of high speed serial I/O design. “Most of the chip-level CAD design tools today are digital-centric,” he observed. “High speed serial I/O design requires tools that can accurately analyze and predict timing that goes way beyond simple minimum and maximum analysis.” The tools must deal with multiple clocks, and the accurate power supply grid analysis necessary for accurate analog power estimation, as well as be able to simulate parasitic and noise coupling over large sections of the design.

Also, said Kurd, “mixed analog and digital validation and simulation are critical. These simulations must be performed at the top level – the entire I/O interface – and this is gated by machine size. To extend mixed-signal validation to the platform level, distributed computing may be the only way to handle the data volume and to generate meaningful and accurate results.”

According to TI’s Ward, a popular channel analysis approach is time domain simulation combined with jitter tail extrapolation. “Spice-based simulation works up to about 5 Gb/s. But once we get up to about 10 or 25 Gb/s, to achieve bit-error rates in the 10^-18 range, we need to do 5 to 10 million bit simulations or even more, which is really out of the realm of Spice-based simulation. Also, many systems are asynchronous, so the crosstalk is asynchronous to the data – a comprehensive simulation is necessary to ensure that the crosstalk doesn’t slip through the binary eye.”
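A back-of-the-envelope calculation shows why extrapolation is needed at all. If errors are assumed to be independent, observing zero errors in N bits supports a bit-error rate below p with confidence CL when N is at least -ln(1 - CL) / p. The sketch below applies that textbook relation; the 95 percent confidence level is an assumption chosen for illustration, not a figure from the session.

```python
import math


def bits_for_confidence(target_ber, confidence=0.95):
    """Error-free bits needed to claim BER < target_ber at the given confidence."""
    return -math.log(1.0 - confidence) / target_ber


def resolvable_ber(simulated_bits, confidence=0.95):
    """Lowest BER that an error-free run of this length can directly support."""
    return -math.log(1.0 - confidence) / simulated_bits


if __name__ == "__main__":
    needed = bits_for_confidence(1e-18)
    print(f"bits needed for 1e-18: {needed:.2e}")
    print(f"years of traffic at 10 Gb/s: {needed / 10e9 / 3.156e7:.1f}")
    print(f"BER resolved by 10 million bits: {resolvable_ber(10e6):.1e}")
```

On these assumptions, directly demonstrating a bit-error rate of 10^-18 would take roughly 3 x 10^18 error-free bits, while a 10 million bit run resolves only about 3 x 10^-7. The gap between the two is what jitter tail extrapolation has to bridge.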

Because of the inadequacy of Spice for this purpose, many SerDes vendors have created proprietary channel simulation platforms, said Ward. Consequently, validating interoperability between designs from different vendors is a challenge. There are now two embryonic standards: IBIS 4.1, which allows the use of multilingual analog/mixed signal models, and IBIS-Advanced Technology Modeling (IBIS-ATM). Ward said that the parties concerned are in convergence discussions, and that the choices are also driven by specific customers.

Clemenz Portmann showed the diversity and complexity of simulations necessary to verify the functionality of a XAUI receiver design. “To simulate this and to make sure that it works, and that the downstream digital guys are happy, you need a bunch of overlapping simulation environments.” The simulation proceeds in four basic steps (see figure 5).

Figure 5: XAUI Verification Requires Several Overlapping Simulation Environments (Source: Clemenz Portmann)

Firstly, as shown in the red box, the team must create a package model and a channel model that model the behavior of the system into which the receiver will be integrated. This task normally leverages 3D field solvers and some simulation.

The next step – and, according to Portmann, the most time-consuming of the simulations – is the simulation of the mixed-signal section of the receiver (blue box). Typically, this requires tools such as Cadence Design Systems’ Spectre and Synopsys’ HSpice. “The difficult part is that you can’t start simulating the whole circuit all at once. You have to break it up into pieces.”

The third step is to ensure that the design outputs can be used by the downstream system. This step typically uses switch level simulators, such as Synopsys’ HSim or Cadence’s Ultrasim (green box). This is followed by a large RTL functional simulation (black box) “to ensure that mission-critical traffic operates as required and that the back-end of the chip is working with the system.”

Synopsys’ Nandra stated that “Spice is the simulator we all use, but it [the simulation] is becoming hugely complex.” He observed that “pre-layout simulations are really becoming a poorer predictor of performance. You really need to do post-layout simulation, even though it takes a long time, even on a circuit of only four hundred transistors.”

Wrap-Up

The major take-aways from this session are that high speed serial I/O is a key enabler of future high performance systems but that the successful design of such I/O circuits is a complex challenge. It requires thorough simulation across digital, analog/mixed-signal, package, board and system domains. And design team expertise must span these domains, too.
