400GE is on the Horizon in the Data Center
2018-08-30
Today's society demands instant access to information. Armed with smartphones, we have mobile access to information from virtually anywhere, anytime. Fifth generation wireless (5G) promises to be more than 100 times faster than 4G and will enable new technologies such as virtual reality, autonomous vehicles, and the Internet of Things (IoT). Telecom service providers are furiously upgrading their wireless infrastructure to be ready to support 5G's faster speed, higher bandwidth, and lower latency. Current 100 Gigabit Ethernet (GE) data center speeds will not be fast enough to handle the computing and performance demands of 5G. Next-generation 400GE transceivers are the solution.
The Shannon-Hartley theorem states that there is a theoretical maximum amount of error-free data that can be delivered over a specified channel bandwidth in the presence of noise. To deliver higher data rates, either the channel bandwidth or the number of signal levels must be increased. Non-return-to-zero (NRZ) and four-level pulse amplitude modulation (PAM4) are two modulation technologies that can enable 400GE.
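The theorem is usually written as C = B log2(1 + S/N), where C is the capacity in bits per second, B is the channel bandwidth in hertz, and S/N is the linear signal-to-noise ratio. The short Python sketch below evaluates that limit. The bandwidth and SNR values are illustrative assumptions, not figures from any particular link.

```python
import math

def db_to_linear(snr_db: float) -> float:
    """Convert a power ratio in decibels to a linear ratio."""
    return 10.0 ** (snr_db / 10.0)

def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley limit: C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# Illustrative numbers only (assumptions, not measurements).
bandwidth_hz = 14e9  # hypothetical channel bandwidth
for snr_db in (10.0, 20.0, 30.0):
    capacity = shannon_capacity_bps(bandwidth_hz, db_to_linear(snr_db))
    print(f"SNR {snr_db:4.1f} dB -> capacity limit {capacity / 1e9:6.1f} Gb/s")
```

For a fixed bandwidth, the limit grows only logarithmically with SNR, which is why adding signal levels becomes harder as the channel gets noisier.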
NRZ is the most common signal modulation scheme for 100GE today. It is a two-state transmission system (also referred to as two-level pulse amplitude modulation or PAM2) where positive voltage represents a logical "1", and an equivalent (generally) negative voltage represents "0". 100GE requires four lanes of 25 gigabits per second (Gb/s) NRZ modulated signals. Since NRZ has gradually evolved over the last 50 years, with improved speeds from 110 bits per second to 100 Gb/s, many new concepts and challenges have already been researched and addressed. Applying these same concepts, a move to 400GE using eight lanes of 56 Gb/s NRZ signaling would be a logical evolution. However, as speeds of NRZ designs increase above 28 Gb/s, channel loss becomes a limiting factor. Next-generation optical transceivers need to use advanced modulation and coding techniques to reach 400GE.
PAM4 and FEC enable 400GE
PAM4 signals use four amplitude levels, each representing a two-bit symbol: 00, 01, 10, or 11. The number of symbols transmitted per second (baud rate) is therefore half the number of bits sent per second. For example, a symbol rate of 28 gigabaud (GBaud) means there are 56 gigabits of data transmitted per second. A 28 GBaud PAM4 signal provides double the data rate (throughput) in the same bandwidth as a 28 GBaud NRZ signal, where each symbol carries one bit.
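The relationship between bits, symbols, and baud rate can be sketched in a few lines of Python. The Gray-coded mapping of bit pairs to levels used here is an assumption chosen for illustration (adjacent levels then differ by a single bit), and the level values -3, -1, 1, 3 are nominal rather than voltages.

```python
import math

# Assumed Gray-coded mapping of bit pairs to four nominal amplitude levels.
PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam4_encode(bits):
    """Group bits in pairs and map each pair to one PAM4 symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [PAM4_LEVELS[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def bit_rate_gbps(baud_gbd, levels):
    """Bit rate = symbol rate * bits per symbol (log2 of the level count)."""
    return baud_gbd * math.log2(levels)

bits = [0, 0, 0, 1, 1, 1, 1, 0]
print(pam4_encode(bits))     # 8 bits become 4 symbols: [-3, -1, 1, 3]
print(bit_rate_gbps(28, 2))  # NRZ (PAM2) at 28 GBaud: 28.0
print(bit_rate_gbps(28, 4))  # PAM4 at 28 GBaud: 56.0
```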
This increased data throughput comes at a cost. PAM4 designs are far more susceptible to noise because the amplitude swing that separates two signal levels in NRZ must now accommodate four. In PAM4 transceiver designs, the signal-to-noise ratio (SNR) is lower, making noise analysis much more complicated than with NRZ. Testing needs to account for channel return loss, as well as noise from the test instrumentation. Forward error correction (FEC) is used to improve link integrity and counteract physical layer errors introduced by the reduced SNR in PAM4 signals.
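A rough way to quantify the noise sensitivity is to compare the spacing between adjacent levels for the same peak-to-peak swing. The sketch below uses an idealized model that assumes equally spaced levels and identical noise for both formats. The 0.9 V swing is a hypothetical value.

```python
import math

def eye_height(swing_v, levels):
    """Spacing between adjacent levels for a fixed peak-to-peak swing."""
    return swing_v / (levels - 1)

def snr_penalty_db(levels):
    """Idealized amplitude penalty relative to two-level NRZ, same swing and noise."""
    return 20.0 * math.log10(levels - 1)

swing_v = 0.9  # hypothetical peak-to-peak swing in volts
for name, levels in (("NRZ", 2), ("PAM4", 4)):
    print(f"{name}: eye height {eye_height(swing_v, levels):.2f} V, "
          f"penalty {snr_penalty_db(levels):.2f} dB")
```

Under these assumptions each PAM4 eye is one third the height of the NRZ eye, a penalty of roughly 9.5 dB before any other impairment is considered.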
FEC is an advanced coding technique that sends the information required to correct errors across the link along with the payload data, as shown in Figure 1. The decoder uses this information to recover corrupted data without the need to request the transmitter to retransmit it. Both the transmitting and receiving ends of the link must know which coding scheme is being used for the link to operate. Links employing FEC use a variety of coding systems. The more common coding schemes used in data center networking are variants of the Reed-Solomon (RS) system, initially developed in the 1960s by Irving Reed and Gustave Solomon for use in satellite data links.
Figure 1 Testing PAM4 signals with FEC
Test Implications of FEC
Physical layer testing of PAM4 signals must account for new test challenges introduced by FEC. With 400GE, naturally occurring errors in the system are "acceptable" up to a certain level and are then corrected with FEC, resulting in a nearly error-free environment post-FEC. However, there are three key considerations that transceiver manufacturers will need to take into account when testing FEC-encoded PAM4 signals: coding gain, burst errors, and striping.
Coding Gain
The encoding process converts the payload data to a format that allows decoding and creates the additional data required to correct errors. Code words are the result of the encoded data. Decoding is necessary on the receiving end to recover the data. Coding gain is a measure of the robustness of the error correction code. Higher coding gain allows the correction of a higher number of errors. However, there are tradeoffs. RS systems using higher coding gain require sending more overhead in the block of code words to facilitate decoding at the receiving end. Also, increasing the coding gain increases the amount of logic needed for coding and decoding, and the processing time, or latency, required to encode and decode the data. FEC with a higher coding gain is necessary for high-speed serial data links using PAM4, which have a higher native error rate than those using NRZ line coding.
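The overhead tradeoff can be made concrete with the standard properties of an RS(n, k) code, which carries k payload symbols in an n-symbol code word and corrects up to (n - k) / 2 symbol errors. The sketch below compares RS(528,514) and RS(544,514), two codes commonly cited for Ethernet FEC. The article does not name specific codes, so treat these parameters and the 10-bit symbol size as assumptions.

```python
def rs_properties(n, k, symbol_bits=10):
    """Basic properties of a Reed-Solomon RS(n, k) code."""
    parity = n - k
    return {
        "correctable_symbols": parity // 2,      # t = floor((n - k) / 2)
        "overhead_percent": 100.0 * parity / k,  # parity relative to payload
        "codeword_bits": n * symbol_bits,
    }

# Assumed example codes; parameters are not taken from the article.
for n, k in ((528, 514), (544, 514)):
    props = rs_properties(n, k)
    print(f"RS({n},{k}): corrects {props['correctable_symbols']} symbols, "
          f"overhead {props['overhead_percent']:.2f}%, "
          f"{props['codeword_bits']} bits per code word")
```

The stronger code corrects about twice as many symbols per code word, at the price of roughly twice the parity overhead.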
A given coding gain in an RS system can correct up to a defined maximum number of errors in a code word. Frame loss occurs past this limit. Frame loss ratio (FLR) is the number of frames not delivered divided by the number of frames sent, often expressed as a percentage. The FEC coding gain is selected to avoid frame losses given the target worst-case bit error ratio (BER) of the link.
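To illustrate how the correction limit relates to the native BER, the sketch below estimates the probability that a code word contains more errored symbols than the code can correct, assuming independent, random bit errors. The code parameters (n = 544 symbols, t = 15, 10-bit symbols) and the BER values are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math

def symbol_error_prob(ber, symbol_bits=10):
    """P(at least one bit of a symbol is wrong), independent bit errors, 0 < ber < 1."""
    return -math.expm1(symbol_bits * math.log1p(-ber))

def uncorrectable_prob(n, t, p_sym):
    """P(more than t of n symbols are in error): binomial tail, summed in the log domain."""
    total = 0.0
    for i in range(t + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p_sym) + (n - i) * math.log1p(-p_sym))
        total += math.exp(log_term)
    return total

def frame_loss_ratio(frames_sent, frames_delivered):
    """FLR = frames not delivered / frames sent."""
    return (frames_sent - frames_delivered) / frames_sent

# Assumed code parameters and illustrative pre-FEC BER values.
for ber in (1e-5, 1e-4, 1e-3):
    p_sym = symbol_error_prob(ber)
    print(f"pre-FEC BER {ber:.0e}: P(uncorrectable code word) = "
          f"{uncorrectable_prob(544, 15, p_sym):.3e}")
```

The steep change in the result between neighboring BER values shows why the coding gain is chosen against the worst-case BER of the link rather than its typical value.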
Burst Errors
FEC works on the assumption that the error distribution in the link is approximately random. A large burst of errors that exceeds the number of correctable errors in the frame will result in frame losses, even if the average error rate in the link is better than the specified native BER. Note that a "burst" in this context is not necessarily a run of consecutive bits. The errors could be interspersed with correct data bits and would still result in a frame loss if they exceed the maximum number of correctable bits for the FEC code. Error bursts are unavoidable and unpredictable. They can originate at the receiver end of the link, or anywhere within the channel. Data striping can be used to minimize their impact.
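The following sketch shows why the distribution of errors matters as much as their count. It uses a simplified serial model in which code words follow one another on a single bit stream, with assumed parameters (544 ten-bit symbols per code word, up to 15 correctable symbols). Lanes and interleaving are deliberately ignored here.

```python
from collections import defaultdict

SYMBOL_BITS = 10      # assumed symbol size
N_SYMBOLS = 544       # assumed code word length in symbols
T = 15                # assumed correctable symbols per code word
CODEWORD_BITS = SYMBOL_BITS * N_SYMBOLS

def uncorrectable_codewords(error_bit_positions, t=T):
    """Return indices of code words containing more than t errored symbols."""
    bad_symbols = defaultdict(set)
    for pos in error_bit_positions:
        codeword, offset = divmod(pos, CODEWORD_BITS)
        bad_symbols[codeword].add(offset // SYMBOL_BITS)
    return sorted(cw for cw, syms in bad_symbols.items() if len(syms) > t)

# 20 bit errors, one in each of 20 different code words.
spread = [cw * CODEWORD_BITS for cw in range(20)]
# 20 bit errors in one code word, hitting every other symbol (not consecutive bits).
clustered = [i * 2 * SYMBOL_BITS for i in range(20)]

print(uncorrectable_codewords(spread))     # [] : every code word is correctable
print(uncorrectable_codewords(clustered))  # [0]: code word 0 exceeds the limit
```

Both patterns contain the same number of bit errors, yet only the clustered one produces a loss, even though its errors are interspersed with correct bits.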
Striping
Data striping is often used to lower the incidence of frame losses in links employing multiple lanes operating at a sub-rate of the total link data rate. Striping rotates the individual data streams through all the available lanes in the link in a round-robin fashion. With striping, the effective length of an error burst generated by a pass-through re-timer within the link is divided by the number of lanes used for the striping. For example, in a 100GE link using four lanes of 25.78 Gb/s NRZ data, an error burst of 100 bits generated in an optical module on a single lane would contribute only 25 errors to each data stream. While striping does not increase the computed coding gain, which assumes random error distribution, it effectively increases the gain when error bursts occur.
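The round-robin effect can be sketched with a toy model in which, at each time slot, every lane carries the next data stream in rotation. Rotating per bit slot is a simplifying assumption; real links rotate at a coarser granularity, and the function name is hypothetical.

```python
from collections import Counter

def errors_per_stream(burst_len, lanes, burst_lane=0, striped=True):
    """Count how many errored slots of a single-lane burst land on each data stream."""
    counts = Counter()
    for slot in range(burst_len):
        # With striping, the stream carried by a lane advances every slot.
        stream = (burst_lane + slot) % lanes if striped else burst_lane
        counts[stream] += 1
    return dict(counts)

print(errors_per_stream(100, 4, striped=False))  # {0: 100}
print(errors_per_stream(100, 4, striped=True))   # {0: 25, 1: 25, 2: 25, 3: 25}
```

Without striping, one stream absorbs the entire burst. With striping, the same burst is shared evenly, so each stream sees a quarter of the errors.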
400GE is on the horizon
Next-generation optical transceivers using PAM4 modulation with FEC encoding will enable 400GE in the data center. Transceiver manufacturers need to change their testing procedures to take PAM4 and FEC into account. Different metrics for performance characterization and compliance with standards are also needed. The ideal test solution will need to look beyond the physical layer and include testing FEC at the networking protocol layer. The right test tools, with automation capabilities designed in accordance with industry standards, can help transceiver manufacturers accelerate innovation and enable 400GE in the data center as fast as possible.
Nicole Faubert
Data Center Infrastructure Solutions Lead at Keysight Technologies
Steve Sekel
400G/800G Solutions Specialist, Data Center Infrastructure Group at Keysight Technologies