Error handling in serial communication designs
Error handling in serial communication interfaces is primarily about detecting data corruption caused by noise on the transmission lines as data travels from the source to the destination. Error detection and handling improve the reliability of data carried over unreliable transmission lines.
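One common way to detect such corruption is to append a checksum such as a CRC to each frame and have the receiver recompute it. The sketch below is a bitwise CRC-8 using the CRC-8/SMBUS parameters (polynomial 0x07, initial value 0, no reflection, no final XOR). These parameters are an illustrative choice, not the scheme of any particular interface, and `frame_is_valid` is a hypothetical helper name.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, MSB first, no reflection, no final XOR (CRC-8/SMBUS parameters)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def frame_is_valid(payload: bytes, received_crc: int) -> bool:
    """Receiver-side check: recompute the CRC over the payload and compare."""
    return crc8(payload) == received_crc


if __name__ == "__main__":
    payload = b"123456789"
    sent_crc = crc8(payload)
    print(hex(sent_crc))  # 0xf4, the standard check value for these parameters
    # Flip one bit to model a noise hit on the line.
    corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
    print(frame_is_valid(payload, sent_crc), frame_is_valid(corrupted, sent_crc))  # True False
```

A CRC only detects the corruption; it does not say which bits were hit, so correction or recovery has to come from one of the mechanisms described next.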
Some detected errors can be corrected in hardware, typically through encoding schemes at the physical layer. Error correction here means that the receiver corrects the error on its own, without consulting its peer. Physical layer encoding schemes can typically correct errors only up to a certain predefined number of bits.
Errors beyond that limit cannot be corrected at the receiver. They must instead be recovered through retransmission, handled by the higher-level protocol layers above the physical layer that are also implemented in hardware.
Some errors are neither correctable nor recoverable in hardware. These errors are detected and reported to software, which handles the necessary recovery. Reporting to software is done through a set of status registers and interrupts.
The last category consists of errors that leave the system in an undefined state, from which only a physical hardware reset can recover it.
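The four categories above can be summarized as a small classification function. This is a simplified behavioral sketch; the inputs (`flipped_bits`, `correctable_bits`, `retransmit_ok`, `state_defined`) are hypothetical abstractions chosen for illustration, not signals from any real design.

```python
from enum import Enum


class ErrorClass(Enum):
    CORRECTED = "corrected by the receiver (physical layer encoding)"
    HW_RECOVERED = "recovered by hardware retransmission"
    SW_REPORTED = "reported to software via status registers and interrupts"
    RESET_REQUIRED = "undefined state; physical hardware reset required"


def classify(flipped_bits: int, correctable_bits: int,
             retransmit_ok: bool, state_defined: bool) -> ErrorClass:
    """Map a detected error to one of the four handling categories.

    All inputs are hypothetical abstractions used only for illustration.
    """
    if not state_defined:
        return ErrorClass.RESET_REQUIRED   # nothing short of a reset is trustworthy
    if flipped_bits <= correctable_bits:
        return ErrorClass.CORRECTED        # within the encoding's correction limit
    if retransmit_ok:
        return ErrorClass.HW_RECOVERED     # higher hardware layers resend the data
    return ErrorClass.SW_REPORTED          # hardware gives up; software takes over


if __name__ == "__main__":
    print(classify(3, 1, retransmit_ok=False, state_defined=True).name)  # SW_REPORTED
```

The undefined-state check comes first because once the system state is undefined, none of the lighter-weight mechanisms can be trusted.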
Error handling implementation
Error handling implementation in hardware primarily consists of all or some of the following:
- Error detection logic
- Error correction logic
- Error recovery logic
- Error reporting logic
Verification of error handling in serial communication is one of the more challenging areas of functional verification.
The application interface refers to the register and data read/write interface exposed to higher-level software.
The physical line interface is the external, off-chip interface connecting to another chip.
Error detection logic detects errors. Detection can be based on static, predefined information or on dynamic state information. Detecting an undefined protocol data unit (PDU) type is an example of static, predefined detection. Detecting an invalid or missing response is an example of state-based detection. Decoders and timeouts are the primary components of error detection logic.
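The two kinds of detection can be illustrated with a small behavioral model: a decoder that flags undefined PDU types (static) and a tracker of outstanding requests that flags unexpected responses and timeouts (state-based). The PDU type codes, the tag field, and the cycle-based timeout are all hypothetical; a real protocol defines its own encodings and timing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical PDU type encoding; a real protocol defines its own.
VALID_PDU_TYPES = {0x1: "REQUEST", 0x2: "RESPONSE", 0x3: "ACK"}


@dataclass
class Detector:
    timeout_cycles: int
    outstanding: Dict[int, int] = field(default_factory=dict)  # tag -> cycle request was sent
    errors: List[Tuple[str, int]] = field(default_factory=list)

    def on_request_sent(self, tag: int, cycle: int) -> None:
        self.outstanding[tag] = cycle

    def on_pdu_received(self, pdu_type: int, tag: int) -> None:
        # Static check: the type field must decode to a defined PDU type.
        if pdu_type not in VALID_PDU_TYPES:
            self.errors.append(("UNDEFINED_PDU_TYPE", tag))
            return
        # State-based check: a response is valid only if a matching request is outstanding.
        if VALID_PDU_TYPES[pdu_type] == "RESPONSE":
            if tag in self.outstanding:
                del self.outstanding[tag]
            else:
                self.errors.append(("UNEXPECTED_RESPONSE", tag))

    def tick(self, cycle: int) -> None:
        # State-based check: a request with no response within timeout_cycles is flagged.
        for tag, sent in list(self.outstanding.items()):
            if cycle - sent >= self.timeout_cycles:
                self.errors.append(("RESPONSE_TIMEOUT", tag))
                del self.outstanding[tag]


if __name__ == "__main__":
    d = Detector(timeout_cycles=100)
    d.on_request_sent(tag=5, cycle=0)
    d.on_pdu_received(0x7, tag=0)  # undefined type
    d.on_pdu_received(0x2, tag=9)  # response with no matching request
    d.tick(cycle=100)              # request 5 never answered
    print(d.errors)
```

The decoder check needs no history, while the other two checks depend entirely on the `outstanding` table, which is why state-based detection is usually the harder part to verify.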
Error correction logic is mostly limited to physical layer encoding schemes. Higher-level layers involve little independent error correction.
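To make "correct errors up to a predefined number of bits" concrete, here is a Hamming(7,4) encoder and decoder, which corrects any single-bit error in a 7-bit codeword. Hamming(7,4) is chosen only because it is small; it is not claimed to be the encoding used by any particular serial interface.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] into positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: List[int]) -> Tuple[List[int], int]:
    """Return (corrected data bits, error position 1..7, or 0 if no error was seen)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 | (s2 << 1) | (s3 << 2)  # equals the position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1  # the receiver fixes the bit itself, no peer involved
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1  # noise flips position 5
    print(hamming74_decode(received))  # ([1, 0, 1, 1], 5)
```

A two-bit error exceeds this code's correction limit and is silently miscorrected, which is exactly the kind of error that has to be caught by other detection logic and handed to retransmission.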
Error reporting logic typically consists of a set of registers and interrupts. Depending on the complexity of the error handling, reporting and interrupt generation are implemented at different levels of granularity. A group of status registers reports the different types of errors detected. For some errors, the transaction that caused the error is also stored as part of the report, which helps software diagnose the error further. A group of mask registers controls which errors generate interrupts. When more than ten distinct error types are detected, they are often grouped into a smaller number of categories based on the action required to handle them.
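A behavioral model of such a reporting block might look like the sketch below. Several conventions here are assumptions, since designs differ: the bit assignments are hypothetical, a set mask bit is taken to suppress the interrupt (some designs use enable bits instead), status bits are write-1-to-clear, and only the first offending transaction per error type is logged.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical bit assignments in the error status register.
ERR_CRC = 1 << 0
ERR_UNDEF_PDU = 1 << 1
ERR_TIMEOUT = 1 << 2


@dataclass
class ErrorReporting:
    status: int = 0  # sticky status bits, one per error type
    mask: int = 0    # assumption: a set bit suppresses the interrupt for that error
    log: Dict[int, bytes] = field(default_factory=dict)  # first offending transaction per type

    def report(self, err_bit: int, transaction: bytes = b"") -> None:
        # Capture the transaction only on the first occurrence, to aid software diagnosis.
        if not (self.status & err_bit) and transaction:
            self.log[err_bit] = transaction
        self.status |= err_bit

    @property
    def irq(self) -> bool:
        # Interrupt is asserted while any unmasked status bit is set.
        return (self.status & ~self.mask) != 0

    def write_status(self, value: int) -> None:
        # Write-1-to-clear: software clears the bits it has handled, plus their log entries.
        self.status &= ~value
        for bit in list(self.log):
            if value & bit:
                del self.log[bit]


if __name__ == "__main__":
    r = ErrorReporting(mask=ERR_TIMEOUT)
    r.report(ERR_TIMEOUT)               # recorded, but masked: no interrupt
    r.report(ERR_CRC, b"\x12\x34")      # unmasked: interrupt asserted
    print(r.irq, r.log)
    r.write_status(ERR_CRC)             # software services the CRC error
    print(r.irq, bin(r.status))
```

Keeping the status bits sticky means a masked error is still visible if software polls the register, even though it never raised an interrupt.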
Error recovery logic is typically implemented as an FSM. In most cases it involves some form of retransmission, re-initialization, or reset, with the severity chosen according to the nature of the error. Retransmission is the simplest form of recovery: the protocol data units that have not been acknowledged are retransmitted. Re-initialization re-evaluates any change in the conditions of the physical layer. Reset is a more severe form of recovery and can range from a link reset alone to a full chip reset.
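One possible shape for such an FSM is an escalation ladder: retransmit the unacknowledged PDUs a limited number of times, then re-initialize, then reset the link, then reset the chip. The strict one-step-at-a-time escalation, the retry budget `max_retransmits`, and the `on_recovered` hook are assumptions for illustration; a real design may jump straight to a more severe step depending on the error.

```python
from enum import Enum
from typing import List


class State(Enum):
    LINK_UP = 0
    RETRANSMIT = 1
    REINIT = 2
    LINK_RESET = 3
    CHIP_RESET = 4


class RecoveryFsm:
    """Escalating recovery: retransmit, then re-initialize, then link reset, then chip reset."""

    def __init__(self, max_retransmits: int = 3) -> None:
        self.state = State.LINK_UP
        self.max_retransmits = max_retransmits  # hypothetical retry budget
        self.retransmits = 0
        self.unacked: List[bytes] = []          # PDUs sent but not yet acknowledged

    def send(self, pdu: bytes) -> None:
        self.unacked.append(pdu)

    def ack(self, pdu: bytes) -> None:
        self.unacked.remove(pdu)

    def on_error(self) -> List[bytes]:
        """Advance on a detected error; return the PDUs to resend (empty if not retransmitting)."""
        light = self.state in (State.LINK_UP, State.RETRANSMIT)
        if light and self.retransmits < self.max_retransmits:
            self.state = State.RETRANSMIT
            self.retransmits += 1
            return list(self.unacked)  # resend everything not yet acknowledged
        if light:
            self.state = State.REINIT  # retries exhausted: re-evaluate the physical layer
        elif self.state != State.CHIP_RESET:
            self.state = State(self.state.value + 1)  # escalate to the next, more severe step
        return []

    def on_recovered(self) -> None:
        """Called once the link is healthy again; clears the escalation history."""
        self.state = State.LINK_UP
        self.retransmits = 0


if __name__ == "__main__":
    fsm = RecoveryFsm(max_retransmits=2)
    fsm.send(b"A"); fsm.send(b"B"); fsm.ack(b"A")
    for _ in range(6):
        resend = fsm.on_error()
        print(fsm.state.name, resend)
```

Escalating gradually keeps the cheap, local recovery steps in front and reserves the disruptive resets for errors that persist.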