Debugging: The clues
Debugging is like being a detective. It is an iterative process of using clues to close in on one of the suspects. Error messages in the log files act as these clues.
An error message is the result of a check failure. There are three broad categories of checks in the test bench and BFM, and accordingly three types of error messages relating to them. The error message is the first clue that starts the debug process.
The term “event” below means any form of information exchange.
The three types of error messages come from:
- Immediate check failure resulting from an event in the test bench, BFM, or DUT
- Timeout check failure while waiting for an event in the test bench, BFM, or DUT
- Global watchdog timeout check failure while waiting for the end of test
Failures in the third category usually indicate inadequate checks in the test bench and BFM. The price of this weakness is increased debug complexity.
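Since the error message is the first clue, it can help to sort log errors into these three categories before starting. The Python sketch below assumes a hypothetical log convention in which each error line carries a `[CHECK]`, `[TIMEOUT]` or `[WATCHDOG]` tag; the tag names, `classify` and `first_clue` are illustrative and not part of any particular methodology library.

```python
import re
from enum import Enum
from typing import Iterable, Optional, Tuple


class Category(Enum):
    IMMEDIATE = 1  # immediate check failure
    TIMEOUT = 2    # timeout check failure waiting for an event
    WATCHDOG = 3   # global watchdog timeout waiting for end of test


# Hypothetical tags; adapt these to the test bench's own reporting convention.
_TAGS = [
    (Category.WATCHDOG, re.compile(r"\[WATCHDOG\]")),
    (Category.TIMEOUT, re.compile(r"\[TIMEOUT\]")),
    (Category.IMMEDIATE, re.compile(r"\[CHECK\]")),
]


def classify(line: str) -> Optional[Category]:
    """Return the category of an error line, or None for non-error lines."""
    if "ERROR" not in line:
        return None
    for category, tag in _TAGS:
        if tag.search(line):
            return category
    return None


def first_clue(lines: Iterable[str]) -> Optional[Tuple[Category, str]]:
    """Return the category and text of the earliest error in the log."""
    for line in lines:
        category = classify(line)
        if category is not None:
            return category, line
    return None
```

A watchdog error is frequently a downstream symptom, so when earlier immediate or timeout errors exist, it usually pays to start from the earliest one.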
1. Immediate check failure
This is a check failure that immediately follows an event, such as the checks done after receiving a packet. This category of failure provides a clear clue about the mistake: the check failure message calls out what was expected from the event and what actually happened, which can ease the debug significantly. Sometimes this type of failure points directly to a bug in the design. For example, consider a BFM flagging a CRC failure in a packet received from the DUT. Assuming the BFM has clean CRC logic, this points directly at an incorrect CRC implementation inside the DUT.
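A minimal sketch of such an immediate check in a receiving BFM, written in Python for illustration. It assumes the DUT appends a CRC-32 computed with the same polynomial as Python's `zlib.crc32`; `Packet`, `CheckError` and `check_rx_crc` are hypothetical names. The point is the message: it states the expected value, the actual value, and enough context to locate the packet.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    payload: bytes
    crc: int  # CRC appended by the DUT (assumed to be zlib-compatible CRC-32)


class CheckError(Exception):
    """Raised by an immediate check that fails."""


def check_rx_crc(pkt: Packet) -> None:
    """Immediate check: runs as soon as the BFM has received a packet."""
    expected = zlib.crc32(pkt.payload)
    if pkt.crc != expected:
        # Call out both the expectation and the actual value.
        raise CheckError(
            f"CRC mismatch on rx packet: expected=0x{expected:08x} "
            f"actual=0x{pkt.crc:08x} payload_len={len(pkt.payload)}"
        )
```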
Despite this clear direction, it's still advisable to check the configuration and stimulus for correctness. For example, a configuration check may reveal that CRC was disabled in the DUT but the BFM was not configured accordingly.
Before filing a DUT bug:
- Check that the configuration is legal
- Check that the stimulus is as per the specification
- Check that the response is correctly detected by the BFM
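The configuration cross-check from the CRC example can be automated so that it runs before any CRC or payload check fires. This sketch assumes two hypothetical configuration objects, `DutConfig` and `BfmConfig`; their field names are placeholders for whatever the real register model and BFM configuration expose.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DutConfig:
    crc_enable: bool          # hypothetical DUT register field
    max_payload_bytes: int


@dataclass
class BfmConfig:
    crc_check_enable: bool    # hypothetical BFM configuration knob
    max_payload_bytes: int


def config_mismatches(dut: DutConfig, bfm: BfmConfig) -> List[str]:
    """Return human-readable DUT/BFM disagreements; an empty list means consistent."""
    problems = []
    if dut.crc_enable != bfm.crc_check_enable:
        problems.append(
            f"CRC: DUT crc_enable={dut.crc_enable} "
            f"but BFM crc_check_enable={bfm.crc_check_enable}"
        )
    if dut.max_payload_bytes != bfm.max_payload_bytes:
        problems.append(
            f"max payload: DUT={dut.max_payload_bytes} but BFM={bfm.max_payload_bytes}"
        )
    return problems
```

Running such a check at the start of the test turns a misleading CRC failure into a configuration error that names the real cause.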
2. Timeout check failure
This failure is not the result of an immediate event; it is the result of some event in the past. For example, consider stimulus generation that has to wait for a response before generating the next stimulus. When the response never turns up, a timeout check failure results. The check is failing for a stimulus event generated in the past.
Ideally, every wait should be covered by a timeout, whether or not the specification requires it, because any wait can potentially become an infinite wait. As an additional safety measure, also add a debug-verbosity print indicating which event is being awaited.
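A sketch of a guarded wait in Python, using `threading.Event` as a stand-in for a test bench event. In a real test bench the timeout would be measured in simulation time rather than wall-clock seconds; `wait_for` and `TimeoutCheckError` are hypothetical names. The debug print records what is being awaited, so the log shows the pending wait even when the timeout itself is long.

```python
import logging
import threading

log = logging.getLogger("tb")


class TimeoutCheckError(Exception):
    """Raised when an awaited event does not occur within its timeout."""


def wait_for(event: threading.Event, what: str, timeout_s: float) -> None:
    # Debug-verbosity print naming the awaited event.
    log.debug("waiting for %s (timeout %.3f s)", what, timeout_s)
    if not event.wait(timeout_s):
        raise TimeoutCheckError(f"timeout: {what} not seen within {timeout_s} s")
    log.debug("got %s", what)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    response = threading.Event()
    threading.Timer(0.01, response.set).start()  # response arrives after ~10 ms
    wait_for(response, "response to packet 4", timeout_s=1.0)
```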
For timeout check failures while waiting for an event, things to do before filing a bug:
- Check that the configured timeout value is correct. A value that is too short is the most common cause of false timeouts; a value that is too large wastes simulation cycles.
- Sometimes the awaited event happened before the wait thread started; look for it in the logs and waveforms.
- Check that the stimulus was timed correctly
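The second item, an event that fired before the wait started, is a classic race. The Python sketch below contrasts a pulse-style notification (`threading.Condition`), which is lost if nobody is waiting yet, with a sticky flag (`threading.Event`), which a late waiter still observes. It only illustrates the race; the equivalent constructs in an HDL test bench differ in detail.

```python
import threading


def late_wait_on_pulse(timeout_s: float = 0.05) -> bool:
    """Notify before anyone waits, then wait: returns False (the pulse was missed)."""
    cond = threading.Condition()
    with cond:
        cond.notify_all()  # the response "arrives" while no thread is waiting
    with cond:
        return cond.wait(timeout=timeout_s)  # False means the wait timed out


def late_wait_on_flag(timeout_s: float = 0.05) -> bool:
    """Set the flag before anyone waits, then wait: returns True (state persisted)."""
    flag = threading.Event()
    flag.set()  # same early arrival, but recorded as state
    return flag.wait(timeout=timeout_s)
```

When the log or waveform shows the awaited event just before the wait began, either starting the waiter earlier or switching it to a persistent flag removes the false timeout.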
3. Global test bench watchdog timeout
Ideally, only timeouts due to end-of-test (EOT) conditions should be covered by this timeout. EOT is made up of the end of stimulus generation and the end of stimulus execution.
A timeout for “end of stimulus generation” should be implemented in the stimulus generators or the test. When it's not, the hang will be caught by the watchdog timeout instead; the penalty is a longer simulation time to failure and harder debugs.
Waiting for “end of stimulus execution” is the right candidate for this timeout. For stimulus execution that involves multiple interfaces, it may not be possible to predict when it ends and set up a specific timeout. This kind of wait, for the interaction of multiple interfaces to settle down, can be caught by this timeout. An example is waiting for the scoreboard to signal the end of test.
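A minimal cycle-based sketch of the global watchdog, assuming a hypothetical scoreboard that signals end of test once every expected transaction has been matched. `Scoreboard`, `run_until_eot` and the `arrivals` stimulus model are illustrative. Including the scoreboard's progress in the watchdog message gives the debug a starting point instead of a bare timeout.

```python
from typing import Set


class WatchdogTimeout(Exception):
    """Raised when end of test is not reached within the watchdog limit."""


class Scoreboard:
    def __init__(self, expected: int) -> None:
        self.expected = expected
        self.matched = 0

    def on_match(self) -> None:
        self.matched += 1

    def eot(self) -> bool:
        # End of stimulus execution: every expected transaction has been matched.
        return self.matched >= self.expected


def run_until_eot(sb: Scoreboard, arrivals: Set[int], watchdog_cycles: int) -> int:
    """Advance simulated cycles until EOT; return the EOT cycle or raise on expiry.

    `arrivals` holds the cycles at which a matching transaction completes.
    """
    for cycle in range(watchdog_cycles):
        if cycle in arrivals:
            sb.on_match()
        if sb.eot():
            return cycle
    raise WatchdogTimeout(
        f"end of test not reached within {watchdog_cycles} cycles; "
        f"scoreboard matched {sb.matched}/{sb.expected}"
    )
```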
For watchdog timeout failures, things to do before filing a bug:
- Check that the configured timeout value is sufficient. As development progresses, this timeout value will have to grow.
- Check whether the timeout is due to end of stimulus generation. Do this by noting the stimulus the test is expected to produce and cross-checking the log files to confirm it was generated. If it was not, look for wait conditions in the stimulus generation sequences that are not guarded by a timeout. Add the timeout and follow the debug steps listed under section 2, Timeout check failure.
- If stimulus generation has completed, the timeout is due to stimulus execution: buckle up for a hard ride. Take it one step at a time. Understand the test intent and the DUT responses expected for the stimulus provided, then check them one by one to see where the chain broke. In a well-architected test bench, these failures will be multi-interface interaction issues. Adding specific timeout checks for them may be complicated and may not have sufficient ROI, but if a check can be added to prevent the pain of debug, go ahead and add it in the appropriate test bench component.
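The log cross-check in the second item can be scripted. This sketch assumes a hypothetical log format in which each generated item is reported as `[GEN] <sequence> sent item <n>`, and that the test's expected item count per sequence is known; `missing_stimulus` is an illustrative helper. Any sequence it reports stopped short, so its wait conditions are the first place to look for a missing timeout.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical log convention: "... [GEN] <sequence> sent item <n>"
GEN_RE = re.compile(r"\[GEN\]\s+(\w+)\s+sent item")


def missing_stimulus(
    lines: Iterable[str], expected: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {sequence: (expected, generated)} for sequences that fell short."""
    generated = Counter(m.group(1) for m in map(GEN_RE.search, lines) if m)
    return {
        seq: (count, generated[seq])
        for seq, count in expected.items()
        if generated[seq] < count
    }
```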