Debugging: The usual suspects
Debugging is like being a detective. Debugging is an iterative process of using the clues to close in on one of the suspects. Error messages in the log files act as the clues.
The goal of the debugger is to find the mistakes that are manifesting as a failure. Although not all mistakes can be enumerated, most fall into three major categories. These are the usual suspects. Closing in on the culprit among the suspects, using the clues, is the goal of the debug process.
Following are the three broad categories of mistakes, the usual suspects:
1. Misunderstanding of requirements
Misunderstanding of the requirements can lead to mistakes in the DUT, test bench, or bus functional model implementation. A misunderstanding in either the design or the verification area will result in failure.
The misunderstanding could be as simple as the byte packing order in a packet, or as complex as the behavior during a corner-case error recovery scenario. If the same misunderstanding exists in both design and verification, it will not result in an error. That is the reason why there is emphasis on keeping the design and verification teams separate.
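As an aside, the byte-packing case is easy to demonstrate outside an HVL. A minimal Python sketch, using a made-up 16-bit header field, shows how two models with opposite byte-order assumptions disagree, and why the mismatch only surfaces when the two sides are compared:

```python
import struct

# Hypothetical 16-bit header field, value 0x1234, packed by two models
# that made opposite assumptions about byte order in the packet.
value = 0x1234
design_view = struct.pack("<H", value)     # little-endian packing
testbench_view = struct.pack(">H", value)  # big-endian packing

print(design_view.hex())     # 3412
print(testbench_view.hex())  # 1234

# The mismatch is flagged only because the two models disagree; if both
# sides had made the same wrong assumption, no error would be reported.
assert design_view != testbench_view
```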
Apart from resulting in incorrect implementations, misunderstanding of requirements can also result in missing or partial implementations.
Many times, at the start of development, not all possible cases are thought out, and only some of them are implemented. As development progresses, the rest are rediscovered through painful debugging.
Misunderstanding of requirements can manifest in many forms: incorrect constraints, a flag set and forgotten to reset, inconsistent updates to a data structure, a missing condition due to an incomplete understanding of the possible cases, one extra or one fewer iteration, etc.
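The "flag set and forgotten to reset" mistake can be sketched in a few lines of Python (the packet check and the "bad packet" condition here are invented for illustration):

```python
# Hypothetical scoreboard check: 'error_seen' is set on a bad packet but
# never cleared, so every later packet is also reported bad.
def check_packets_buggy(packets):
    error_seen = False
    results = []
    for p in packets:
        if p < 0:                  # assumed "bad packet" condition
            error_seen = True
        results.append(error_seen)  # stale flag leaks into later checks
    return results

def check_packets_fixed(packets):
    results = []
    for p in packets:
        error_seen = p < 0          # flag recomputed for every packet
        results.append(error_seen)
    return results

print(check_packets_buggy([1, -2, 3]))  # [False, True, True]: 3 wrongly flagged
print(check_packets_fixed([1, -2, 3]))  # [False, True, False]
```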
Interactive debuggers bundled with the simulators are also very useful in debugging these types of errors.
Sometimes, misunderstandings of the requirements have to be resolved through discussions between the design and verification teams. Resolutions should be viewed from the point of view of how they affect the final application. A resolution of an ambiguity should help the end application meet its objective.
2. Programming errors
The bulk of failures are contributed by this type of mistake. It is close to impossible to enumerate all the programming mistakes. One could be as simple as incorrect data type usage leading to data loss, which may be easy to spot. Others, such as premature termination of threads, can almost seem like a well-planned conspiracy against the developer.
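Premature thread termination can be sketched in Python (the checker and its delay are invented; a Python daemon thread stands in for a forked HVL process that the parent does not wait for):

```python
import threading
import time

# Sketch: a checker thread does its work "later"; if the parent exits
# without waiting, the check silently never completes.
log = []

def late_checker():
    time.sleep(0.1)              # models a check scheduled for later
    log.append("check done")

t = threading.Thread(target=late_checker, daemon=True)
t.start()
# If the program ended here, the daemon thread would be killed mid-flight
# and "check done" would never be logged: a silent miss, not an error.
t.join()                         # the fix: wait for the checker to finish
print(log)                       # ['check done']
```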
Programming errors are due to misuse of language constructs, verification methodologies, and reusable components. A currently popular HVL like SystemVerilog has an LRM spanning over five hundred pages; it takes a long time to master. SystemVerilog, an HVL built on an HDL with OOP support, poses its own challenges in understanding how the HDL-domain constructs interact with the HVL-domain constructs. For example, SystemVerilog's thread concept is from the HDL world and does not behave in an OOP-friendly way.
HVL programming also involves dealing with concurrency and the notion of time. So even simple programming, such as setting a flag variable, is no longer just about setting a flag: it has to be set at the right time by the right thread. Add to it another dimension from object-oriented programming, with dynamic objects getting created and destroyed, and it has to be set at the right time, by the right thread, in the right object. Too many rights make it difficult to get it right.
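The "right time, right thread" problem maps onto any concurrent language. A Python sketch, with invented delays, shows a flag sampled too early, and an explicit hand-off that removes the timing guesswork:

```python
import threading
import time

# A plain boolean flag, sampled before the producer thread has set it.
done = False

def producer_plain():
    global done
    time.sleep(0.05)         # models work that takes simulation time
    done = True

t = threading.Thread(target=producer_plain)
t.start()
print(done)                  # very likely False: checked too early
t.join()                     # only now is 'done' reliably True

# Safer: an Event makes the hand-off explicit. The consumer waits until
# the producer signals, instead of sampling at an arbitrary time.
done_ev = threading.Event()

def producer_event():
    time.sleep(0.05)
    done_ev.set()

t2 = threading.Thread(target=producer_event)
t2.start()
done_ev.wait()               # blocks until set by the right thread
print(done_ev.is_set())      # True
t2.join()
```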
A currently popular verification methodology such as UVM has more than three hundred reusable classes to digest. It is certainly not easy to master them. Concepts like phasing become complicated because the legacy phasing and the new phasing concepts operate together. Some phases run concurrently, some bottom-up, some top-down, which can only make one fall down.
Most of the code written is copy-and-paste, because a lot of it is just boilerplate. This also increases the chances of mistakes that are hard to notice.
Incorrect usage of reusable verification components is another source. Insufficient documentation and examples for the reusable code make reuse highly bug-prone.
Even when there is a programming error, it does not jump out as a programming error. It is hidden behind layers of translation.
The thought process of a verification engineer starts with understanding the application world. The application world is abstracted into the test benches. The test bench implementation is mapped to verification methodology base classes and HVL code. By this point, a series of transformations has taken place.
The debugger has to peel these layers one by one to discover the issue. It requires mapping a problem symptom showing up at one level of abstraction to a programming mistake buried deep somewhere below.
Typically, programming errors can be debugged effectively with the interactive debuggers provided by the simulators. These offer the classic software debug environment: the ability to put breakpoints, single-step, see variable values and object contents, visualize active threads, etc.
Simulators also provide switches that dump additional debug information to give insight into a problem. For example, debugging an incorrect constraint usage failure is assisted by dumping the values of the relevant class properties into the log at the point of the constraint failure.
3. Operational environment problems
These are mistakes in using the operational environment setup. They could be mistakes committed in the Makefiles used for building, compiling, and simulating the code; in the productivity scripts; in setting up libraries of reusable internal or third-party vendor components; in the simulators and other tools; etc.
GNU make issues can manifest as new code changes not being reflected in the simulation, leading to the same error showing up again even after a fix. Check the code picked up by the compile to see whether the new changes are reflected. Linking issues can also show up at times, due to causes unknown. That is why a good clean target is as important as the build targets; it keeps many unproductive issues away. Makefile and rule organization can reach crazy levels of complication. One simple point to keep in mind: inside all the make black magic, two important commands can guide the debug, the command for compilation and the command for simulation. The make utility also provides special switches (such as -n for a dry run, or --debug) to gain additional insight. Make is a different world by itself.
Perl, Python, or TCL scripts used for productivity can report incorrect data or do an incorrect generation. Always know a way to create or generate the results manually. The manual results can be matched against the data reported or generated by the scripts to gain insight during debug.
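The cross-check idea can be sketched in Python (the coverage numbers and the script stub here are entirely made up):

```python
# Cross-check a script-reported value against a manually computed one.
def script_reported_coverage():
    # Stand-in for parsing a tool-generated report; in practice this
    # would read the script's output file.
    return 75.0

# Manual computation from first principles: count the bins by hand.
covered_bins = 3
total_bins = 4
manual = 100.0 * covered_bins / total_bins

reported = script_reported_coverage()
# If these diverge, suspect the script (or the manual count) before
# suspecting the DUT.
assert abs(manual - reported) < 1e-9, (manual, reported)
print("script output matches manual computation:", reported)
```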
Rarely, the simulator's own mistakes may also get discovered, where the simulator's behavior is not in compliance with the LRM. These can be hard to debug and lengthy to resolve.