Shift left is a commonly used industry phrase meaning ‘test early and test often’. The goal is to catch bugs as early in the development cycle as possible, since earlier bug detection results in huge productivity gains. In the semiconductor industry, there are a couple of ways we can shift left:

- Catch bugs earlier in the product flow, before they escape from verification to the lab or to the customer.
- Catch bugs sooner after they are introduced into the design or testbench code, so they can be found and fixed while the change is still fresh.
Often the focus for verification teams is the first of these two because each time a bug makes it past a team (from verification to the lab to the customer), the cost associated with finding and fixing the bug rises exponentially. However, the second way is also important as it increases the productivity of both the design and verification teams.
In this post I’ll discuss ‘sanity testing’ - why verification teams should run sanity tests frequently and how doing so can help teams catch bugs earlier and improve productivity.
A sanity test is a simple test intended to exercise the main functionality of your design. It should have the following characteristics:

- Fast, with a runtime short enough that it can be launched on every check-in.
- Stable and reliable, so that a failure points to a real problem rather than a flaky test.
- Representative, exercising the main functionality of the design and testbench.
- Self-checking, producing a clear pass/fail result without manual inspection.
Sometimes it takes more than one type of test to get reasonable coverage on each check-in. For example, some teams like to include tests for a couple of functional modes and a register test, or perhaps tests from different levels of hierarchy (system level and block level). How many tests are run depends on the specifics of the design/testbench, but it is important that the sanity test or suite adheres to the requirements above.
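As a sketch of what such a suite might look like, the tests can be captured in a small data structure that the automation reads on each run. The test names, hierarchy levels, and command line below are hypothetical and assume a UVM-style testbench run from a pre-built simv executable; substitute whatever your flow actually uses.

```python
# Hypothetical sanity suite: a system-level smoke test, a block-level
# register test, and a block-level test for one functional mode.
SANITY_SUITE = [
    {"name": "sys_basic_traffic",   "level": "system", "seed": 1},
    {"name": "blk_register_access", "level": "block",  "seed": 1},
    {"name": "blk_mode_a_smoke",    "level": "block",  "seed": 1},
]

def run_command_for(test: dict) -> list[str]:
    """Build the (assumed) simulator command line for one sanity test."""
    return [
        "./simv",                        # assumed pre-built simulation executable
        f"+UVM_TESTNAME={test['name']}", # select the UVM test to run
        f"+ntb_random_seed={test['seed']}",
    ]
```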
So how often should a sanity test run? The answer is as often as reasonably possible. Some teams set up a cron job to run periodically (every 1 to 4 hours is typical). My preferred approach is to launch a sanity test with each submission. My team typically sets up sanity testing using an open-source tool called Jenkins: whenever a new check-in happens, Jenkins syncs out a workspace and launches a test, and it takes care of workspace management and results-area cleanup for you.
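As an illustration, here is a minimal sketch of the kind of script such a Jenkins job might invoke for each check-in: it syncs a fresh workspace at the triggering revision, builds, runs the sanity target, and returns a nonzero exit code on failure so Jenkins marks the build as failed. The git commands and make targets are assumptions standing in for whatever source control and build flow your project uses.

```python
import subprocess
import sys
import tempfile

def run(cmd, cwd):
    """Run a command, echoing it, and raise if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, cwd=cwd, check=True)

def sanity(repo_url: str, revision: str) -> int:
    """Sync a fresh workspace at `revision`, build, and run the sanity test."""
    with tempfile.TemporaryDirectory(prefix="sanity_") as ws:
        # Sync a clean workspace so results are not polluted by local edits.
        run(["git", "clone", "--quiet", repo_url, ws], cwd=".")
        run(["git", "checkout", "--quiet", revision], cwd=ws)
        try:
            # Hypothetical build and run targets; replace with your own flow.
            run(["make", "compile"], cwd=ws)
            run(["make", "sanity"], cwd=ws)
        except subprocess.CalledProcessError as err:
            print(f"SANITY FAILED at {revision}: {err}")
            return 1
    print(f"SANITY PASSED at {revision}")
    return 0

if __name__ == "__main__":
    # Jenkins can pass the repo URL and the commit of the triggering check-in.
    sys.exit(sanity(sys.argv[1], sys.argv[2]))
```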
In terms of which submissions should trigger a sanity test, the answer is anything that can change how the test is built or run. Primarily this will be changes to your design or testbench files (recompile), but it also includes collateral used at runtime such as memory initialization files, firmware code, and test vectors. Lastly, even a change to the scripts used to compile and/or run your tests should trigger a sanity test, since changes there can also affect your results.
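One simple way to implement that trigger policy is a path filter over the files touched by each submission: if any file matches a pattern that can affect how tests are built or run, launch a sanity run. The directory names and patterns below are hypothetical placeholders for a project's actual layout.

```python
from fnmatch import fnmatch

# Hypothetical path patterns whose changes can affect how tests build or run:
# design RTL, testbench code, runtime collateral, and the compile/run scripts.
TRIGGER_PATTERNS = [
    "rtl/*",       # design files (recompile)
    "tb/*",        # testbench files (recompile)
    "fw/*",        # firmware loaded at runtime
    "vectors/*",   # test vectors
    "*.hex",       # memory initialization files
    "scripts/*",   # compile/run scripts
    "Makefile",
]

def should_run_sanity(changed_files: list[str]) -> bool:
    """Return True if any changed file matches a trigger pattern."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in TRIGGER_PATTERNS
    )

# Example: an RTL edit triggers a run; a documentation-only change does not.
assert should_run_sanity(["rtl/core/alu.sv"]) is True
assert should_run_sanity(["docs/readme.md"]) is False
```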
Why is it so important to catch bugs quickly through sanity testing? The longer a bug sits undetected in your code base (design or testbench), the harder it is to narrow down exactly what caused the failure. Additionally, it is common for multiple people on the team to hit the same failure and try to debug it independently, duplicating effort. Lastly, if you don’t immediately know which change caused a failure, you have fewer options for resolving it. For example, if a failure is hurting productivity for the whole team and you know exactly which check-in caused it, you can roll back that check-in until the issue is debugged.
Sanity testing is an important tool for improving the productivity of design and verification teams. It works by running a small but representative set of tests on each submission so that issues are caught as early as possible.