Accelerating System Debug in the SoC Verification Flow

Swami Venkat, Taruna Reddy

Jan 17, 2022 / 4 min read

What goes through your mind when you think about the debug step in silicon chip verification? Do you start to cringe or feel a sense of dread as you anticipate countless, painstaking hours ahead? According to the latest 2021 Synopsys Global User Survey, system-level debug is still one of the top three verification challenges customers face. However, as time-consuming as the effort is, the earlier you can find and fix bugs in your design, the less costly it will be for your overall budget.

The variability inherent in debug makes the process painstaking and its results often unpredictable. First, you run stimulus against your RTL- or gate-level model to verify whether the model behaves as expected. If it doesn't, the incident is tagged as an error. If you determine that the error is in the testbench, you might have to rewrite and rerun the testbench. If the error is in the design, sometimes it's superficial and you only need to run a few cycles of simulation. Often, though, you must run simulation for millions of cycles to find deep corner-case bugs embedded in the design. So, depending on where the bug is and how many lines of code are involved, the time required to reach an answer will vary considerably.

Enhancing the debug process in terms of accuracy, speed, and exhaustive coverage is one of the most important verification challenges to solve. In this blog post, we'll discuss what makes debug so difficult, what to look for in a debug tool, and how advanced debug technologies can help ease the process.


What Drives Chip Debug Complexity?

Driven by burgeoning application areas like artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC), chips are growing larger and more complex, making the debug effort that much more complicated. Consider the semiconductor landscape, where we've got:

  • SoCs consisting of hundreds of IP blocks and subsystems, some developed in-house and others purchased off the shelf; this includes complex protocols and memory
  • More complex interactions between different blocks and power domains and between hardware and embedded software
  • A variety of new domain-specific architectures

Chip size is one factor, but even more impactful on the debug process is chip functionality. For example, a relatively small design might require thousands of different transactions happening in parallel to simulate a specific condition. To isolate the bug, you'd need to examine parallel events to determine which branch is the culprit. Today, it really is as much about the complexity of the bug that was exposed as it is about the design size.

The end application for the chip is also a factor; for instance, the signals of interest in a low-power design such as a mobile phone chip will be vastly different from those in an embedded processor design. For an embedded processor design, you might want to debug C/assembly code alongside RTL. In a low-power design, when one part of the design is shut off, you'll insert isolation cells or use power retention techniques that aren't needed in a processor design. Your debugging tool should be smart enough to understand the function of these gates or transistors so it can correctly assess any bottlenecks. What is needed is a simultaneous view of both hardware and software to help designers debug efficiently. For applications like automotive, where functional safety compliance is critical, exhaustive debug takes on even greater importance.

How AI Helps Find Root Causes of Chip Design Failures

Being able to home in on the actual culprit is very difficult because so many signals must be analyzed and understood. Without a debugging tool with the right capabilities, you might find yourself running thousands and thousands of simulations and tests to verify your design. Each one of these creates a lot of data that needs to be analyzed to zero in on the bug. This, in turn, highlights a productivity challenge: in the scheme of your design project, you'll want to minimize the time it takes to find the root cause of a failure.

AI and ML have proven to be a productivity advantage for a wide range of application areas, providing the intelligence and speed to find insights from voluminous amounts of data. Why not in debug? Imagine if, instead of debugging every failure only to find similarities across many of them, an AI-enhanced debugger could segment bugs based on their signatures and produce a more manageable number to evaluate for root causes. This is now possible with the new ML-based Verdi® Regression Debug Automation technology.
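To make the idea of segmenting failures by signature concrete, here is a minimal Python sketch. It is not how Verdi Regression Debug Automation works internally, and it uses simple rule-based normalization rather than ML; the log message format, test names, and the helper functions normalize and bin_failures are all hypothetical, chosen only for illustration. The sketch masks run-specific details such as timestamps and addresses so that failures with the same underlying message land in the same bin.

```python
import re
from collections import defaultdict


def normalize(message: str) -> str:
    """Reduce a failure message to a signature by masking run-specific details.

    Assumption: hex values and decimal numbers (timestamps, addresses,
    counters) vary from run to run and are not part of the failure's identity.
    """
    sig = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)  # addresses, data values
    sig = re.sub(r"\d+", "<NUM>", sig)                 # timestamps, counters
    return re.sub(r"\s+", " ", sig).strip()            # collapse whitespace


def bin_failures(failures):
    """Group (test_name, message) pairs by their normalized signature."""
    bins = defaultdict(list)
    for test_name, message in failures:
        bins[normalize(message)].append(test_name)
    return dict(bins)


# Hypothetical regression failures; the message format is illustrative only.
failures = [
    ("test_dma_001", "UVM_ERROR @ 1200ns: AXI read data mismatch at addr 0x1000"),
    ("test_dma_017", "UVM_ERROR @ 98450ns: AXI read data mismatch at addr 0x2F40"),
    ("test_pwr_003", "UVM_ERROR @ 500ns: isolation cell output X in domain PD2"),
]

bins = bin_failures(failures)
for signature, tests in bins.items():
    print(f"{len(tests)} failure(s): {signature}")
    for name in tests:
        print(f"    {name}")
```

In this toy example, the two DMA failures collapse into a single bin, so only two distinct signatures need root-cause analysis instead of three individual failures. An ML-based approach would presumably go further, for example by clustering failures whose messages differ but whose behavior is similar.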

Also useful in debugging tools is the intelligence to enable waveforms and testbench setup to be reused in different contexts, accelerating time-to-debug for previously manual and tedious workflows.

Comprehensive and Unified Debug Saves Time and Effort

Synopsys provides comprehensive debug for all design and verification flows via our Verdi Automated Debug System, the centerpiece of the Verdi SoC Debug Platform. The system has the capacity for 2-billion-gate designs and is regularly enhanced with new features. It typically reduces debug time by more than 50% compared to competitive solutions, so you can focus on more value-added tasks. With the efficiency provided by the Verdi system, you can find and fix bugs across all domains and abstraction levels. The Verdi Automated Debug System is part of the Synopsys Verification Continuum®, a portfolio of solutions designed to help you find SoC bugs earlier and faster, bring up software earlier, and validate your entire system.

Learn more about the Verdi SoC Debug Platform from our series of short video tutorials.

Summary

As critical as thorough debugging is to improving verification turnaround time (TAT), you also don't want to get bogged down by the time and effort that understanding cause and effect, and then finding and fixing bugs, can require. To be sure, encountering a design failure or having to do a design respin are worse propositions, but you also have time-to-market pressures to contend with. By automating labor-intensive tasks, advanced debugging technologies can bring the comprehensive coverage, reduced debug time, and ease of use that today's complex SoCs and stretched verification teams demand.
