In their recent article “The Bugs of Power,” industry thought leaders Joe Convey and Bryan Dickman explore the concept of “power bugs” and how to find them using emulation power analysis. Power consumption is one of the most critical and demanding aspects of modern multi-million-gate ASIC/SoC designs, and for good reason. As Joe and Bryan put it, “Power matters!”
Battery-powered mobile devices, primarily cell phones, drove the initial requirements for low-power design. Today, power consumption is a concern in almost all technology applications, especially automotive (particularly electric vehicles, or EVs), GPUs, AI acceleration, and high-performance computing (HPC), with data centers now accounting for more than 1% of global power consumption and trending upwards.
There is plenty of good commentary available on this fascinating subject, but Joe and Bryan approach it through the lens of hardware design and verification engineers. They extend the usual definition of “bugs” to include any aspect of the design that is in error. Not meeting the power specification and power budget is one such class of error. The root cause of these errors is generally poor power architecture design or poor implementation of the power intent, and the errors can stem from both hardware and software problems. For the hardware, the consequences are similar to those of regular functional hardware bugs: the product might be unfit for purpose, and significant (or even catastrophic) rework costs can ensue. Surveys show that more than 80% of designs are now actively managing power, and the techniques used are becoming more complex and sophisticated: for example, workload balancing in multi-core systems, power domain shut-down, dynamic voltage and frequency scaling (DVFS), and complex clock gating.
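To see why a technique like DVFS is such an effective lever, recall the standard first-order model of CMOS dynamic power, P = α·C·V²·f. The short sketch below applies that model to two operating points. The function name and all numeric values (activity factor, switched capacitance, voltages, frequencies) are assumptions chosen purely for illustration; they do not come from the article or from any Synopsys tool.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS dynamic power in watts: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating points (illustrative values only).
nominal = dynamic_power(activity=0.2, capacitance_f=1e-9,
                        voltage_v=1.0, frequency_hz=2.0e9)
scaled = dynamic_power(activity=0.2, capacitance_f=1e-9,
                       voltage_v=0.8, frequency_hz=1.5e9)

# Fraction of nominal dynamic power remaining after scaling V and f down.
ratio = scaled / nominal
print(f"nominal={nominal:.3f} W, scaled={scaled:.3f} W, ratio={ratio:.2f}")
```

Because voltage enters the model squared, lowering voltage together with frequency reduces dynamic power much more than lowering frequency alone, which is one reason power management logic and its controlling software become so intricate.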
Power bugs can be caught early in the development lifecycle, starting with either static spreadsheet analysis or dynamic virtual prototype-based analysis. Synopsys Platform Architect™ Ultra enables early analysis and optimization of multicore SoC architectures for performance and power. The solution also provides power trend insights for better decision-making on the right architecture and IP for a given design. Both RTL and UPF power intent bugs can be caught with power-aware simulation such as Synopsys VCS NLP, or with static analysis tools like Synopsys VC LP or Synopsys SpyGlass Power, which provides RTL power estimation. However, there is a category of power bugs that you will not observe until you are running the full hardware/software stack at the system level. Traditionally, power analysis with realistic software workloads is performed post-silicon. This carries a high risk of missing critical high-power situations and exposes companies to significant cost and product adoption risk. The best opportunity to intercept these bugs pre-silicon is with emulator-based power analysis workflows such as the Synopsys ZeBu® Empower emulation system for hardware/software power verification.
So why do we need to run so much software? Think about the complexity of your system. In hardware you may have multi-core CPUs and GPUs, AI accelerators, advanced IO systems, complex memory systems, power controllers, and custom IP blocks. Each will be performing a critical function, delivering a capability, and consuming power. Each will have software to drive it. Each will implement some form of potentially sophisticated power management. So, you need an analysis capability that supports the full system operating with all components and sub-systems in concert in order to understand the full-chip power profile. Booting the OS alone can consume billions of cycles. Think about running the operating system, the system software, application software, and the power controller software. How accurately can you predict what the average power and the peak power will be at any particular point in time under the conditions of a full software payload? Power analysis is necessary to identify those workloads where power consumption exceeds power budget, with the ability to debug these scenarios and determine the root cause of the problems. Joe and Bryan illustrate the problem in the following sketch.
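The kind of question posed above (what are the average and peak power, and which parts of the workload exceed the budget?) can be sketched in a few lines. This is a minimal illustration, assuming the power trace is already available as a list of per-sample power values in watts; the function names, the trace values, the window size, and the budget are all hypothetical and do not represent the output format or API of any emulation tool.

```python
from typing import List, Tuple


def windowed_average(trace: List[float], window: int) -> List[float]:
    """Average power over consecutive, non-overlapping windows of samples."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for start in range(0, len(trace), window):
        chunk = trace[start:start + window]  # last chunk may be shorter
        averages.append(sum(chunk) / len(chunk))
    return averages


def over_budget_windows(trace: List[float], window: int,
                        budget_w: float) -> List[Tuple[int, float]]:
    """Return (start_sample, average_power) for windows above the budget."""
    return [(idx * window, avg)
            for idx, avg in enumerate(windowed_average(trace, window))
            if avg > budget_w]


# Hypothetical power trace in watts (illustrative values only).
trace = [1.0, 1.2, 1.1, 0.9, 3.0, 3.4, 3.2, 3.0, 1.0, 1.1, 0.9, 1.0]

average_power = sum(trace) / len(trace)
peak_power = max(trace)
violations = over_budget_windows(trace, window=4, budget_w=2.5)

print(f"average={average_power:.2f} W, peak={peak_power:.2f} W")
for start, avg in violations:
    print(f"window starting at sample {start} averages {avg:.2f} W")
```

In this toy trace only the middle window exceeds the assumed 2.5 W budget. A real workload spans billions of cycles, which is why the speed of both trace extraction and data processing matters so much.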
Turnaround time for an emulation power analysis flow is what really matters. This means the total time taken to execute the software workload, extract the power data, perform the data analysis, and be ready to go again, even with multiple iterations in one day. You have to look at the whole flow here, and power analysis at this scale is a data processing-intensive activity. This is where having the right tools makes all the difference.
To start with, ZeBu Empower is the industry’s fastest emulation system, which means you are able to run more of the software than with other solutions.
ZeBu Empower also delivers best-in-class power analysis capabilities, applied in three successive levels of detail. First, developers can “see” the full power activity profile across billions of cycles, at full speed, and accurately identify the power windows of interest. Second, those power windows can be explored at a finer grain with higher fidelity, zooming into millions of cycles to explore power cycles and measure average power and peak power to within 95% accuracy. Finally, by zooming in further to extract smaller power windows of interest, full RTL power analysis and gate-level power sign-off become possible in concert with the industry-leading power sign-off solution, Synopsys PrimePower.
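The coarse-to-fine idea described here can be illustrated generically: profile the whole run at low resolution, pick the hottest window, then re-analyze only that window at higher resolution. The sketch below is a simplified, assumed model of that workflow operating on a plain list of power samples. The function names, bucket sizes, and trace values are hypothetical, and it says nothing about how ZeBu Empower or PrimePower are actually implemented.

```python
def bucket_averages(trace, size):
    """Average power over consecutive, non-overlapping buckets of samples."""
    averages = []
    for start in range(0, len(trace), size):
        chunk = trace[start:start + size]
        averages.append(sum(chunk) / len(chunk))
    return averages


def zoom_into_hottest(trace, coarse, fine):
    """Coarse pass finds the hottest window; fine pass re-profiles only it."""
    coarse_profile = bucket_averages(trace, coarse)
    hottest = max(range(len(coarse_profile)), key=lambda i: coarse_profile[i])
    start = hottest * coarse
    segment = trace[start:start + coarse]
    return {
        "start": start,
        "end": start + len(segment),
        "average": coarse_profile[hottest],
        "peak": max(segment),
        "fine_profile": bucket_averages(segment, fine),
    }


# Hypothetical trace in watts: a quiet run with a short burst in the middle.
trace = [1.0] * 8 + [2.0, 2.0, 5.0, 5.0, 2.0, 2.0, 2.0, 2.0] + [1.0] * 8

result = zoom_into_hottest(trace, coarse=8, fine=2)
print(result)
```

The coarse pass alone reports a window averaging 2.75 W, while the fine pass reveals that a short 5 W burst is responsible. Narrowing the window in this way keeps the most expensive, highest-fidelity analysis confined to the small region that needs it.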
As for power data-processing performance, ZeBu Empower works with scalable server configurations to deliver the fastest data processing solution and the highest-performance power verification engines. Together these enable multiple iterations per day, with actionable power profiling in the context of the full design and its software workload. With ZeBu Empower, software and hardware designers can use the power profiles to identify substantial opportunities to reduce dynamic and leakage power much earlier.
As Joe and Bryan put it, “Verification is a resource-limited quest to find as many bugs as possible before tapeout.” Bugs in the power management functionality of a device can be catastrophic. Missing your power targets can be equally catastrophic if this is only realized when you get the silicon back. At the same time, you don’t want to wait until you have the silicon back to validate and debug your power management software. You are going to need pre-silicon methodologies and tools that enable accurate power analysis to be performed by running realistic software for the system. Power analysis using ZeBu Empower is the best and the fastest way to achieve this at system-level scale.