Graham Wilson, Sr. Product Marketing Manager for ARC Processors, Synopsys
Put yourself in the shoes of a system architect deciding what type of digital signal processor (DSP) to use for the next system-on-chip (SoC) or the next family of SoCs. One approach is to use a big, high-performance DSP and run most of the signal processing algorithms in software on the DSP. The benefits of this approach are clear: the SoC can be designed, taped out, and put into production much more quickly and with lower risk, because the functionality is defined in software, which can be created, debugged, and changed after silicon tape-out.
However, this type of solution may not be feasible if a small area and low power consumption are key requirements for the SoC. The DSP may also not be able to handle the high computation and data throughput rates efficiently. A better option might be to use dedicated hardware blocks or highly specialized programmable engines for the computation-intensive tasks. Offloading computation from a DSP to hardware blocks or specific computation engines means a smaller DSP can be used alongside small hardware blocks running at a lower clock frequency. This reduces area and power but makes the SoC design and implementation more complex, extending tape-out dates and increasing design risk. Figure 1 shows the tradeoffs depending on what type of processor is selected.
Figure 1. DSP selection
So back to that system architect needing to make a decision. Sometimes the requirements are so well defined that the decision is easy. The majority of the time, however, they are not, and it’s like trying to hit a moving target. Also, the decision about what type of DSP to use is typically not for just one SoC, but for a family of SoCs covering a range of application types, from low-cost devices to devices with high-performance requirements.
There is also another dimension to consider in the selection of a DSP: the level of neural network (NN) computation required for artificial intelligence (AI) algorithms. Next-generation algorithms will most probably include AI as part of their data computation, so the processor selection needs to take into account the level of AI computation required and whether it can run in software on the DSP or requires hardware acceleration of the NN computation.
So, system architects may be thinking they are looking for a single DSP, but ultimately they are looking for a family of processors that will meet their needs across the range of SoCs. Many SoCs have been implemented with a large vector DSP that met the initial requirements, but as the SoC designs evolve, the majority of the DSP functionality is offloaded to hardware blocks, leaving the large vector DSP running non-vector control code that could have been handled by a much smaller scalar core. Legacy software and design migration alone often mean an oversized vector DSP stays in the design.
What the system architect needs in order to make a decision is a family of processors that is broad and unified. Broad in the sense of offering small scalar/DSP cores for designs that offload most computation to hardware blocks, through medium-sized DSPs, up to large vector DSPs with the option of neural network computation. This gives the architect the ability to meet the initial requirements and then evolve with the changing SoC design. Unified in the sense of a single software code base that is portable from one processor to another, and a single development platform and toolkit.
The Synopsys DesignWare® ARC® Processor IP portfolio is an industry leader in terms of the breadth of processors and the single software development platform it offers SoC system architects. The ARC processor portfolio has the right processors to meet the computation needs of the SoC system architecture. Figure 2 shows the processors available in the ARC processor portfolio.
Figure 2. ARC Processor Portfolio
The ARC DSP processors are unique in their software compatibility across the whole family. All of the ARC DSP processors support the single ARCv2DSP ISA, which enables optimized code to be ported from one core family to another. For example, an SoC using a DSP-enhanced HS processor (HS4xD) could be moved to a DSP-enhanced EM processor if a much smaller, lower-power SoC is needed. At the other end, it is possible to move up from the HS processor to the ARC VPX vector DSP: the scalar unit in the VPX DSP is based on the HS processor, so the existing code base can run on the VPX scalar unit. The situation described earlier, where a large vector DSP ends up running only non-vector control code once the main computation has been offloaded to hardware, is a perfect example of moving code to a different processor, in this case from the ARC VPX DSP to the ARC HS processor. This allows the system architect to pick a processor that meets the performance, power, and area requirements of the design without adding extra risk or software re-development overhead.
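To make the idea of a single, portable code base concrete, the sketch below shows a plain fixed-point (Q15) FIR kernel written in standard C++ with no core-specific intrinsics. It is a hypothetical illustration rather than Synopsys library code: the same source could, in principle, be rebuilt for an ARC EM, HS, or VPX target and left to the compiler to optimize or vectorize, while a production design would more likely call optimized DSP library functions.

```cpp
#include <cstddef>
#include <cstdint>

// Saturate a wide accumulator down to the Q15 range.
static inline int16_t sat_q15(int64_t x) {
    if (x > INT16_MAX) return INT16_MAX;   // clamp high
    if (x < INT16_MIN) return INT16_MIN;   // clamp low
    return static_cast<int16_t>(x);
}

// Hypothetical portable Q15 FIR filter. The caller supplies
// num_samples + num_taps - 1 input samples. No core-specific
// intrinsics are used, so the same source can target a scalar
// EM/HS core or be auto-vectorized for a VPX core.
// (__restrict is a common compiler extension, not standard C++.)
void fir_q15(const int16_t* __restrict input,
             const int16_t* __restrict coeffs,
             int16_t* __restrict output,
             std::size_t num_samples,
             std::size_t num_taps) {
    for (std::size_t n = 0; n < num_samples; ++n) {
        int64_t acc = 0;  // wide accumulator for the MAC chain
        for (std::size_t k = 0; k < num_taps; ++k) {
            acc += static_cast<int32_t>(input[n + k]) * coeffs[k];
        }
        // Q15 * Q15 gives Q30; shift back to Q15 and saturate.
        output[n] = sat_q15(acc >> 15);
    }
}
```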
Let’s say the SoC system architect needs a vector DSP processor to run the majority of the vector computation on the DSP. The ARC VPX DSP processor is an ideal place to start. Because all ARC processors are configurable and extendable, the architect can configure the ARC VPX processor to meet the computation requirements at minimum size and power. For example, the VPX can be configured as an integer and fixed-point vector DSP, with its dual vector units and three ALU engines supporting a wide range of data types for NN algorithms as well as traditional communications and audio/voice applications. On the flip side, the VPX can be configured as an ultra-high-performance vector floating-point DSP with three vector floating-point/math units executing in parallel. Or it can be configured with all computation units enabled for full flexibility across integer, fixed-point, and floating-point vector computation.
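As a rough illustration of the kind of kernel such a configuration decision affects, the sketch below is a single-precision complex multiply-accumulate loop of the sort found in beamforming or channel-estimation code. It is a generic example, not VPX-specific code: written as a straightforward loop, it is the kind of source an auto-vectorizing compiler could map onto whichever vector floating-point units the chosen VPX configuration provides, or onto fixed-point units if the data types were changed accordingly (as in the Q15 FIR sketch above).

```cpp
#include <cstddef>

// Hypothetical complex multiply-accumulate on interleaved (re, im)
// float data: acc[i] += a[i] * b[i] for complex values.
// Kept as a plain loop so an auto-vectorizing compiler can map it
// onto the available vector floating-point hardware.
void cmac_f32(const float* a,    // interleaved re/im, length 2*n
              const float* b,    // interleaved re/im, length 2*n
              float* acc,        // interleaved re/im accumulator, length 2*n
              std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const float ar = a[2 * i],  ai = a[2 * i + 1];
        const float br = b[2 * i],  bi = b[2 * i + 1];
        acc[2 * i]     += ar * br - ai * bi;  // real part
        acc[2 * i + 1] += ar * bi + ai * br;  // imaginary part
    }
}
```

Whether this kernel runs on floating-point or fixed-point hardware then becomes a hardware configuration decision rather than a rewrite of the algorithm source.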
Figure 3. Migration path of ARC processors
Figure 3 shows the migration or evolution paths available with the ARC processors. We have discussed moving between the ARC EM, ARC HS, and ARC VPX DSP processors to pick the right point for signal processing algorithm computation. At the beginning of this article, we briefly mentioned the other dimension: NN computation. More and more algorithms include NN/AI computation. The EM, HS, and VPX DSP processors support the embARC Machine Learning Inference (MLI) software library, which offers low-level NN computation components optimized for Synopsys ARC DSP processors. The TensorFlow Lite Micro framework is supported through the MLI library components, offering highly optimized performance when porting a wide range of graphs. This allows Synopsys ARC DSP processors to execute NN algorithms and graph computation in software on the DSP.
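For context, the sketch below shows what invoking a graph through TensorFlow Lite Micro typically looks like at the application level. The model array, arena size, and operator list are placeholders, and constructor details differ between TFLM releases (older releases also require an error reporter). The point, following the description above, is that when the TFLM build is configured to use the MLI-optimized kernels for an ARC target, this application-level code does not need to change.

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Placeholder for the flatbuffer produced by the TensorFlow Lite
// converter and compiled into the firmware image.
extern const unsigned char g_model_data[];

// Scratch memory for tensors; the size is application-specific.
constexpr int kArenaSize = 32 * 1024;
static uint8_t tensor_arena[kArenaSize];

// Runs one inference on quantized int8 features and returns the
// index of the highest-scoring output class (or -1 on failure).
// Assumes num_features matches the graph's input tensor size.
int run_inference(const int8_t* features, int num_features) {
    const tflite::Model* model = tflite::GetModel(g_model_data);

    // Register only the operators the graph actually uses.
    tflite::MicroMutableOpResolver<3> resolver;
    resolver.AddFullyConnected();
    resolver.AddConv2D();
    resolver.AddSoftmax();

    tflite::MicroInterpreter interpreter(model, resolver,
                                         tensor_arena, kArenaSize);
    if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

    // Copy quantized input features into the graph's input tensor.
    TfLiteTensor* input = interpreter.input(0);
    for (int i = 0; i < num_features; ++i) {
        input->data.int8[i] = features[i];
    }

    if (interpreter.Invoke() != kTfLiteOk) return -1;

    // Pick the highest-scoring class from the output tensor.
    TfLiteTensor* output = interpreter.output(0);
    const int num_classes = output->dims->data[output->dims->size - 1];
    int best = 0;
    for (int i = 1; i < num_classes; ++i) {
        if (output->data.int8[i] > output->data.int8[best]) best = i;
    }
    return best;
}
```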
In some cases, the VPX DSP processor, even with its dual vector units, cannot deliver the performance required by the NN algorithms; hardware acceleration with a dedicated NN engine is then needed. To address this, Synopsys offers the ARC EV7x processor, which provides a range of NN hardware engine configurations from 440 MACs to 14K MACs. A key point in the ability to evolve to the EV7x processor is that the vector DSP within the ARC EV processor is based on the ARC VPX DSP processor. Hence a system architect using an ARC VPX processor can move to the EV processor very simply and benefit from the NN hardware engines for high-end AI algorithms.
There are many applications that would benefit from this option of moving from an ARC VPX to an ARC EV processor. For example, in RADAR and LiDAR sensor signal processing applications, the detection of objects can be implemented on the NN hardware engines. In 5G communications, advanced MIMO algorithms will need NN-level computation requiring hardware acceleration. In next-generation voice applications, speech recognition can be implemented on an ARC VPX DSP processor, with natural language processing running on the ARC EV processor.
All ARC processors are supported by the DesignWare ARC MetaWare Development Toolkit, a complete solution for developing, debugging, and optimizing embedded software targeted for ARC processors. For efficient algorithm development, the latest MetaWare tools include an enhanced C/C++ compiler with support for auto-vectorization. The toolkit also includes a DSP software library of fixed-point math functions and an instruction-accurate simulator that accurately models the DSP operations.
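As a small, generic example of code written with auto-vectorization in mind (not MetaWare-specific), the dot product below uses restrict-qualified pointers, contiguous accesses, and a branch-free loop body, which is the kind of loop structure vectorizing compilers generally handle well.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical Q15 dot product written to be friendly to an auto-vectorizer:
// restrict-qualified pointers rule out aliasing, accesses are contiguous,
// and the loop body is a single multiply-accumulate with no branches.
// (__restrict is a common compiler extension rather than standard C++.)
int64_t dot_q15(const int16_t* __restrict a,
                const int16_t* __restrict b,
                std::size_t n) {
    int64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += static_cast<int32_t>(a[i]) * b[i];
    }
    return acc;
}
```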
In summary, an SoC system architect has a very challenging task in selecting the right level of programmability with a DSP processor. The trade-offs between performance and minimal power consumption and area, as well as current and future changes across the SoC family, make it difficult to pick one DSP processor that meets all requirements. In short, a system architect needs a broad DSP portfolio to fall back on when computation requirements change and evolve from SoC to SoC. The Synopsys ARC Processor IP portfolio offers a breadth of processors with full software compatibility, giving system architects the ability to select the best processor solution for their SoC design.