Manuel Mota, Sr. Product Marketing Manager
The semiconductor industry is undertaking a major strategic shift toward multi-die SoCs, with far-reaching implications for the way SoCs are architected and designed.
This shift is fueled by several converging trends:
However, the novelty of multi-die technology and the lack of a design ecosystem have led SoC architects to pause and even postpone their multi-die SoC projects. The industry has since come together to provide comprehensive, integrated multi-die design and verification products and a complete set of advanced packaging options.
Early adopters started by developing their own specialized die-to-die interfaces, but the industry quickly realized that this approach inhibited the ability to assemble dies developed by different vendors. The industry needed standardized die-to-die interconnects. Several industry alliances have come together to define such standards, as shown in Figure 1.
Figure 1: Several organizations have defined and developed standards for die-to-die interconnects
This article takes a closer look at the UCIe specification and its main advantages.
UCIe, a recently announced specification, inherits a substantial amount of work and experience in several key technologies amassed by the original promoters, shown in Figure 2. UCIe is a comprehensive specification that can be used immediately as the basis for new designs, while creating a solid foundation for future specification evolution.
Figure 2: Companies join forces to establish a complete standardized die-to-die interconnect
Unlike other specifications, UCIe defines a complete stack for die-to-die interconnect. This ensures interoperability of compliant devices, which is a mandatory requirement for enabling the multi-die system market.
From the start, UCIe incorporates features that support multiple current and trending use cases. UCIe supports the data rates required today, from 8Gbps/pin to 16Gbps/pin. UCIe is also expected to support flexible data rates up to 32Gbps/pin, which will be a requirement in future high-bandwidth networking and data center applications.
UCIe supports all types of package technologies in two ways:
Both options share the same architecture and protocols. The only difference is in the bump map and PHY organization. Because the difference is confined to that level, system architecture, system validation, and software development can be reused regardless of the package type chosen for a particular SoC.
UCIe supports novel resource aggregation (or pooling) architectures in data centers, either within the blade with flexible PCIe/CXL IO dies or rack-to-rack with UCIe-enabled optical IO dies.
Most importantly, UCIe supports compute scaling by leveraging streaming (user-defined) protocols to create low-latency connections between the Networks-on-Chip (NoCs) of multiple server (or AI) dies in the same package.
As shown in Figure 3, the UCIe specification is divided into three stack layers: Physical Layer, Die-to-Die Adapter Layer and Protocol Layer.
Figure 3: The UCIe specification layering
The UCIe interface uses clock forwarding and single-ended, low voltage, DDR signaling to improve power efficiency. Power supply disturbances can be reduced by scrambling the data at the PHY level. Contrary to other techniques (like DBI), data scrambling does not impact bandwidth efficiency.
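The bandwidth argument can be made concrete with a minimal sketch. It assumes a generic additive scrambler built on a 16-bit LFSR and a simple DC-style DBI encoder; the function names, the seed, and the LFSR taps are illustrative choices and are not taken from the UCIe specification.

```python
def lfsr_stream(seed: int, nbits: int) -> list[int]:
    """Generate nbits pseudo-random bits from a 16-bit Fibonacci LFSR.

    Taps 16, 14, 13, 11 are a textbook maximal-length choice used only for
    illustration (assumption: this is NOT the polynomial defined by UCIe).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(nbits):
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        out.append(state & 1)
        state = (state >> 1) | (feedback << 15)
    return out


def scramble(bits: list[int], seed: int = 0xACE1) -> list[int]:
    """XOR the data with the LFSR stream; applying it twice restores the data."""
    return [b ^ k for b, k in zip(bits, lfsr_stream(seed, len(bits)))]


def dbi_encode(byte_bits: list[int]) -> list[int]:
    """Invert a byte that has more than four 1s and append a flag bit."""
    invert = int(sum(byte_bits) > 4)
    return [b ^ invert for b in byte_bits] + [invert]


payload = [1] * 64                      # worst case: every lane driven high
scrambled = scramble(payload)
dbi_coded = [bit for i in range(0, 64, 8) for bit in dbi_encode(payload[i:i + 8])]

print(len(payload), len(scrambled), len(dbi_coded))
```

In this sketch the scrambled stream has the same length as the payload, while the DBI-coded stream carries one extra flag bit per byte, which is the overhead the text refers to.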
The receiver data recovery is greatly simplified due to clock forwarding in parallel with the data, leading to additional power and latency savings. Figure 4 shows a block diagram of the UCIe PHY architecture.
Figure 4: Block diagram of the UCIe PHY architecture
UCIe defines a module as the smallest interface unit. Each module includes a mainband “bus” with up to 64 transmit and receive IOs for advanced package (or 16 for standard package), clock forward IOs, a valid (framing) IO, and track IOs. A sideband “bus” is also implemented, as shown in Figure 5.
Figure 5: The UCIe module implements a mainband and a sideband bus
To reduce yield loss due to microbump quality in advanced package assembly, UCIe offers a test and repair mechanism that relies on 6 redundant pins (for TX and RX data, clock, valid and track) and 2 redundant pins (for sideband TX and RX).
UCIe doesn’t implement pin redundancy for standard package since the C4 (or CuPillar) bump yield and complete assembly process yield are very high. For these packages, UCIe supports a “degraded” operating mode where only half of the module is active in case a failure is detected on the other half.
The test and repair process is implemented at link initialization. The PHY tests each die connection to determine if there is any failure. In case of failure, the corresponding signal is re-routed to a redundant pin as shown in Figure 6.
Figure 6: The Physical Layer tests each die connection to determine failure and re-routes signals to a redundant pin
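The re-routing step can be sketched as a lane map built at initialization. This is a simplified illustration only: the function name, the pin numbering, and the "next free spare" policy are assumptions, and the actual remapping rules are defined by the specification and are not reproduced here.

```python
def build_lane_map(num_lanes: int, spares: list[int], failed: set[int]) -> dict[int, int]:
    """Return a logical-lane -> physical-pin map after repair.

    Healthy lanes keep their own pin. Each failed pin is replaced by the next
    unused, healthy spare pin (hypothetical policy, for illustration only).
    """
    available = [pin for pin in spares if pin not in failed]
    lane_map = {}
    for lane in range(num_lanes):
        if lane in failed:
            if not available:
                raise RuntimeError("not enough redundant pins to repair the link")
            lane_map[lane] = available.pop(0)
        else:
            lane_map[lane] = lane
    return lane_map


# Example: 8 data lanes on pins 0-7, two redundant pins 8 and 9, pin 3 fails test.
repaired = build_lane_map(num_lanes=8, spares=[8, 9], failed={3})
print(repaired[3])  # lane 3 now uses a redundant pin
```

If more pins fail than there are spares, the sketch raises an error, which corresponds to a link that cannot be repaired at full width.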
Table 1 shows the main differences between the UCIe specification for advanced packaging and standard packaging.
| UCIe PHY Variant | Advanced Package | Standard Package |
| --- | --- | --- |
| Data Rate | 16Gbps | 16Gbps |
| TX/RX pins per module | 64 | 16 |
| Redundancy (T&R) | Yes | No |
| Total BW (Bi-dir) | 4Tbps | 1Tbps |
| BW Efficiency (Aggr.) | 5.2Tbps/mm | 0.9Tbps/mm |
| Energy Efficiency | 0.3pJ/bit | 0.5pJ/bit |
| Latency TX+RX | 12 UI | 12 UI |
| BER | < 1e-15 | < 1e-27 |
| Bump Pitch | 45 µm | 110 µm |
| Channel Reach | > 2mm | > 10mm |
| Low Power Modes | Idle mode | Idle mode |
| Termination RX | No | 50 ohm |
| Signal Swing | 0.4V | 0.4V |
Table 1: Different UCIe PHY features for advanced packaging versus standard packaging
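As a rough cross-check of the bandwidth row, raw per-module throughput can be estimated as the number of lanes times the per-pin rate, doubled to count both directions. The sketch below is illustrative arithmetic only: the function name and the reading of "Total BW (Bi-dir)" as transmit plus receive are assumptions, and any protocol or framing overhead is ignored.

```python
def module_bandwidth_tbps(lanes: int, rate_gbps_per_pin: float) -> float:
    """Raw bidirectional bandwidth of one module in Tbps (TX + RX, no overhead)."""
    per_direction_gbps = lanes * rate_gbps_per_pin
    return 2 * per_direction_gbps / 1000.0


for package, lanes in (("advanced", 64), ("standard", 16)):
    for rate in (16, 32):
        tbps = module_bandwidth_tbps(lanes, rate)
        print(f"{package:8s} {lanes:2d} lanes @ {rate} Gbps/pin: {tbps:.3f} Tbps")
```

Under these assumptions, 64 lanes give roughly 2 Tbps at 16Gbps/pin and roughly 4 Tbps at 32Gbps/pin, and 16 lanes give roughly 0.5 Tbps and 1 Tbps respectively, so the rounded totals in Table 1 line up with the 32Gbps/pin rate mentioned earlier.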
The differences are only noticeable at the electrical level and do not impact the upper protocol layers, as previously discussed. The differences derive from the significantly coarser minimum bump pitch required for standard packages (110 µm) versus advanced packages (45 µm) and from the need to support longer channel reaches in standard packages for added flexibility.
The Die-to-Die Adapter Layer is an intermediate layer that interfaces any protocol to the UCIe PHY Layer and manages the link itself. At link initialization, it waits for the PHY to complete its own initialization, including calibration, test and repair, and then initiates the discovery of the capabilities of both dies. If several protocols are implemented, the two adapters agree on which one to use before handing over to the Protocol Layer for mission-mode activity.
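The bring-up sequence described above can be sketched as a small state function. The state names, the function names, and the "first common protocol in local preference order" rule are assumptions made for illustration; the real negotiation is defined by the specification.

```python
from enum import Enum, auto
from typing import Optional


class LinkState(Enum):
    WAIT_FOR_PHY = auto()         # PHY still calibrating / testing / repairing
    ACTIVE = auto()               # protocol agreed, handed to the Protocol Layer
    NO_COMMON_PROTOCOL = auto()   # discovery found nothing both dies support


def negotiate_protocol(local: list[str], remote: list[str]) -> Optional[str]:
    """Pick the first protocol, in local preference order, that the remote die
    also advertises (hypothetical selection rule)."""
    remote_set = set(remote)
    for proto in local:
        if proto in remote_set:
            return proto
    return None


def bring_up(phy_ready: bool, local: list[str],
             remote: list[str]) -> tuple[LinkState, Optional[str]]:
    """Model one pass of adapter-level link bring-up."""
    if not phy_ready:
        return LinkState.WAIT_FOR_PHY, None
    proto = negotiate_protocol(local, remote)
    if proto is None:
        return LinkState.NO_COMMON_PROTOCOL, None
    return LinkState.ACTIVE, proto


state, proto = bring_up(True, ["CXL", "PCIe", "Streaming"], ["PCIe", "Streaming"])
print(state.name, proto)
```

In this example the local die prefers CXL, but the remote die does not advertise it, so the sketch settles on PCIe.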
The interface between the Die-to-Die Adapter Layer and the Protocol Layer, called the FLIT-aware Die-to-Die Interface (FDI), is a FLIT-based interface. To adapt to different protocols, it supports various FLIT modes:
UCIe also defines raw modes for CXL and PCI Express protocols. These modes are intended for retimer applications in which UCIe traffic runs across an optical link. In retimer mode, latency and error rates are not defined by the UCIe link itself, and the Protocol Layer is assumed to handle all error correction mechanisms, including CRC, retry and possibly FEC. The Die-to-Die Adapter Layer does not add CRC codes to the protocol FLIT, and on the receiver it neither checks for errors nor applies the retry mechanism.
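The division of responsibility between the adapter and the Protocol Layer can be illustrated with a minimal sketch. The function names and frame layout are hypothetical, and the CRC used here is the CRC-CCITT routine from Python's standard library, chosen only for convenience; it is not claimed to match the CRC defined by UCIe.

```python
import binascii


def adapter_tx(payload: bytes, raw_mode: bool) -> bytes:
    """In raw mode, pass the payload through untouched; otherwise append a
    16-bit CRC (illustrative polynomial, not the one defined by UCIe)."""
    if raw_mode:
        return payload
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(2, "big")


def adapter_rx(frame: bytes, raw_mode: bool) -> tuple[bytes, bool]:
    """Return (payload, ok). ok=False means the adapter would request a retry.
    In raw mode the adapter never checks, so ok is always True."""
    if raw_mode:
        return frame, True
    payload, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    return payload, binascii.crc_hqx(payload, 0xFFFF) == received


data = b"flit payload"
frame = adapter_tx(data, raw_mode=False)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]   # flip one bit in transit

print(adapter_rx(frame, raw_mode=False)[1])        # intact frame
print(adapter_rx(corrupted, raw_mode=False)[1])    # adapter detects the error
print(adapter_rx(corrupted, raw_mode=True)[1])     # raw mode: left to the Protocol Layer
```

In raw mode the corrupted bytes reach the Protocol Layer unflagged, which is why that layer must supply its own CRC, retry and, where needed, FEC.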
UCIe maps common protocols, like PCI Express and CXL, enabling developers to leverage previous work on software stacks and simplify the adoption of in-package integration using multi-die architectures. UCIe expects standardization of other protocol mappings in its future releases.
UCIe also enables the mapping of other protocols via the streaming mode. For example, low-latency connections between NoC fabrics on two compute dies can be supported with CXS or AXI bridges to the FDI in streaming mode. Other user-defined protocols can be implemented in the same way, taking advantage of the link management features of the Physical Layer and Die-to-Die Adapter Layer.
When implementing a UCIe interconnect, the architect may choose to support one or more of these protocols. Implementing multiple protocols enhances the applicability of the die in different use cases, a real advantage in the context of an open multi-die system marketplace. The Die-to-Die Adapter Layer is responsible for the discovery and selection of which protocol to use in a given interconnect.
The UCIe specification offers multi-die SoC designers a competitive set of advantages: high energy efficiency (pJ/b), high edge usage efficiency (Tbps/mm) and low latency (ns); support for the most popular IO protocols as well as any user-defined protocol; compatibility with all types of package technologies, from organic substrates to advanced silicon interposers; and coverage of all the critical aspects of the interface (initialization, sideband, protocol, test and repair, error correction, etc.).
These advantages make UCIe a very compelling technology, poised to ease the path toward a truly open multi-die system ecosystem by ensuring interoperability.
The UCIe promoters outlined a compelling roadmap to support the industry’s new use cases and requirements. The promoters expect UCIe to support higher data rates and new protocols, 3D packaging, and other aspects of multi-die system design such as form factors, security, testability, etc.
Synopsys delivers a comprehensive multi-die system solution to ease the transition to multi-die SoC architectures for designers.