As Machine Learning and Artificial Intelligence become pervasive, workloads on virtual machines and their underlying components are also increasing, and the industry needs mechanisms that can prioritize workloads and guarantee performance. The Compute Express Link (CXL) is an open industry-standard interconnect between processors and devices such as accelerators, memory buffers, smart network interfaces, persistent memory, and solid-state drives. CXL offers coherency and memory semantics with bandwidth that scales with PCIe bandwidth while achieving significantly lower latency than PCIe.
As a general device interconnect for graphics processing units (GPUs), general-purpose graphics processing units (GPGPUs), and field-programmable gate arrays (FPGAs), CXL uses the Peripheral Component Interconnect Express® (PCI Express® or PCIe®) serial interface. CXL also targets memory, which is traditionally connected to the CPU through the Double Data Rate (DDR) parallel interface.
Memory pooling, a newer feature of the CXL protocol, requires distributed memory management. It also requires that devices can be composed dynamically at run time while attached to a virtual machine, which leads to significantly better resource usage and lower cost due to increased multiplexing opportunities.
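To make the pooling idea concrete, here is a minimal C sketch of a pooled memory device assigning fractions of its capacity to hosts at run time. All names (mem_pool, pool_assign, pool_release) and the extent granularity are hypothetical illustrations, not CXL-defined constructs.

```c
#include <stdio.h>

/* Hypothetical model of a CXL memory pool: capacity is divided into
 * fixed-size extents that can be assigned to hosts at run time. */
#define NUM_EXTENTS 8
#define EXTENT_SIZE_GB 16
#define UNASSIGNED (-1)

typedef struct {
    int owner[NUM_EXTENTS]; /* host id owning each extent, or UNASSIGNED */
} mem_pool;

/* Assign one free extent to a host; returns the extent index, or -1 if
 * the pool has no free capacity left. */
static int pool_assign(mem_pool *p, int host_id) {
    for (int i = 0; i < NUM_EXTENTS; i++) {
        if (p->owner[i] == UNASSIGNED) {
            p->owner[i] = host_id;
            return i;
        }
    }
    return -1;
}

/* Release every extent owned by a host, e.g. when its VM is torn down. */
static void pool_release(mem_pool *p, int host_id) {
    for (int i = 0; i < NUM_EXTENTS; i++)
        if (p->owner[i] == host_id)
            p->owner[i] = UNASSIGNED;
}

int main(void) {
    mem_pool p;
    for (int i = 0; i < NUM_EXTENTS; i++)
        p.owner[i] = UNASSIGNED;

    pool_assign(&p, 0);   /* host 0 grabs two extents */
    pool_assign(&p, 0);
    pool_assign(&p, 1);   /* host 1 grabs one extent  */
    pool_release(&p, 0);  /* host 0's VM goes away    */

    for (int i = 0; i < NUM_EXTENTS; i++)
        printf("extent %d (%d GB): host %d\n", i, EXTENT_SIZE_GB, p.owner[i]);
    return 0;
}
```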
These requirements mean CXL must be further enhanced and deployed to provide highly reliable, low-latency load/store access along with more responsive quality-of-service mechanisms.
The features of the current CXL 3.0 specification can be summarized as follows:
1) Link Speed up to 64 GT/s
2) CXL.IO, CXL.Cache, and CXL.Mem protocol support
3) 256B and 68B Flit support
4) Latency Optimized Flits
5) Back Invalidate Snoop
6) Fabric Support
7) IDE Security
While earlier versions focused on coherency and switching capabilities, version 3.1 addresses the bandwidth roadblocks faced by accelerators that must access peers coherently, and more:
CXL 3.1 Specification Features:
- Direct P2P CXL.mem for accelerators
- Extended Metadata
- UIO direct peer-to-peer support in CXL fabric
- GFAM expansion
- Trusted Execution Environment Security Protocol
Peer-to-Peer Communications
With the evolution of CXL, both memory devices and IO devices will gain multi-host capabilities that allow fractions of their capacity to be assigned dynamically to individual hosts over CXL. Highly composable designs achieve better resource usage through increased multiplexing opportunities, and thus lower cost, which facilitates accelerating distributed systems via shared memory, message passing, and peer-to-peer communication over CXL.
Devices pay a high penalty when accessing a peer's HDM-DB: going through the host to reach peer HDM sacrifices bandwidth, while using UIO (Unordered IO) sacrifices coherency. CXL 3.1 introduces a new asymmetric channel to overcome this bandwidth loss, allowing Type 1 and Type 2 accelerators to access peer memory coherently and at full CXL bandwidth.
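The tradeoff can be pictured as a path-selection decision. The illustrative C fragment below (the enum and function names are ours, not spec terminology) shows which path an accelerator would take given its coherency and bandwidth needs, with the CXL 3.1 direct P2P CXL.mem channel satisfying both at once.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative paths an accelerator can take to a peer's HDM-DB. */
typedef enum {
    PATH_HOST_ROUTED,    /* coherent, but the host hop costs bandwidth */
    PATH_UIO_P2P,        /* full bandwidth, but no hardware coherency  */
    PATH_DIRECT_P2P_MEM  /* CXL 3.1: coherent AND full CXL bandwidth   */
} access_path;

/* Pick a path given the request's requirements (pre-3.1 vs 3.1). */
static access_path pick_path(bool need_coherency, bool need_full_bw,
                             bool cxl_3_1_available) {
    if (need_coherency && need_full_bw && cxl_3_1_available)
        return PATH_DIRECT_P2P_MEM;  /* new asymmetric channel     */
    if (need_coherency)
        return PATH_HOST_ROUTED;     /* pay the bandwidth penalty  */
    return PATH_UIO_P2P;             /* pay the coherency penalty  */
}

int main(void) {
    const char *name[] = { "host-routed", "UIO P2P", "direct P2P CXL.mem" };
    printf("pre-3.1, coherent : %s\n", name[pick_path(true,  true, false)]);
    printf("pre-3.1, bandwidth: %s\n", name[pick_path(false, true, false)]);
    printf("CXL 3.1           : %s\n", name[pick_path(true,  true, true)]);
    return 0;
}
```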
Trusted Execution Environment Security Protocol
As the CXL ecosystem ramps up, the industry needs a rigorous approach to error containment and to managing CXL's expansion. As mentioned above, we are approaching composable systems where components can be attached to virtual machines at any time, which raises concerns about the security of our machines, that is, our hardware.
Each device needs to support encryption and exchange keys with the virtual machines in the data center; however, this process can be complex and problematic.
CXL 3.1 introduces a model that provides confidential-computing support for direct-attached CXL memory. Direct-attached memory is defined as a memory device (target) that communicates over the CXL protocol with the CXL root ports (RPs) of the host, with no intermediaries between the two.
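Because this support is scoped to direct-attached memory, a verification or system-software environment may need to classify topologies accordingly. The sketch below uses hypothetical types and is not a spec-defined or Synopsys API; it simply flags a device as direct-attached when no intermediary sits between the host root port and the target.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical component kinds on the path from host RP to target. */
typedef enum { COMP_ROOT_PORT, COMP_SWITCH, COMP_MEM_DEVICE } comp_kind;

/* "Direct attached": the path is exactly RP -> memory device, with no
 * intermediary (e.g. no switch) between the two endpoints. */
static bool is_direct_attached(const comp_kind *path, int len) {
    return len == 2 &&
           path[0] == COMP_ROOT_PORT &&
           path[1] == COMP_MEM_DEVICE;
}

int main(void) {
    comp_kind direct[]   = { COMP_ROOT_PORT, COMP_MEM_DEVICE };
    comp_kind switched[] = { COMP_ROOT_PORT, COMP_SWITCH, COMP_MEM_DEVICE };

    printf("RP->dev        : %s\n",
           is_direct_attached(direct, 2) ? "in scope" : "out of scope");
    printf("RP->switch->dev: %s\n",
           is_direct_attached(switched, 3) ? "in scope" : "out of scope");
    return 0;
}
```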
What is Extended Metadata in the CXL 3.1 Specification?
Metadata is additional information carried with each cache-line transfer over the interconnect; it is not considered data, yet it is stored in the cache hierarchy and memory subsystem.
For example, memory-tagging information can be carried as part of the cache lines.
To accommodate this extended metadata (EMD), 256B flit mode introduces a trailer of up to 32 bits.
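A simplified way to picture this is a cache-line transfer that carries an optional trailer alongside its payload. The C sketch below is illustrative only; the struct layout and names are ours and do not reproduce the actual CXL 3.1 flit format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: a cache-line transfer carrying up to 32 bits of
 * extended metadata (EMD) in a trailer, as enabled by 256B flit mode. */
typedef struct {
    uint8_t  data[64];     /* one cache line of payload                */
    uint32_t emd_trailer;  /* up to 32 trailer bits, e.g. a memory tag */
} cache_line_xfer;

int main(void) {
    cache_line_xfer xfer;
    memset(xfer.data, 0xAB, sizeof xfer.data);
    xfer.emd_trailer = 0x0000000F;  /* e.g. a 4-bit tag, zero-extended */

    printf("payload bytes : %zu\n", sizeof xfer.data);
    printf("EMD trailer   : 0x%08X\n", xfer.emd_trailer);
    return 0;
}
```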
What is UIO direct peer-to-peer support in the CXL 3.1 Specification?
As systems expand, concepts like CXL memory pooling will spark changes to both local and distributed memory management. This will require a system-level approach to mitigating congestion as well as failures in a distributed memory fabric. QoS in the CXL standard is currently limited to CXL.mem and does not address fabric congestion. CXL 3.1 introduces a mechanism by which a UIO requester can access another target device directly, without adding congestion through the host. CXL switches may route UIO accesses to HDM that resides in the same virtual hierarchy (VH) as the UIO requester.
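One way to picture the routing rule is a switch that forwards a UIO access straight to peer HDM only when the target sits in the same VH as the requester, and otherwise sends it up toward the host. The C sketch below uses hypothetical structures and is not switch firmware.

```c
#include <stdio.h>

/* Hypothetical port descriptor: each port belongs to one virtual
 * hierarchy (VH) behind the switch. */
typedef struct {
    int id;
    int vh;  /* virtual hierarchy this port is bound to */
} port;

typedef enum { ROUTE_DIRECT_P2P, ROUTE_TO_HOST } route;

/* Rule sketched here: the switch may route a UIO access straight to
 * peer HDM only if requester and target share the same VH; otherwise
 * the access goes up to the host. */
static route route_uio(const port *req, const port *tgt) {
    return (req->vh == tgt->vh) ? ROUTE_DIRECT_P2P : ROUTE_TO_HOST;
}

int main(void) {
    port acc  = { .id = 1, .vh = 0 };  /* UIO requester            */
    port mem0 = { .id = 2, .vh = 0 };  /* HDM target, same VH      */
    port mem1 = { .id = 3, .vh = 1 };  /* HDM target, different VH */

    printf("acc -> mem0: %s\n",
           route_uio(&acc, &mem0) == ROUTE_DIRECT_P2P ? "direct P2P" : "via host");
    printf("acc -> mem1: %s\n",
           route_uio(&acc, &mem1) == ROUTE_DIRECT_P2P ? "direct P2P" : "via host");
    return 0;
}
```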
As more switch topologies are introduced, switch-driven features require system-level configuration and setup. Some switch behaviors can be verified at the transaction and data link layers to ensure transmission works correctly. However, more complex switch behaviors require a multi-host, multi-device environment.
Security features span software, firmware, and hardware. Designing and verifying a specific layer's implementation must be done with knowledge of how it fits into the overall security requirements.
With new features in the 3.1 specification, maintaining backward compatibility and functional correctness of previous versions of the specification becomes more challenging. Taking advantage of proven design IP and verification IP is more important than ever.
Synopsys has been actively developing and working with industry leaders to support features and use cases for the latest CXL 3.1 specification.
Synopsys offers verification IP (VIP), test suites and protocol solutions for CXL 3.1, providing a comprehensive set of protocol, methodology, verification, and productivity features, enabling users to achieve accelerated verification closure. Leveraging Synopsys’ broad interconnect portfolio, our partners can perform early verification of their designs.
Synopsys IP is verified using independently developed Synopsys VIP, providing companies with an industry-leading CXL solution. A complete out-of-the-box solution enables designers to focus on chip design features and differentiation to accelerate their time to market.
Synopsys VIP is natively integrated with the Synopsys Verdi® Protocol Analyzer debug solution as well as the Synopsys Verdi® Performance Analyzer. Running system-level payloads on SoCs requires a faster hardware-based pre-silicon solution. Synopsys transactors, memory models, and hybrid and virtual solutions based on Synopsys IP enable various verification and validation use cases on the industry's fastest verification hardware, the Synopsys ZeBu® emulation and Synopsys HAPS® prototyping systems.
More information on Synopsys CXL 3.1 VIP and Test Suites is available at http://synopsys.com/vip