CXL Gathers Momentum at FMS 2024

The CXL consortium has had a regular presence at FMS (which rechristened itself from ‘Flash Memory Summit’ to the ‘Future of Memory and Storage’ this year). Back at FMS 2022, the consortium had announced v3.0 of the CXL specifications. This was followed by CXL 3.1’s introduction at Supercomputing 2023. Having started off as a host-to-device interconnect standard, it had slowly subsumed other competing standards such as OpenCAPI and Gen-Z. As a result, the specifications started to encompass a wide variety of use-cases by building a protocol on top of the ubiquitous PCIe expansion bus. The CXL consortium comprises heavyweights such as AMD and Intel, as well as a large number of startup companies attempting to play in different segments on the device side. At FMS 2024, CXL had a prime position in the booth demos of many vendors.

The migration of server platforms from DDR4 to DDR5, along with the rise of workloads demanding large RAM capacity (but not particularly sensitive to either memory bandwidth or latency), has opened up memory expansion modules as one of the first sets of widely available CXL devices. Over the last couple of years, we have had product announcements from Samsung and Micron in this area.

SK hynix CMM-DDR5 CXL Memory Module and HMSDK

At FMS 2024, SK hynix was showing off their DDR5-based CMM-DDR5 CXL memory module with a 128 GB capacity. The company was also detailing their associated Heterogeneous Memory Software Development Kit (HMSDK), a set of libraries and tools at both the kernel and user levels aimed at increasing the ease of use of CXL memory. This is achieved in part by considering the memory pyramid / hierarchy and relocating the data between the server’s main memory (DRAM) and the CXL device based on usage frequency.
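The idea of placing data by usage frequency can be illustrated with a minimal sketch. This is not HMSDK's API or its actual policy; the function name `plan_placement`, the page IDs, and the access trace are all hypothetical, and a real implementation would work on sampled access data inside the kernel rather than on a Python list.

```python
from collections import Counter

def plan_placement(access_counts, dram_slots):
    """Split pages into (dram_pages, cxl_pages) by access frequency.

    The `dram_slots` most frequently accessed pages are kept in DRAM;
    everything else is assigned to the slower CXL tier.
    """
    # Rank pages hottest first; break ties by page id so the result is deterministic.
    ranked = sorted(access_counts, key=lambda page: (-access_counts[page], page))
    return set(ranked[:dram_slots]), set(ranked[dram_slots:])

# Hypothetical access trace: each entry is the id of a page touched by the workload.
trace = [1, 2, 1, 3, 1, 2, 4, 1, 2, 5]
counts = Counter(trace)

dram_pages, cxl_pages = plan_placement(counts, dram_slots=2)
print("DRAM:", sorted(dram_pages))
print("CXL: ", sorted(cxl_pages))
```

In this toy trace, pages 1 and 2 are touched most often and stay in DRAM, while the rarely used pages are candidates for the CXL module.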

The CMM-DDR5 CXL memory module comes in the SDFF form-factor (E3.S 2T) with a PCIe 5.0 x8 host interface. The internal memory is based on 1α technology DRAM, and the device promises DDR5-class bandwidth and latency within a single NUMA hop. As these memory modules are meant to be used in datacenters and enterprises, the firmware includes features for RAS (reliability, availability, and serviceability) along with secure boot and other management features.

SK hynix was also demonstrating Niagara 2.0, a hardware solution (currently based on FPGAs) to enable memory pooling and sharing, i.e., connecting multiple CXL memories to allow different hosts (CPUs and GPUs) to optimally share their capacity. The previous version only allowed capacity sharing, but the latest version enables sharing of data as well. SK hynix had presented these solutions at the CXL DevCon 2024 earlier this year, but some progress seems to have been made in finalizing the specifications of the CMM-DDR5 at FMS 2024.

Microchip and Micron Demonstrate CZ120 CXL Memory Expansion Module

Micron had unveiled the CZ120 CXL Memory Expansion Module last year based on the Microchip SMC 2000 series CXL memory controller. At FMS 2024, Micron and Microchip had a demonstration of the module on a Granite Rapids server.

Additional insights into the SMC 2000 controller were also provided.

The CXL memory controller also incorporates DRAM die failure handling, and Microchip also provides diagnostics and debug tools to analyze failed modules. The memory controller also supports ECC, which forms part of the enterprise-class RAS feature set of the SMC 2000 series. Its flexibility ensures that SMC 2000-based CXL memory modules using DDR4 can complement the main DDR5 DRAM in servers that support only the latter.

Marvell Announces Structera CXL Product Line

A few days prior to the start of FMS 2024, Marvell had announced a new CXL product line under the Structera tag. At FMS 2024, we had a chance to discuss this new line with Marvell and gather some additional insights.

Unlike other CXL device solutions focusing on memory pooling and expansion, the Structera product line also incorporates a compute accelerator part in addition to a memory-expansion controller. All of these are built on TSMC’s 5nm technology.

The compute accelerator part, the Structera A 2504 (A for Accelerator), is a PCIe 5.0 x16 CXL 2.0 device with 16 integrated Arm Neoverse V2 (Demeter) cores at 3.2 GHz. It incorporates four DDR5-6400 channels with support for up to two DIMMs per channel along with in-line compression and decompression. The integration of powerful server-class Arm CPU cores means that the CXL memory expansion part scales the memory bandwidth available per core, while also scaling the compute capabilities.
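A rough back-of-the-envelope calculation helps put the bandwidth-per-core point in context. The sketch below uses peak theoretical figures only and rests on standard assumptions that are not from Marvell: a 64-bit data bus per DDR5 channel, and 32 GT/s per PCIe 5.0 lane with 128b/130b encoding. Protocol overheads and real-world efficiency are ignored, and the helper names are hypothetical.

```python
def ddr_channel_gb_per_s(mega_transfers_per_s, bus_bytes=8):
    """Peak bandwidth of one DDR channel in GB/s (assumes a 64-bit data bus)."""
    return mega_transfers_per_s * bus_bytes / 1000

def pcie5_link_gb_per_s(lanes):
    """Peak per-direction PCIe 5.0 bandwidth in GB/s (32 GT/s per lane, 128b/130b)."""
    return 32 * lanes * (128 / 130) / 8

# Figures from the article: four DDR5-6400 channels, 16 cores, PCIe 5.0 x16.
dram_total = 4 * ddr_channel_gb_per_s(6400)
per_core = dram_total / 16
host_link = pcie5_link_gb_per_s(16)

print(f"Local DRAM, all channels: {dram_total:.1f} GB/s")
print(f"Local DRAM per Arm core:  {per_core:.1f} GB/s")
print(f"PCIe 5.0 x16 to host:     {host_link:.1f} GB/s per direction")
```

Under these assumptions, the DRAM behind the accelerator can supply considerably more peak bandwidth than the host link can carry in one direction, which is one way to see why running compute next to that memory is attractive.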

Applications such as Deep-Learning Recommendation Models (DLRM) can benefit from the compute capability available in the CXL device. The scaling in the bandwidth availability is also accompanied by reduced energy consumption for the workload. The approach also contributes towards disaggregation within the server for a better thermal design as a whole.

The Structera X 2404 (X for eXpander) will be available as a PCIe 5.0 (single x16 or two x8) device with four DDR4-3200 channels (up to three DIMMs per channel). Features such as in-line (de)compression, encryption / decryption, and secure boot with hardware support are present in the Structera X 2404 as well. Compared to the 100 W TDP of the Structera A 2504, Marvell expects this part to consume around 30 W. The primary purpose of this part is to enable hyperscalers to recycle DDR4 DIMMs (up to 6 TB per expander) while increasing server memory capacity.

Marvell also has a Structera X 2504 part that supports four DDR5-6400 channels (with two DIMMs per channel for up to 4 TB per expander). Other aspects remain the same as those of the DDR4-recycling part.
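The quoted maximum capacities imply a particular DIMM size, which a short calculation makes explicit. The sketch below assumes 1 TB = 1024 GB and that every slot holds an identical DIMM; the function name is hypothetical, and the channel and DIMM counts are the ones quoted for the two expander parts.

```python
def implied_dimm_size_gb(channels, dimms_per_channel, max_capacity_tb):
    """Return (dimm_slots, GB per DIMM) needed to reach the quoted maximum.

    Assumes 1 TB = 1024 GB and identical DIMMs in every slot.
    """
    dimm_slots = channels * dimms_per_channel
    return dimm_slots, max_capacity_tb * 1024 / dimm_slots

# Structera X 2404: four DDR4 channels, three DIMMs per channel, up to 6 TB.
x2404_slots, x2404_dimm_gb = implied_dimm_size_gb(4, 3, 6)
# Structera X 2504: four DDR5 channels, two DIMMs per channel, up to 4 TB.
x2504_slots, x2504_dimm_gb = implied_dimm_size_gb(4, 2, 4)

print(f"X 2404: {x2404_slots} DIMMs of {x2404_dimm_gb:.0f} GB each")
print(f"X 2504: {x2504_slots} DIMMs of {x2504_dimm_gb:.0f} GB each")
```

Both parts reach their quoted maximum with the same per-DIMM size, so the DDR4 expander's higher ceiling comes entirely from the third DIMM per channel.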

The company stressed some unique aspects of the Structera product line: the inline compression optimizes available DRAM capacity, and the three DIMMs per channel support for the DDR4 expander maximizes the amount of DRAM per expander (compared to competing solutions). The 5nm process lowers the power consumption, and the parts support accesses from multiple hosts. The integration of Arm Neoverse V2 cores appears to be a first for a CXL accelerator, and enables delegation of compute tasks to improve overall performance of the system.

While Marvell announced specifications for the Structera parts, it does appear that sampling is at least a few quarters away. One of the interesting aspects of Marvell’s roadmaps / announcements in recent years has been their focus on creating products tuned to the demands of high-volume customers. The Structera product line is no different: hyperscalers are hungry to recycle their DDR4 memory modules and apparently can't wait to get their hands on the expander parts.

CXL is just starting its slow ramp-up, and the hockey stick segment of the growth curve is definitely not in the near term. However, as more host systems with CXL support start to get deployed, products like the Structera accelerator line start to make sense from a server efficiency viewpoint.