A functional prototype of the Accelerated Box of Flash. Hardware components are all standard for ease of adoption. Accelerators and storage devices sit in the U.2 slots of the front bays, and an internal PCIe (peripheral component interconnect express) slot holds additional accelerator hardware. Photo Courtesy LANL.
LANL NEWS RELEASE
Data is a vital part of solving complicated scientific questions, in endeavors ranging from genomics, to climatology, to the analysis of nuclear reactions. However, an abundance of data is often only as good as the ability to efficiently store, access and manipulate that data. To facilitate discovery with big data problems, Los Alamos researchers, in collaboration with industry partners, have developed an open storage system acceleration architecture for scientific data analysis, which can deliver 10 to 30 times the performance of current systems.
The architecture enables offloading of intensive functions to an accelerator-enabled, programmable and network-attached storage appliance called an Accelerated Box of Flash or simply ABOF. These systems are destined to be a key component of the Laboratory’s future high performance computing platforms.
“Scientific data and the data-driven scientific discovery techniques used to analyze that data are both growing rapidly,” said Dominic Manno, a researcher with the Lab’s High Performance Computing Division. “Performing the complex analysis to enable scientific discovery requires huge advances in the performance and efficiency of scientific data storage systems. The ABOF programmable appliance enables high-performance storage solutions to more easily leverage the rapid performance improvements of networks and storage devices, ultimately making more scientific discovery possible. Placing computation near storage minimizes data movement and improves the efficiency of both simulation and data-analysis pipelines.”
Speeds up scientific computation
Scalable computing systems are adopting data processing units (DPUs) placed directly on the data path to accelerate intensive functions between CPUs and storage devices. However, leveraging DPUs within production-quality storage systems that serve complex HPC simulation and data-analysis workloads has proven difficult. Although DPUs have specialized computing capabilities tailored to data processing tasks, their integration into HPC systems has not yet realized the efficiencies they make available.
The ABOF appliance is the product of hardware and storage system software co-design. It enables simpler use of NVIDIA BlueField-2 DPUs and other accelerators for offloading intensive operations from host CPUs without major storage system software modifications and allows users to leverage these offloads and the resulting speedups with no application changes.
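To illustrate what offloading "with no application changes" can mean in practice, the Python sketch below shows one common pattern. Storage code calls a single entry point, and a dispatcher decides whether the work runs on a registered accelerator or falls back to the host CPU. The OffloadRegistry class, the "compress" operation name and the lambda standing in for a device driver are all hypothetical; this is not the API of the ZFS Interface for Accelerators, the Linux DPU Services Module or any NVIDIA software.

```python
import zlib
from typing import Callable, Dict

# A transform takes a data block and returns the processed block.
Transform = Callable[[bytes], bytes]


class OffloadRegistry:
    """Hypothetical dispatcher: run an operation on a registered
    accelerator if present, otherwise on the host CPU."""

    def __init__(self) -> None:
        self._software: Dict[str, Transform] = {}
        self._accelerator: Dict[str, Transform] = {}
        self.last_path = ""  # records which path handled the most recent call

    def register_software(self, op: str, fn: Transform) -> None:
        self._software[op] = fn

    def register_accelerator(self, op: str, fn: Transform) -> None:
        self._accelerator[op] = fn

    def run(self, op: str, block: bytes) -> bytes:
        accel = self._accelerator.get(op)
        if accel is not None:
            try:
                result = accel(block)
                self.last_path = "accelerator"
                return result
            except RuntimeError:
                pass  # device error: fall back to the CPU implementation
        self.last_path = "cpu"
        return self._software[op](block)


registry = OffloadRegistry()
registry.register_software("compress", lambda b: zlib.compress(b, 6))


def write_block(block: bytes) -> bytes:
    """Storage-system code path: identical whether or not an offload exists."""
    return registry.run("compress", block)


data = b"simulation output " * 256
cpu_result = write_block(data)

# Stand-in for a device driver. A real accelerator must emit data the
# software path can read back, so this placeholder calls the same codec.
registry.register_accelerator("compress", lambda b: zlib.compress(b, 6))
accel_result = write_block(data)

print(registry.last_path, cpu_result == accel_result)
```

Because write_block never changes, an accelerator can be added or removed underneath it. The design choice that makes this safe is requiring the accelerated path to produce data that the software path can still read back.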
The current ABOF implementation applies specialized accelerators to three functions critical to storage systems: compression, erasure coding and checksums. Each of these functions costs storage systems time, money and energy. The appliance uses BlueField-2 DPUs with 200Gb/s InfiniBand networking. The performance-critical functions of the popular Linux Zettabyte File System (ZFS) are offloaded to the accelerators in the ABOF. This ZFS offload is accomplished through a new ZFS Interface for Accelerators, available on GitHub. The Linux DPU Services Module, also on GitHub, is a Linux kernel module that enables the use of DPUs directly from within the kernel, regardless of where they sit along the data path.
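For readers unfamiliar with the three offloaded functions, the Python sketch below runs them on the host CPU for a single record. It uses DEFLATE compression through zlib (the algorithm behind ZFS's gzip setting, standing in for whatever codec an accelerator implements), a Fletcher-4 checksum in the style ZFS uses by default, and single-parity XOR erasure coding similar in spirit to RAID-Z1. This is a simplified, hypothetical illustration. The function names, the four-column layout, the little-endian word order and the compress-then-checksum ordering are assumptions, and real ZFS on-disk layouts are considerably more involved. Loops like these, run over every block written, are the kind of work an ABOF moves off the host CPU.

```python
import struct
import zlib
from functools import reduce
from typing import List, Tuple

MASK64 = (1 << 64) - 1  # Fletcher-4 keeps four 64-bit running sums


def fletcher4(block: bytes) -> Tuple[int, int, int, int]:
    """Fletcher-4 checksum over 32-bit little-endian words (byte order assumed)."""
    if len(block) % 4:
        raise ValueError("block length must be a multiple of 4 bytes")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", block):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def xor_parity(columns: List[bytes]) -> bytes:
    """Single-parity erasure code: byte-wise XOR of equal-length columns."""
    return bytes(reduce(lambda x, y: x ^ y, group) for group in zip(*columns))


def write_pipeline(record: bytes, n_data: int = 4):
    """Hypothetical write path: compress, split into columns, add parity, checksum."""
    compressed = zlib.compress(record, 6)
    # Round each column up to whole 32-bit words so the checksum sees full words.
    col_len = -(-len(compressed) // (4 * n_data)) * 4
    padded = compressed.ljust(col_len * n_data, b"\0")
    columns = [padded[i * col_len:(i + 1) * col_len] for i in range(n_data)]
    return columns, xor_parity(columns), fletcher4(padded), len(compressed)


record = b"temperature,pressure\n" + b"300.1,101.3\n" * 500
columns, parity, checksum, clen = write_pipeline(record)

# Simulate losing data column 2, then rebuild it from the survivors plus parity.
survivors = [col for i, col in enumerate(columns) if i != 2]
rebuilt = xor_parity(survivors + [parity])
restored = b"".join(columns[:2] + [rebuilt] + columns[3:])

print("column rebuilt:", rebuilt == columns[2])
print("checksum matches:", fletcher4(restored) == checksum)
print("record recovered:", zlib.decompress(restored[:clen]) == record)
```

Losing any one data column is recoverable because XOR is its own inverse, and the Fletcher-4 checksum then confirms that the rebuilt data matches what was written.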
An internal view of the Accelerated Box of Flash showing the connections from NVMe SSDs (front of chassis) to BlueField-2 DPUs. At top right is a PCIe slot for an accelerator. This demonstration used Eideticom NoLoad devices. Photo Courtesy LANL.
Released in January, successful demo
The project underwent a successful internal demonstration following the January release of the ABOF appliance hardware and its supporting software.
Collaborators included NVIDIA, which built the data processing units and provided a scalable storage fabric; Eideticom, which created the NoLoad computational storage stack used to accelerate data-intensive operations and minimize data movement; Aeon Computing, which designed and integrated each component into a storage enclosure; and SK hynix, which provided fast storage hardware.
“HPC is solving the world’s most complex problems, as we enter the era of exascale AI,” said Gilad Shainer, senior vice president of networking at NVIDIA. “NVIDIA’s accelerated computing platform dramatically boosts performance for innovative exploration by pioneers such as Los Alamos National Laboratory, allowing researchers to drastically speed up breakthroughs in scientific discoveries.”
“The Next Generation Open Storage Architecture enables a new level of performance and efficiency thanks to its hardware-software co-design, open standards and innovative use of technologies such as DPUs, NVMe and Computational Storage,” said Stephen Bates, chief technology officer at Eideticom. “Eideticom is proud to work with Los Alamos National Laboratory and the other partners to develop the computational storage stack used to showcase how this architecture can achieve these new levels of performance and efficiency. The efficient use of accelerators, coupled with innovative software and open standards, are key to the next generation of data centers.”
“Developing a cutting-edge storage product with an end user has been a very positive experience,” said Doug Johnson, co-founder of Aeon Computing. “Working together with the technology vendors and end user in collaboration allowed for rapid iteration and enhancement of a new type of storage product that will serve the most important goal a product can have, acceleration of the end user’s workflow.”
“SK hynix joined this collaboration building ABOF because we understand the need for a new flash memory-based system that can accelerate data analysis,” said Jin Lim, vice president of the Solution Lab at SK hynix. “Building on this showcase technology, we are committed to work with the collaboration partners in further defining the new architecture of the computational storage device and requirements that are critical to its best use cases.”
Building on the file system acceleration project, researchers next plan to integrate a set of common analysis functions into the system. That functionality would allow scientists to analyze the data using their existing programs, potentially avoiding the need for additional data movement and supercomputing resources. This functionality would be specialized and tailored to the scientific community, providing another robust tool for tackling the complicated, data-intensive questions that underlie the challenges in our world.