The Department of Energy’s National Nuclear Security Administration (DOE/NNSA), Lawrence Livermore National Laboratory (LLNL), and Los Alamos National Laboratory (LANL) announced a strategic partnership agreement with SambaNova Systems to accelerate key artificial intelligence (AI) computing initiatives.
The cornerstone of this partnership agreement is the acquisition of multiple SambaNova DataScale systems, deployed at both LLNL and LANL. SambaNova DataScale is a complete dataflow system featuring the SambaFlow software stack and the world’s first Reconfigurable Dataflow Unit (RDU), the SambaNova Systems Cardinal SN10 RDU. The RDU is a next-generation processor designed from the ground up to run dataflow workloads such as AI efficiently. Each system is built with eight RDUs and is designed for efficient deep learning training and inference.
Each RDU can run multiple jobs simultaneously or work together with the other RDUs in the system to execute large models. The SambaFlow software stack maps a model’s dataflow across the RDUs in a way that minimizes communication between them.
“This strategic partnership agreement with SambaNova Systems will highlight extraordinary efforts that will contribute to the continuous advancing of AI and machine learning initiatives within DOE and the NNSA,” said David Etim, a federal program manager for the NNSA Office of Advanced Simulation and Computing and leader of the Advanced Machine Learning Initiative.
LLNL has connected the SambaNova DataScale system to the Corona supercomputing system as a network-attached disaggregated accelerator, with plans for tighter integration later this year. Once the system is fully integrated into Corona, LLNL plans to explore using it to accelerate cognitive simulation applications, which combine high performance computing (HPC) and AI.
“AI accelerators provide the basis for a heterogeneous system architecture that will support efficient cognitive simulation,” said Bronis de Supinski, CTO for Livermore Computing (LC). “LLNL’s LC is leading the integration of these subsystems into large-scale resources such as Corona. Our strategy is already demonstrating that this approach will provide more cost-efficient solutions for the workloads of the future.”
The first use case is an Inertial Confinement Fusion (ICF) problem in which the hydrodynamics runs on the HPC system while material properties are computed through inference on the SambaNova DataScale system. Built-in multi-tenancy lets SambaNova DataScale run dozens of inference models at once, a capability that is key to these workloads, where multiple materials must be loaded concurrently.
“Integrating the SambaNova DataScale system into Corona allows us to explore Machine Learning (ML) training where ML models are trained on output from HPC simulations running on Corona,” said Principal HPC Strategist Ian Karlin, who leads the SambaNova project at LLNL. “Our COVID-19 research running on Corona today is an example of this workflow, with the trained ML model feeding suggestions on what simulations to run next back to the HPC system.”
LANL has integrated the SambaNova DataScale system into “Darwin,” a heterogeneous cluster used for investigating emerging technologies and porting applications to them. The first application targeted for acceleration is quantum chemistry modeling with Density Functional Theory (DFT)-level accuracy. LANL has developed a workflow for building ML models of interatomic energies and forces, enabling molecular dynamics simulations that are both highly accurate and computationally efficient. These ML models closely reproduce DFT reference calculations and enable first-principles modeling of reactive chemistry in support of the Stockpile Stewardship Program.
These calculations currently run on GPUs and show promise for further acceleration on the SambaNova DataScale system: an ongoing collaboration between SambaNova Systems and LANL scientists suggests that a speedup of up to 5x over the existing GPU implementation may be possible. LANL will investigate other ML/AI applications in the coming year.
“SambaNova Systems is providing the platform for innovation to enable visionaries to achieve breakthrough advancements in their domains,” said Rodrigo Liang, Co-founder and CEO, SambaNova Systems. “The convergence of HPC and AI computing is just getting underway and we eagerly anticipate the scientific advancements that will be made by LLNL and LANL with SambaNova DataScale systems.”