As the volume of data generated by large-scale experiments continues to grow, the need for rapid data analysis capabilities is becoming increasingly critical to new discoveries.
Argonne’s Advanced Photon Source Upgrade project (APS-U) is increasing the brightness of APS x-rays by as much as 500 times. With a corresponding increase in the amount of experimental data generated, high-performance computing (HPC) resources are required to quickly process and analyze the results.
Numerous ALCF activities and achievements have helped realize the DOE effort to build an Integrated Research Infrastructure (IRI) that seamlessly connects DOE experimental facilities with the department’s world-class supercomputing resources, including:
- Developing and testing methods to closely integrate supercomputers and experiments for near-real-time data analysis.
- Partnering with Pathfinder projects to advance plasma physics and fusion energy research.
- Participating in leadership groups and technical subcommittees dedicated to the design and implementation of computing facility functionality useful for experimentalists.
Nexus
Under Argonne’s Nexus effort, Argonne researchers are working to advance the DOE’s vision by integrating experimental facilities with ALCF computing resources.
As the IRI program works to deliver DOE-enterprise-wide computing infrastructure, the ALCF has continued its commitment to linking experimental facilities with its computing resources. Work with the APS in recent years has been a primary driver for defining the new functionality and services the ALCF has deployed to satisfy experiment-time computing needs at APS beamlines. Service accounts enable APS users to run automated analysis of their data at the ALCF in a shared environment throughout multi-day beamline campaigns, with jobs launching immediately at experiment time in the on-demand queue on Polaris. Analysis results are available to scientists at the beamline, both during the experiment and afterward, via Globus Sharing enabled on the Eagle filesystem. Building on these ALCF-deployed features, Globus Compute and Globus Flows manage application execution and data transfer seamlessly for projects across the DOE Office of Science program offices.
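The submission pattern described above can be illustrated with a short sketch. Globus Compute's `Executor` follows the standard `concurrent.futures` interface, so the code below uses Python's built-in `ThreadPoolExecutor` as a local stand-in; the analysis function and scan names are hypothetical placeholders, not APS software.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_scan(scan_name: str) -> dict:
    """Placeholder for a beamline analysis routine that, in production,
    would run on Polaris's on-demand queue under a shared service account."""
    return {"scan": scan_name, "status": "reconstructed"}

# In production this would be globus_compute_sdk.Executor(endpoint_id=...),
# which implements the same concurrent.futures.Executor interface.
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(analyze_scan, s)
               for s in ("scan_0001", "scan_0002", "scan_0003")]
    results = [f.result() for f in futures]  # collected in submission order

for r in results:
    print(r["scan"], r["status"])
```

Because the interfaces match, swapping the local executor for a Globus Compute executor pointed at an ALCF endpoint changes where the function runs without changing the submission logic.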
Facility Integration
For over a decade, the ALCF and the APS have been collaborating to build the infrastructure for integrated ALCF-APS research, including the development of workflow management tools and enabling secure access to on-demand computing.
With the upgraded APS providing x-rays up to 500 times brighter than before, the APS-ALCF collaboration is delivering increased computational power at experiment time. More than 20 beamlines at the APS have identified significant computing needs and engaged the full range of ALCF’s Nexus services and functionality, using service accounts for transparent access to the ALCF and the on-demand queue for time-sensitive analysis of beamline data through integration with the APS Data Management System. With more beamlines coming online with ever-greater computational needs, APS demand for ALCF supercomputing resources and the newly upgraded inter-facility network connectivity will continue to grow.
Expanding and Demonstrating Capabilities
In a recent achievement of facility integration for near-real-time data analysis, Argonne deployed a fully automated pipeline that uses ALCF resources to rapidly process data obtained from x-ray experiments at the APS.
To demonstrate the capabilities of the pipeline, Argonne researchers carried out a study focused on a technique called Laue microdiffraction, which is employed at the APS and other light sources to analyze materials with crystalline structures. The team used the ALCF’s Polaris supercomputer to reconstruct data obtained from an APS experiment, returning reconstructed scans to the APS within 15 minutes of their arrival at the ALCF. The beamline technique introduced in the study allows users to collect data about 10 times faster than was previously possible.
These results carry implications for future software development, engineering, and beamline science.
Argonne researchers showcased the use of the Polaris system for processing data from APS experiments in near-real time during a demonstration at the SC24 conference.
Additional experiments and papers presented at the SC24 XLOOP workshop explored multiple IRI-related issues, including the scaling capabilities of file-based reconstruction of ptychography data—which requires particularly short data-processing turnaround times. New scans on the APS beamline storage system were automatically transferred to ALCF’s Eagle file system through Globus using a file-based workflow, which automatically launched reconstruction jobs on Polaris compute nodes using the on-demand queue. Once the reconstruction results were available on Eagle, they were transferred back to the APS through the same Globus transfer workflow.
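The file-based workflow above can be sketched in miniature, with local directories standing in for the APS beamline storage, the Eagle filesystem, and the Globus transfer steps. The directory names and the reconstruction stub are illustrative assumptions only; the real pipeline uses Globus transfers and jobs on Polaris compute nodes.

```python
import shutil
import tempfile
from pathlib import Path

def reconstruct(scan_file: Path, out_dir: Path) -> Path:
    """Stand-in for a ptychography reconstruction job that would be
    launched on Polaris via the on-demand queue."""
    result = out_dir / (scan_file.stem + "_reconstructed.txt")
    result.write_text(f"reconstruction of {scan_file.name}")
    return result

def run_pipeline(beamline: Path, eagle: Path, results_back: Path) -> list[str]:
    """Mimic the loop: new scans on beamline storage are transferred to
    Eagle, reconstructed, and the results transferred back to the APS."""
    returned = []
    for scan in sorted(beamline.glob("*.scan")):
        staged = eagle / scan.name            # Globus transfer: APS -> Eagle
        shutil.copy(scan, staged)
        result = reconstruct(staged, eagle)   # job on Polaris compute nodes
        shutil.copy(result, results_back / result.name)  # Eagle -> APS
        returned.append(result.name)
    return returned

# Set up throwaway directories representing the two facilities.
root = Path(tempfile.mkdtemp())
beamline, eagle, back = root / "aps", root / "eagle", root / "aps_results"
for d in (beamline, eagle, back):
    d.mkdir()
(beamline / "scan_0001.scan").write_text("raw detector frames")
print(run_pipeline(beamline, eagle, back))
```

In the production workflow, each of the commented steps is asynchronous and event-driven: the arrival of a new scan file, rather than a polling loop, triggers the transfer and the downstream reconstruction job.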
In a related example, working with a team at the Lawrence Berkeley National Laboratory Advanced Light Source, ALCF staff have helped automate the analysis of tomography beamline data on Polaris. Using a service account to submit jobs to Polaris through Globus Compute and the on-demand queue to analyze data at experiment time, the team has moved beyond an initial prototype and is now able to run analysis in a dedicated discretionary allocation. This production-ready capability is planned for use in upcoming beamline experiments.
Partnering to Advance Energy Technologies
The Plasma Physics and Fusion Energy Pathfinder aims to incorporate remote use of high-performance computing into experiments running at DOE’s DIII-D National Fusion Facility in San Diego, California.
Each DIII-D experiment runs on a 20-minute cycle that requires time-sensitive analysis of the data generated to inform and prepare the next experiment. ALCF staff have collaborated with teams from DIII-D and the National Energy Research Scientific Computing Center (NERSC) to improve and automate the Consistent Automatic Kinetic Equilibrium (CAKE) workflow, which was developed and implemented at DIII-D to produce low-error, kinetically constrained magnetic equilibrium reconstructions without human intervention. The automation has yielded dramatic increases in reconstruction throughput.
ALCF staff also worked to automate the Ion Orbiter workflow, which simulates particle trajectories and determines their hit locations on tokamak walls. Using Globus Flows to analyze data automatically between experiments, this effort has culminated in production-ready analysis during experiments at DIII-D. Ultimately, the Ion Orbiter workflow will enable control-room personnel to quickly predict wall heating and adjust the plasma as needed.
Both the CAKE and Ion Orbiter workflows were demonstrated in the DOE booth at the SC24 conference.
Leading the Future of Inter-Facility Science
ALCF staff participate in and co-chair weekly Leadership Group meetings to direct overall IRI efforts and specific tasks for technical subcommittees, form new subcommittees, and work with the Pathfinder projects. In 2024, ALCF staff served on the organizing committee for the IRI/HPDF kickoff meeting in Gaithersburg, Maryland, and produced related materials describing outcomes from the meeting. ALCF staff also presented during the Leadership Group’s participation in the DOE ASCAC meeting in May 2024.
ALCF staff have participated in all of the existing IRI technical subcommittees, including Outreach and Engagement, Interfaces, and TRUSTID, since their inception. These groups are dedicated to designing and building functionality at computing facilities to facilitate their use by experimentalists.