Staff Spotlights

From enhancing HPC capabilities and AI workflows to supporting cutting-edge research, these ALCF staff members helped advance the forefront of scientific computing in 2024.

Left: Riccardo Balin; right: Murali Emani

Riccardo Balin, Assistant Computational Scientist

Riccardo Balin joined the ALCF in 2021 as a postdoc under the Aurora Early Science Program (ESP) to work on methods for integrating machine learning with computational fluid dynamics (CFD) and turbulence modeling simulations. In 2024, he joined the Data Services and Workflows team as an assistant computational scientist. In this role, Riccardo works with ALCF users on developing, scaling, and applying novel workflows that aim to accelerate traditional simulations with AI methods. He also works closely with HPC vendors to ensure that the necessary software tools and innovative new solutions are available to our users.

In the last year, Riccardo continued his involvement in an Aurora ESP project aimed at developing machine learning-based closure models from ongoing simulations of turbulent aerodynamic flows. Thanks to his efforts, the project demonstrated INCITE readiness on the Aurora supercomputer. Additionally, Riccardo has spearheaded nekRS-ML, an Argonne-led effort to incorporate AI methods into the nekRS CFD code. This collaboration now includes researchers from four divisions at Argonne and led to a new ALCC proposal for an AI-powered toolkit to accelerate simulation-based design space exploration. Riccardo has also fostered new collaborations across DOE labs to develop benchmarks for coupled simulation and AI workflows, with the goal of better understanding how these workloads can take full advantage of current and future HPC systems.

Murali Emani, Assistant Computer Scientist

Murali Emani is an assistant computer scientist in the ALCF’s Artificial Intelligence and Machine Learning (AIML) group. In this role, he develops performance models to identify and address bottlenecks while scaling machine learning and deep learning frameworks on emerging supercomputers for scientific applications. He also co-designs emerging hardware architectures to scale up machine learning algorithms.

Murali co-leads the ALCF AI Testbed, where he explores the performance and efficiency of AI accelerators for scientific machine learning applications. He works in close collaboration with existing vendors to influence their future product offerings and with potential vendors to aid in procurement. He has also organized training sessions and tutorials on programming the AI Testbed accelerators at venues such as the 2024 Supercomputing Conference (SC24); these sessions drew a tremendous response from the scientific computing community.

Murali is a core member of the Model Architecture and Performance Evaluation working group in the Trillion Parameter Consortium and helped organize tutorials and hackathon sessions for the community. He is actively engaged in the AuroraGPT project, which aims to develop open science foundation models across various science disciplines. As a technical committee member of the first NAIRR workshop, he chaired breakout sessions to understand the gaps and goals for performance of AI applications.

Murali was a co-author of the “MProt-DPO” paper, a Gordon Bell Prize finalist at SC24. He serves as a co-chair for the MLPerf HPC group at MLCommons, which benchmarks large-scale machine learning on HPC systems. He has also been engaged in Aurora development, serving as the point of contact for two benchmark applications while collaborating with Intel’s deep learning frameworks team to evaluate and optimize the software stack for performance at scale.

Left: Varuni Sastry; right: Christine Simpson

Varuni Sastry, Assistant Computer Scientist

Varuni Sastry joined Argonne’s Data Science and Learning division in 2021 as a predoctoral appointee, primarily working on AI testbeds with a focus on evaluating and benchmarking different AI and ML workloads on next-generation dataflow-based hardware. She also assisted ALCF users in deploying different scientific AI workloads on these accelerators. In 2023, she joined the ALCF as an assistant computer scientist and continues to lead several efforts in enabling AI for science workloads on the AI Testbed and other supercomputers at the ALCF.

In 2024, Varuni joined the AuroraGPT pre-training team, setting up the data processing pipeline and developing several key features for a distributed training framework for scalable language and vision models. Varuni contributed to three different Gordon Bell submissions in 2024, with the “MProt-DPO” project on multimodal protein design workflows selected as a Gordon Bell finalist. For her contribution, she received an Impact Argonne Award for the enhancement of Argonne’s reputation. She was also honored with an Impact Argonne Award for extraordinary efforts for her work on “AuroraGlimmer,” a project aiming to build a scalable pipeline for the AuroraGPT project. In collaboration with teams from Argonne’s Center for Nanoscale Materials and Advanced Photon Source, she developed a large language model-based chat framework that incorporates retrieval-augmented generation and is tailored for scientific facilities. This work was published in npj Computational Materials.

As part of outreach activities, Varuni co-organized several tutorials and workshops for AI Testbed systems, including sessions at SC24, the Argonne Training Program on Extreme-Scale Computing (ATPESC), and other ALCF events. She also delivered an invited talk at the NNSA Emergent Technology Seminar. In addition, she co-organized the ALCF’s INCITE GPU Hackathon and served as a reviewer for both the ATPESC 2024 and INCITE 2025 committees.

Christine Simpson, Assistant Computational Scientist

Christine Simpson joined the ALCF in 2022 as an assistant computational scientist in the Data Science group. Now part of the Data Services and Workflows group, she primarily focuses on workflows on ALCF systems, which involves developing and testing workflow tools as well as providing user support and training. She is currently the lead developer for Balsam, an ALCF-developed workflow package that has enabled users to deploy high-throughput workflows on ALCF systems. She also works closely with the Parsl and Globus teams to help optimize their tools and services for ALCF users.

In the past year, Christine has been heavily involved in ALCF efforts relating to DOE’s Integrated Research Infrastructure (IRI), an initiative to seamlessly connect experimental and computational facilities. She has been the ALCF point person in a collaboration with the DIII-D National Fusion Facility and NERSC, and led efforts to run a production IRI workflow on the ALCF’s Polaris supercomputer to analyze the dynamics of neutral beams within the DIII-D tokamak during its fall campaign. In a demonstration at SC24, Polaris analyzed live DIII-D experiments concurrently with an IRI workflow at NERSC. In addition to the DIII-D efforts, Christine works with a number of other IRI users to process and analyze data on ALCF systems.

Christine has also worked on exploring new approaches for coupling simulation and AI/ML codes on ALCF systems. She has co-led efforts to explore a new workflow and data management tool called Dragon developed by HPE. She has worked to port a drug discovery workflow from the CANDLE project to Dragon and presented on this effort at the 2024 Platform for Advanced Scientific Computing Conference (PASC24). She has worked closely with HPE developers to improve Dragon performance and features for ALCF applications.

Christine is also the ALCF’s postdoc hiring lead. Prior to joining Argonne, she received her PhD in astronomy, studying galaxy formation and evolution with numerical simulations.

Left: Sheeja Susan; right: Peter Upton

Sheeja Susan, Software Development Specialist

Sheeja Susan joined the ALCF in 2019 as a software development specialist in the Advanced Integration group. She was a contractor for five years and became a full-time employee in 2024. At the ALCF, Sheeja is responsible for developing and maintaining frontend screens within the ALCF Portal and Allocation Request Management (ARM) websites. She also creates test automation scripts to ensure the websites run correctly, and was part of the team that managed the migration from AngularJS to Angular in ALCF scripts. She has developed ALCF screens such as the Director’s Discretionary Allocation Request and View Systems pages, along with many others for administrators, and helped implement the page routing. She works closely with the ALCF UX/design teams to translate design concepts into functional webpages. Coordinating with the backend team, Sheeja creates modular, reusable code components for the development of user-friendly, responsive pages across various screen sizes. She has written numerous test scripts in Cypress to test the functionality of webpages and helped significantly reduce the number of failing ALCF test scripts.

Much of Sheeja’s work in 2024 revolved around the frontend development of the ALCF ARM website and the new version of the Director’s Discretionary (DD) Allocation Request form. The ALCF Allocations Committee reviews requests from the DD Allocation Request form through the ARM site. The ARM site is the internal clearinghouse for ALCF staff to view, analyze, and ultimately approve allocation requests. Within the site, the committee reviews a project’s goals and determines whether the project fits with the ALCF resources and mission. Sheeja also participated in the ALCF portal development to replace the UB3 home page and login section with those of the portal, to create functionality for redirecting UB3-specific URLs from email links, and to modify the UB3 menu to share the same look and feel as the portal menu. Lastly, she implemented Angular Route Guards to activate and deactivate specific navigation routes.

Peter Upton, Systems Integration Administrator/Support

Peter Upton is a systems integration administrator/support at the ALCF. He manages, supports, and updates ALCF GitLab installations and also assists in managing DNS and Salt systems. In this capacity he provides peer reviews, troubleshoots GitLab user issues, and supports work on the Aurora system, for which he has helped create a new node type for administrative tasks.

In addition to assisting vendors in using Aurora to improve functionality and stability, and assisting with and tracking work on a new ALCF testbed, Peter aids his coworkers with HPCM-related tasks and participates in miscellaneous physical data-center activities (such as remote troubleshooting and cable routing). He also addresses security vulnerabilities promptly, keeps documentation up to date, and coordinates GitLab usage, licensing, and upgrades.

In 2024, Peter upgraded both the GitLab instances and the GitLab runners to RHEL9. He configured the GitLab JLSE container registry and reworked the GitLab upgrade process to minimize downtime and increase consistency. He also automated the Jacamar-CI RPM import.