Staff Spotlights

ALCF staff spotlights highlight some of the outstanding individuals at the lab and their accomplishments in 2024.

Left: Riccardo Balin; right: Murali Emani

Riccardo Balin, Assistant Computational Scientist

Riccardo Balin joined the ALCF in 2021 as a postdoc under the Aurora Early Science Program (ESP), working on methods to integrate machine learning with computational fluid dynamics (CFD) and turbulence modeling simulations. In 2024, he joined the Data Services and Workflows team as an assistant computational scientist. In this role, Riccardo works with ALCF users to develop, scale, and apply novel workflows that aim to accelerate traditional simulations with AI methods. He also works closely with HPC vendors to ensure that the necessary software tools, as well as innovative new solutions, are available to ALCF users.

Over the past year, Riccardo continued his involvement in the Aurora ESP project aimed at developing ML-based closure models from ongoing simulations of turbulent aerodynamic flows. Thanks to his efforts, the project demonstrated INCITE readiness on the new Aurora supercomputer. Additionally, Riccardo spearheaded nekRS-ML, an Argonne-led effort to incorporate AI methods into the nekRS CFD code. This collaboration now includes researchers from four Argonne divisions and led to a new ALCC proposal for an AI-powered toolkit to accelerate simulation-based design space exploration. Riccardo also fostered new collaborations across DOE labs to develop benchmarks for coupled simulation and AI workflows, with the goal of better understanding how these workloads can take full advantage of current and future HPC systems.

Murali Emani, Assistant Computational Scientist

Murali Emani is a Computer Scientist in the Artificial Intelligence and Machine Learning (AIML) group with the Argonne Leadership Computing Facility.

Murali develops performance models to identify and address bottlenecks while scaling machine learning and deep learning frameworks on emerging supercomputers for scientific applications. He also co-designs emerging hardware architectures to scale up machine learning algorithms.

Murali co-leads the AI Testbed, where he explores the performance and efficiency of AI accelerators for scientific machine learning applications. He works closely with existing vendors to influence their future product offerings, and with potential vendors to help inform procurement. He also organized training sessions and tutorials on programming the AI Testbed accelerators at venues such as SC, which drew a strong response from the community.

Murali is also a core member of the Model Architecture and Performance Evaluation working group in the Trillion Parameter Consortium (TPC) and organized tutorials and hackathon sessions at TPC events. He is actively engaged in the AuroraGPT project, which aims to develop an open science foundation model spanning various scientific disciplines. As a technical committee member for the first NAIRR workshop, he chaired breakout sessions to identify gaps and goals for the performance of AI applications.

Murali was also part of the team behind the MProt-DPO paper, an SC24 Gordon Bell Prize finalist. He serves as co-chair of the MLPerf HPC group at MLCommons, which benchmarks large-scale ML on HPC systems. In this role, he oversees a group of scientists and engineers from HPC centers, industry, and academia who identify state-of-the-art applications to help benchmark diverse supercomputers. He has also been actively engaged in work on the Aurora exascale machine, where he is the point of contact (POC) for two benchmark applications, and he works with the DL frameworks team at Intel to evaluate the software stack and optimize performance at scale.

Left: Varuni Sastry; right: Christine Simpson

Varuni Sastry, Assistant Computational Scientist

Varuni Sastry joined Argonne's Data Science and Learning Division in 2021 as a predoctoral appointee, working primarily on the AI Testbed with a focus on evaluating and benchmarking a range of AI and ML workloads on next-generation dataflow-based hardware. She also assisted ALCF users in deploying scientific AI workloads on these accelerators. In 2023, she officially joined the ALCF as an Assistant Computer Scientist, and she continues to lead several efforts to enable AI for Science workloads on the AI Testbed and other ALCF supercomputers.

In 2024, Varuni joined the AuroraGPT pre-training team, where she set up the data processing pipeline and developed several key features of the distributed training framework for scalable language and vision models. Varuni contributed to three Gordon Bell submissions in 2024, and the “MProt-DPO” work on a multimodal protein design workflow was selected as a Gordon Bell finalist. She received an Impact Argonne Award for the Enhancement of Argonne’s Reputation for this contribution. She was also honored with an Impact Argonne Award for Extraordinary Efforts for her contribution to “AuroraGlimmer”, an effort to build a scalable pipeline for the AuroraGPT project. In collaboration with the CNM and APS teams, she developed a retrieval-augmented generation (RAG)-based LLM chat framework tailored for scientific facilities; this work was published in npj Computational Materials, a Nature Partner Journal.

As part of her outreach activities, Varuni co-organized several tutorials and workshops on the AI Testbed, including sessions at SC24, the Argonne Training Program on Extreme-Scale Computing (ATPESC), and other ALCF events. She also delivered an invited talk at the NNSA Emergent Technology Seminar. In addition, she co-organized the INCITE GPU Hackathon and served as a reviewer for both the ATPESC’24 and INCITE’25 committees.

Christine Simpson, Assistant Computational Scientist

Christine Simpson joined the ALCF in 2022 as an Assistant Computational Scientist in the Data Science group. Now part of the Data Services and Workflows group, she focuses primarily on workflows on ALCF systems, which involves developing and testing workflow tools and providing user support and training. She is currently the lead developer of Balsam, an ALCF-developed workflow package that has enabled users to deploy high-throughput workflows on ALCF systems. She also works closely with the Parsl and Globus teams to help optimize their tools and services for ALCF users.

In the past year, Christine has been heavily involved in ALCF efforts on Integrated Research Infrastructure (IRI), an initiative to connect DOE’s experimental and computational facilities. She has been the ALCF’s point person in a collaboration with the DIII-D National Fusion Facility and NERSC. She led efforts to run a production IRI workflow on Polaris that analyzed the dynamics of neutral beams within the DIII-D tokamak during its fall campaign. During SC24, this workflow was demonstrated on Polaris analyzing live DIII-D experiments, alongside a concurrently running IRI workflow at NERSC. In addition to DIII-D, Christine works with a number of other IRI users who analyze data on ALCF systems.

Christine has also explored new approaches for coupling simulation and AI/ML codes on ALCF systems. She has co-led efforts to explore Dragon, a new workflow and data management tool developed by HPE. She ported a drug discovery workflow from the CANDLE project to Dragon and presented this work at PASC24. She has also worked closely with HPE developers to improve Dragon’s performance and features for ALCF applications.

Christine is also the ALCF’s postdoc hiring lead. Before joining the ALCF, she earned her PhD in astronomy, studying galaxy formation and evolution with numerical simulations.

Left: Sheeja Susan; right: Peter Upton

Sheeja Susan, Software Development Specialist

Sheeja Susan joined the ALCF in 2019 as a Software Development Specialist in the Advanced Integration Group. After five years as a contractor, she became a full-time employee in 2024.

At Argonne, Sheeja is responsible for developing and maintaining frontend screens for the ALCF portal and the Allocation Request Management website. She also creates test automation scripts to ensure that the website pages work correctly. Sheeja was part of the team responsible for the ALCF’s migration from AngularJS to Angular. Individually, she has developed ALCF screens such as the Director’s Discretionary Allocation Request and View Systems pages, along with many administrator pages, and helped implement page routing. She works closely with UX/design teams and effectively translates design concepts into functional web pages. Coordinating with the backend team, Sheeja creates modular, reusable code components to build user-friendly pages that respond to various screen sizes. She has written numerous Cypress test scripts to verify webpage functionality and helped significantly reduce the number of failing ALCF test scripts.

In 2024, Sheeja mainly worked on frontend development for the Allocation Request Management (ARM) website and the new version of the Director’s Discretionary Allocation Request form (DD Allocation Request form) for the ALCF. The ALCF Allocations Committee reviews requests submitted through the DD Allocation Request form on the ARM site, which serves as the internal clearinghouse for ALCF staff to view, analyze, and ultimately approve allocation requests. Within the site, the Committee reviews a project’s goals and needs to determine whether the project fits the ALCF’s resources and mission. Sheeja also contributed to ALCF portal development: replacing the UB3 home page and login section with those of the Portal, adding functionality to redirect UB3-specific URLs from email links, and updating the UB3 menu to match the look and feel of the Portal menu. Lastly, she implemented Angular route guards to control navigation to and away from specific routes in the ALCF’s web applications.

Peter Upton, Systems Integration Administrator/Support

Peter Upton is a Systems Integration Administrator/Support at the ALCF. Peter manages, supports, and updates the ALCF’s GitLab installations. He also assists in managing DNS and Salt systems, provides peer work reviews, troubleshoots GitLab user issues, and has supported Aurora work by creating a new node type for administrative tasks.

Peter assists vendors in using Aurora to improve its functionality and stability, and he assists with and tracks work on a new ALCF testbed. He helps his coworkers with HPCM-related tasks and participates in various physical data center activities (e.g., remote troubleshooting, cable routing). Peter also addresses security vulnerabilities promptly, keeps documentation up to date, and coordinates GitLab usage, licensing, and upgrades.

In 2024, Peter upgraded the GitLab instances and GitLab runners to RHEL9. He configured the JLSE GitLab container registry. Peter also reworked the GitLab upgrade process to minimize downtime and increase consistency, and he automated Jacamar-CI RPM imports.