Skip to content

< Back to Aurora Known Issues page

Sync tables now

Open Issues

Internal ID Description Vendor ID Reproducer Path PoC Priority? Pre-production? ETA Date Opened Last Updated
70 PALS gpu-bind, composite, envall lead to "launch failed" No response applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall Thomas Applencourt No response 2025-09-10 2025-09-10
68 warpx segfaults/hangs with OpenPMD enabled No response /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ Tim Williams No response 2025-08-23 2025-08-23
67 warpx Debug build crashes oneAPI compiler CMPLRLLVM-24314 /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ Tim Williams No response 2025-08-21 2025-09-03
66 Compiling with "-g" leads to a much larger binary than without CMPLRLLVM-69909, CMPLRLLVM-24314 lammps + -g Brian Holland No response 2025-08-20 2025-08-27
65 Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC GSD-11510 source/reproducers/l0/ondemand_paging/ Colleen No response 2025-08-20 2025-08-21
64 E3SM fortran compile ICE CMPLRLLVM-69862 source/reproducers/ifx/e3sm_homme_ICE_error Abhi 2025.3.0 2025-08-18 2025-09-12
63 Kokkos kernels fails to build with kokkos built with openmp enabled CMPLRLLVM-69908 source/applications/kokkos-kernels Sean Koyama / Colleen Bertoni gone starting with 4.19 (fixed in 2025.3 branch, any chance of getting it sooner?) 2025-08-18 2025-08-20
62 -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT GSD-11490 source/reproducers/general/ftarget-register-alloc-mode_flag Steve Rangel No response 2025-08-14 2025-08-14
61 Failing unit tests on PVCs with 2025.2 oneAPI SDK -- is it expected? https://github.com/uxlfoundation/oneMath/issues/703 https://github.com/uxlfoundation/oneMath/issues/703 Colleen Bertoni No response 2025-07-30 2025-07-30
60 ext_oneapi_memcpy2d is significantly slower with implicit scaling than explicit CMPLRLLVM-69398, GSD-11459 source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf Natalie Beams No response 2025-07-29 2025-08-06
58 kokkos inclusive and exclusive scan giving incorrect answers for 1146.10 CMPLRLLVM-69285 source/reproducers/dpcpp/kokkos_optimization_scan Daniel Arndt 🚨 No response 2025-07-23 2025-07-30
57 GPU segfault in gtensor_bench with 2025.2 MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 source/applications/gtensor_bench Colleen Bertoni 2025.3 2025-07-22 2025-08-11
56 RSBench-SYCL incorrect answers with 1146.10 GSD-11247 source/applications/RSBench/ John Tramm, Colleen Bertoni LTS2 branch as part of IGC 2.15 series update (WW34, 3-4weeks) Need to change priority to get it into LTS2 (since things must be backported into it). Things may be fixed in mainline (rolling release) but not in LTS2. bug fixes everywhere, no new features for PVC in mainline. 2025-07-22 2025-08-20
55 Linking in LZ causes changes in signal handling cmplrlibs-35385, GSD-11413 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ Thomas Applencourt, Colleen Bertoni No response 2025-07-22 2025-07-25
54 oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 Riccardo Balin 🚨 oneCCL 2021.17, oneAPI 2025.3 2025-07-18 2025-08-26
52 compiler segfaults linking warpx binary GSD-11357 /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx Tim Williams 🚨 2025.2 + 1146.10 2025-07-07 2025-07-18
51 [SYCL] Bug from SYCL peer_access No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access Abhi Fixed internally with 2025.2 2025-07-02 2025-07-08
49 [E3SM] MPICH bug related to collectives tunning https://github.com/pmodels/mpich/issues/7456 https://github.com/pmodels/mpich/issues/7456 Abhi 🚨 Next Programming Environment (25.190) 2025-06-27 2025-08-21
48 Zombie Processes GSD-11266 none yet Servesh M 🚨 Should be within the LTS2 release (1146.12) 2025-06-25 2025-08-06
47 Non standard MPI knobs suggested for performance ANL-291 N/A Servesh M No response 2025-06-23 2025-06-27
45 DDT issues since Aurora upgrade No response /lus/flare/projects/catalyst/world_shared/zippy/ddt Tim Williams Linaro Forge 2025.0.1 has the workaround. GDB 2025.3 the root cause will be gone. 2025-06-12 2025-08-05
43 CMake can't find MKL::MKL_SYCL with MPI wrapper compilers No response https://github.com/thilinarmtb/onemkl_cmake_mpi_bug Thilina Ratnayaka, Colleen Bertoni improvements will be part of the next oneMKL release, 2025.3. 2025-06-11 2025-06-25
39 Feature request for Aurora runtime to include debugging symbols ANL-286, HPCS-15374, GSD-11460 feature request Ye Luo No response 2025-05-29 2025-08-06
38 One application in GRID consistently hangs GSD-11441 /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle Xiao-Yong Jin 🚨 No response 2025-05-27 2025-08-18
37 xpu-smi reports "N/A" for GPU Utilization RITM0428460, ANL-279, GSD-11252 any run of xpu-smi Kyle Felker / Colleen Bertoni 1146.24 (Implemented, LTS2, WW34) 2025-05-22 2025-08-20
36 (Occasional Interruptible) hangs in applications Possibly related to ANL-215 /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel Xiao-Yong Jin 🚨 No response 2025-05-15 2025-07-09
34 Runtime Error: pytorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> MLSL-3729 In issue Nathan Nichols 2021.15.1 2025-05-15 2025-07-31
33 Crash when calling too many MPI_Probe https://github.com/pmodels/mpich/issues/7427 https://github.com/pmodels/mpich/issues/7427 David--Cléris Timothée No response 2025-05-15 2025-05-15
32 PETSc segfaults in sparse matrix calls IGDB-6516, GSD-10450 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ Junchao Zhang 🚨 2025.3 for part malloc_shared in MKL 2025-05-15 2025-06-25
31 GAMESS segfaults with -O0 GSD-10393, CMPLRLIBS-35345,GSD-11035 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault Colleen Bertoni 🚨 1146.24 (Targeted for LTS2 (1146.12+), contained with the IGC 2.16 series / WW34 (2-3 weeks)) 2025-05-14 2025-08-20
30 Copy 2D/3D are broken (zeCommandListAppendMemoryCopyRegion) NEO-14954, GSD-11132 https://github.com/rpereira-dev/ze-zoo Romain PEREIRA and Thomas APPLENCOURT No response 2025-05-10 2025-08-05
29 Significant slowdown with LAMMPS in first run, subsequent runs much faster No response /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ Christopher Knight No response 2025-05-09 2025-08-20
27 Build failures on PVC with Cutlass GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl Abhi 🚨 agama 1133 and agama 1146 (ww24, ~ second week of June) 2025-05-07 2025-08-13
26 L0 memcpy bug GSD-11142, NEO-14641 I was doing the same run as QMCPACK SOW runs in the reframe Ye Luo 🚨 agama 1146 2025-05-06 2025-06-25
25 Compile fail in Lattice App Brian reproduced and confirms fixed in 2025.1 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx Xiao-Yong Jin 🚨 Brian confirms fixed in 2025.1 2025-05-01 2025-05-02
22 SYCL In-order queue broken NEO-14641 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order Thomas Applencourt 🚨 fix in ww24 (~second week of june) agama 1146 2025-04-23 2025-07-29
20 Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode ANL-283/HPE Support Case 5390607860 See below Abhishek, Nathan, Khalid Likely March 2026 (Servesh requested HPE's gpu-bind match gpu_tile_compact.sh at least) 2025-04-16 2025-08-20
18 Ping failures and hangs with production runs using GPT/GRID ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 /lus/flare/projects/LatticeFlavor/lehner Xiao-Yong Jin 🚨 No response 2025-04-04 2025-08-18
17 hang with MPI pipelining https://github.com/pmodels/mpich/issues/7373 Build and run commands are in the MPICH issue. James Osborn Should be fixed in top of aurora_test 2025-04-03 2025-08-20
13 XGC hangs at scale No response xgc-es-cpp-gpu app, ES_ITER test case Tim Williams 🚨 No response 2025-04-03 2025-04-03
12 CXI alloc failed on cxi1: request exceeds ACs limits No response None Not Thomas No response 2025-04-01 2025-08-04

Closed Issues

Internal ID Description Vendor ID Reproducer Path PoC Priority? Date Opened Closed Date
59 [ISHMEM] Unit test fails with ishmem 1.4.0 https://github.com/oneapi-src/ishmem/issues/10 https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos Abhi 2025-07-25 2025-07-31
53 IFX Compiler reads and stores floating point values from a text file at single-precision No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision Victor Anisimov 🚨 2025-07-09 2025-07-10
50 OpenMP Thread binding No response See bellow Romain PEREIRA 2025-07-02 2025-07-02
44 QMCPACK segfault in libomp No response Not yet created Ye Luo 🚨 2025-06-12 2025-07-23
42 Linking fails with old build environment No response /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test Kris Rowe 2025-06-06 2025-06-10
41 torch.compile segfaults for >2 tiles MLSL-3728 /flare/Aurora_deployment/vsastry/torch_compile Varuni Sastry 2025-06-06 2025-07-24
40 Need SYSMAN support for all modes in recent releases HPCS-15366, related: GSD-11104 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState Thomas Applencourt 🚨 2025-05-30 2025-06-17
35 Avoid outputs exceeding few KBs to stdout/stderr from MPI ranks RITM0425437 First issue Large MPI writes to stdout Servesh Muralidharan 2025-05-15 2025-07-23
28 CMake failures with SYCL No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ Abhishek Bagusetty 2025-05-09 2025-05-09
24 Noticeably more "ping failed" than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade JIRA is:  HPCS-15331 N/A Xiao-Yong Jin Colleen Bertoni 2025-05-01 2025-05-16
23 Apps stop running after Apr 29 upgrade due to libstdc++ dependency No response See details Ye Luo 2025-04-30 2025-05-06
21 Error during write with Quantum ESPRESSO No response see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash Filippo Simini 🚨 2025-04-17 2025-04-18
19 Severe CPU memory growth in MPICH No response /flare/catalyst/world_shared/zippy/reproducers/issue19 Tim Williams 2025-04-04 2025-07-31
16 Catastrophic memory error in context lmp_aurora_kokkos No response public LAMMPS Chris Knight 2025-04-03 2025-07-23
9 Multithreaded data-transfer can cause page-fault N/A Full QMCPACK Ye Luo 2025-04-01 2025-05-08
8 Lots of H2D copies produce CPU I9 error and incorrect value N/A Full QMCPACK Ye Luo 🚨 2025-04-01 2025-05-28
7 MPI_Bcast gets faster when turning off XPMEM pmodels/mpich#7334 see Issue on MPICH GitHub repo Ye Luo 2025-04-01 2025-04-24
6 MPICH memory allocation slows down at scale pmodels/mpich#7333 see MPICH issue Ye Luo 🚨 2025-04-01 2025-04-24
4 Incorrect results in receive buffer in GPU memory MPICH 7312 grid application (lattice QCD) Patrick Steinbrecher, Tim Williams 🚨 2025-03-25 2025-04-24
3 Linker error found by XGC CMPLRLLVM-66496 /home/zippy/smalltests/aurora/xgc42/fails Tim Williams 2025-03-19 2025-03-28

Update tables

Automatically updated nightly. To update now, wait 10-15s after last change to AuroraBugTracking Issues, then run (anywhere on a machine that has authenticated with gh):

gh workflow run "Update Submodules" --repo "argonne-lcf/user-guides" && GH_FORCE_TTY=100% watch -c -n1 gh run list --repo "argonne-lcf/user-guides"
And wait ~2m until no jobs are running.

Or execute aurora-bug-table-sync.sh to automatically run everything step-by-step and know exactly when the changes are live online.