Sync tables now

Open Issues

Internal ID Description Vendor ID Reproducer Path PoC Priority? ETA Date Opened Last Updated
101 Signalling a user event via clSetUserEventStatus does not wake up a barrier depending on it for in-order queues. GSD-12087 source/reproducers/opencl/user_event_in_order Paulius Velesko No response 2025-12-11 2025-12-12
100 Level Zero event used between an in-order immediate command list and an out-of-order regular command list results in ZE_RESULT_ERROR_INVALID_ARGUMENT GSD-12085 source/reproducers/l0/inorder_outorder_event/ Paulius Velesko No response 2025-12-11 2025-12-11
99 Advisor tripcounts analysis fails with a PyTorch example. ADV-10735 /flare/Performance/jkwack/Tools/Roofline/SC25_tutorial/ai_ml_profiling/reproducer or /lus/flare/projects/Tools/jkwack-tools-reproducer/JaeHyuk/advisor_pytorch/reproducer or source/reproducers/tools/pytorch_advisor JaeHyuk Kwack 🚨 No response 2025-12-09 2025-12-11
98 Hanging OpenCL code when one command queue waits on an event from another command queue CMPLRLLVM-72048 source/reproducers/opencl/hanging_marker Colleen Under investigation 2025-12-02 2025-12-11
97 SHMEM on Aurora: Unit test wait_until_all-on_queue-2 hanging https://github.com/oneapi-src/ishmem/issues/15 source/applications/ishmem_sis Colleen / Abhi Actively working on it 2025-11-21 2025-12-10
96 Sporadic libze_intel_gpu.so segmentation fault when running QMCPACK GSD-12033 See attached reproducer Ye Luo 🚨 No response 2025-11-17 2025-12-09
95 Memory leak in Libfabric No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/cxi_memory_lead Rob Lathan Fixed by https://github.com/ofiwg/libfabric/pull/11334, thanks Rob! 2025-11-13 2025-12-10
94 zeMemFree slowdown in a loop GSD-11962 source/reproducers/l0/zememfree_slowdown/ Colleen No response 2025-11-08 2025-11-12
93 oneCCL exception with PyTorch DTensor: SYCL recv is not supported for multi-node case MLSL-3951 In the text body Väinö Hatanpää Assigned 2025-11-05 2025-11-12
92 SYCL device info free_memory wrong on 2-stack PVC1550 GPU CMPLRLLVM-71510, URLZA-691, GSD-12043 source/reproducers/dpcpp/sycl_free_flat Jakub H Under investigation 2025-10-31 2025-12-10
91 Failed SYCL malloc_device on GPU takes 20 seconds GSD-10587 source/reproducers/dpcpp/slow_alloc/ Jakub H post-1146.40, fixed internally, but under investigation 2025-10-31 2025-12-10
90 Device Sanitizer + LIBOMPTARGET_DEBUG=1 issues for the GAMESS RI-MP2 mini-app CMPLRLLVM-71455 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test Brian Fixed internally, 2026.0 (next year) 2025-10-31 2025-11-12
89 Device Sanitizer breaks with MKL DGEMM call in GAMESS RI-MP2 mini-app MKLD-19334 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test Brian, JaeHyuk Fixed internally, pending verification, 2026.0 2025-10-31 2025-11-12
88 RPATH issue when mixing and matching SDK and spack packages built by another SDK No response No need. Reproducer attached in this ticket Ye Luo No response 2025-10-30 2025-10-30
87 QUDA compile fail cmplrllvm-70981 source/reproducers/openmp/quda_crash Xiao-Yong Jin / Brian W In progress 2025-10-28 2025-10-29
86 omp_alloc should support pinned memory, or implement proper fallback behavior CMPLRLIBS-35442 /home/kweide/projects/OpenMP_VV/tests/5.1/allocate/test_omp_alloctrait_pinned.c and source/reproducers/openmp/omp_alloctrait_pinned in the test set Klaus Weide In progress 2025-10-28 2025-11-12
85 zeEventQueryKernelTimestampsExt is broken with IMM command lists GSD-11124 source/reproducers/l0/zeEventQueryKernelTimestampsExt_clock Thomas/John Mellor-Crummey In progress 2025-10-27 2025-11-12
84 Device Sanitizer is not functional with OpenMP C/Fortran codes /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/sanitizer and source/reproducers/tools/sanitizer and source/reproducers/tools/sanitizer_rimp2_test in the test set JaeHyuk Kwack 🚨 2025.3 2025-10-22 2025-10-31
83 With ifx, openmp_version is missing from omp_lib CMPLRLIBS-35365 /home/kweide/tests/test_openmp_version.f90 and source/reproducers/openmp/omp_version in the test set Klaus Weide 2025.3 2025-10-20 2025-10-24
82 Symbol missing issue with 1.3 version onwards in SLES and Intel Datacenter Max GPU on Aurora https://github.com/intel/xpumanager/issues/113 https://github.com/intel/xpumanager/issues/113 Servesh In progress. ETA drop after next (github fix in a few weeks, LTS end of year) 2025-10-16 2025-12-10
81 IGC_StackOverflowDetection not working GSD-11763 source/reproducers/openmp/stack_overflow_not_working Brian In progress 2025-10-15 2025-10-29
80 VTune fails with "Assertion failed: tool_gtpin_support:126: (buffer) " VASP-32612, GTPIN-1169 /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/GAMESS_RI-MP2_MiniApp source/reproducers/tools/vtune_gtpin_fail in the test set JaeHyuk Kwack 🚨 2025.3 2025-10-10 2025-10-30
79 Advisor fails with "advisor: Warning: The application returned a non-zero exit value." ADV-10687 source/reproducers/tools/advisor_gflop JaeHyuk Kwack Fixed with advisor --version == 616302, which should be in 2025.3 2025-10-08 2025-10-30
78 ExchCXX fails to compile with "is too large for Clang to process" CMPLRLLVM-70962, (general and related: CMPLRLLVM-53145, CMPLRLLVM-69909, CMPLRLLVM-24314) source/reproducers/dpcpp/jit_too_large_for_Clang Abhi 🚨 2026.0 ("-Xclang -fignore-overflow-error" by default for SYCL) 2025-10-06 2025-12-09
77 [SYCL] Function pointers compilation issue CMPLRLLVM-16317 Reproducer below and source/reproducers/dpcpp/func_pointers Abhi, Patrick Steinbrecher 🚨 Under discussion 2025-10-06 2025-10-15
76 Segfaults in MPICH routines in next-eval No response for XGC: /lus/flare/projects/catalyst/world_shared/zippy/xgc Tim Williams 🚨 No response 2025-10-01 2025-10-01
74 ZES_ENABLE_SYSMAN should default to 1 in the oneapi module No response see Details Tim Williams No response 2025-09-29 2025-10-15
73 "error: undefined reference to `old_llvm.umul.with.overflow.i64'" in newer kokkos CMPLRLLVM-70603 source/reproducers/dpcpp/kokkos_mdspan_umul Daniel Arndt Being worked internally 2025-09-17 2025-10-14
71 RPC launch error tracking 2025-09-15 2025-09-23
70 PALS gpu-bind, composite, envall lead to "launch failed" DCE Case 5392152905 applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall Thomas Applencourt Fixed in USS-1.5 (March '26) 2025-09-10 2025-12-09
68 warpx segfaults/hangs with OpenPMD enabled No response /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ Tim Williams No response 2025-08-23 2025-08-23
67 warpx Debug build crashes oneAPI compiler CMPLRLLVM-24314 /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ Tim Williams No response 2025-08-21 2025-10-29
66 Compiling with "-g" leads to a much larger binary than without CMPLRLLVM-69909, CMPLRLLVM-24314 (similar JIRAs) lammps + -g Brian Holland No response 2025-08-20 2025-12-10
65 Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC GSD-11510 source/reproducers/l0/ondemand_paging/ Colleen implemented, post-1146.40, ~ Jan. 2025-08-20 2025-12-10
64 E3SM fortran compile ICE CMPLRLLVM-69862 source/reproducers/ifx/e3sm_homme_ICE_error Abhi 2025.3.0 2025-08-18 2025-10-09
63 Kokkos Kernels fails to build when Kokkos is built with OpenMP enabled CMPLRLLVM-69908 source/applications/kokkos-kernels Sean Koyama / Colleen Bertoni gone starting with 4.19 (fixed in 2025.3 branch) 2025-08-18 2025-09-16
62 -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT GSD-11490 source/reproducers/general/ftarget-register-alloc-mode_flag Steve Rangel Fixed internally, post-1146.40, ~Jan 2025-08-14 2025-12-10
60 ext_oneapi_memcpy2d is significantly slower with implicit scaling than explicit and on PVC vs A100 Really depends on GSD-11132 source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf Natalie Beams No response 2025-07-29 2025-10-01
58 kokkos inclusive and exclusive scan giving incorrect answers for 1146.10 CMPLRLLVM-69285, GSD-11736 source/reproducers/dpcpp/kokkos_optimization_scan Daniel Arndt 🚨 1146.40 (two weeks out -- end of Nov) 2025-07-23 2025-12-10
57 GPU segfault in gtensor_bench with 2025.2 MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 source/applications/gtensor_bench Colleen Bertoni 2025.3 2025-07-22 2025-08-11
56 RSBench-SYCL incorrect answers with 1146.10 GSD-11247 source/applications/RSBench/ John Tramm, Colleen Bertoni 1146.31 2025-07-22 2025-09-17
55 Linking in LZ causes changes in signal handling cmplrlibs-35385, GSD-11413 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ Thomas Applencourt, Colleen Bertoni Fixed internally, still in vetting 2025-07-22 2025-12-10
54 oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 Riccardo Balin 🚨 oneCCL 2021.17, oneAPI 2025.3 2025-07-18 2025-08-26
52 compiler segfaults linking warpx binary GSD-11357, GSD-11855 /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx Tim Williams 🚨 2025.2 + 1146.10 2025-07-07 2025-11-12
47 Non standard MPI knobs suggested for performance ANL-291 N/A Servesh M No response 2025-06-23 2025-06-27
43 CMake can't find MKL::MKL_SYCL with MPI wrapper compilers No response https://github.com/thilinarmtb/onemkl_cmake_mpi_bug Thilina Ratnayaka, Colleen Bertoni improvements will be part of the next oneMKL release, 2025.3. 2025-06-11 2025-06-25
39 Feature request for Aurora runtime to include debugging symbols ANL-286, HPCS-15374, GSD-11427 feature request Ye Luo 1146.40 drop 2025-05-29 2025-12-10
38 One application in GRID consistently hangs GSD-11441 /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle Xiao-Yong Jin 🚨 Fixed Internally, likely end of Nov. 1146.40 2025-05-27 2025-12-10
36 (Occasional Interruptible) hangs in applications Possibly related to ANL-215 /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel Xiao-Yong Jin 🚨 No response 2025-05-15 2025-07-09
33 Crash when calling too many MPI_Probe https://github.com/pmodels/mpich/issues/7427 https://github.com/pmodels/mpich/issues/7427 David--Cléris Timothée No response 2025-05-15 2025-05-15
32 PETSc segfaults in sparse matrix calls IGDB-6516, GSD-10450 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ Junchao Zhang 🚨 2025.3 for part malloc_shared in MKL 2025-05-15 2025-06-25
31 GAMESS segfaults with -O0 GSD-10393, CMPLRLIBS-35345,GSD-11035 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault Colleen Bertoni 🚨 1146.31 (Targeted for LTS2 (1146.12+), contained with the IGC 2.16 series / WW34 (2-3 weeks)) 2025-05-14 2025-09-17
30 Copy 2D/3D are broken (zeCommandListAppendMemoryCopyRegion) NEO-14954, GSD-11132 https://github.com/rpereira-dev/ze-zoo Romain PEREIRA and Thomas APPLENCOURT 🚨 No response 2025-05-10 2025-09-17
29 Significant slowdown with LAMMPS in first run, subsequent runs much faster No response /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ Christopher Knight No response 2025-05-09 2025-08-20
17 hang with MPI pipelining https://github.com/pmodels/mpich/issues/7373 Build and run commands are in the MPICH issue. James Osborn Merged in https://github.com/pmodels/mpich/pull/7622 2025-04-03 2025-10-14
13 XGC hangs at scale CMPLRTST-27836 xgc-es-cpp-gpu app, ES_ITER test case Tim Williams 🚨 No response 2025-04-03 2025-09-17

Closed Issues

Internal ID Description Vendor ID Reproducer Path PoC Priority? Date Opened Closed Date
75 "MPL_gpu_query_is_same_dev(int, int): Assertion `global_dev1 >= 0 && global_dev1 < known_ze_device_count' failed." with mpich.dbg No response https://github.com/pmodels/mpich/issues/7602 Tim, JaeHyuk, Colleen 2025-09-30 2025-10-13
72 MPI_aborts in many applications in next-eval at larger scales No response N/A Brian Holland / Tim Williams 2025-09-16 2025-09-30
61 Failing unit tests on PVCs with 2025.2 oneAPI SDK -- is it expected? https://github.com/uxlfoundation/oneMath/issues/703, CMPLRLLVM-69572, ONSAM-1930, GSD-11482 https://github.com/uxlfoundation/oneMath/issues/703 Colleen Bertoni 2025-07-30 2025-11-12
59 [ISHMEM] Unit test fails with ishmem 1.4.0 https://github.com/oneapi-src/ishmem/issues/10 https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos Abhi 2025-07-25 2025-07-31
53 IFX Compiler reads and stores floating point values from a text file at single-precision No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision Victor Anisimov 🚨 2025-07-09 2025-07-10
51 [SYCL] Bug from SYCL peer_access No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access Abhi 2025-07-02 2025-10-13
50 OpenMP Thread binding No response See below Romain PEREIRA 2025-07-02 2025-07-02
49 [E3SM] MPICH bug related to collectives tuning https://github.com/pmodels/mpich/issues/7456 https://github.com/pmodels/mpich/issues/7456 Abhi 🚨 2025-06-27 2025-10-09
48 Zombie Processes GSD-11266 none yet Servesh M 🚨 2025-06-25 2025-10-29
45 DDT issues since Aurora upgrade No response /lus/flare/projects/catalyst/world_shared/zippy/ddt Tim Williams 2025-06-12 2025-11-03
44 QMCPACK segfault in libomp No response Not yet created Ye Luo 🚨 2025-06-12 2025-07-23
42 Linking fails with old build environment No response /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test Kris Rowe 2025-06-06 2025-06-10
41 torch.compile segfaults for >2 tiles MLSL-3728 /flare/Aurora_deployment/vsastry/torch_compile Varuni Sastry 2025-06-06 2025-07-24
40 Need SYSMAN support for all modes in recent releases HPCS-15366, related: GSD-11104 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState Thomas Applencourt 🚨 2025-05-30 2025-06-17
37 xpu-smi reports "N/A" for GPU Utilization RITM0428460, ANL-279, GSD-11252 any run of xpu-smi Kyle Felker / Colleen Bertoni 2025-05-22 2025-10-29
35 Avoid outputs exceeding a few KBs to stdout/stderr from MPI ranks RITM0425437 First issue Large MPI writes to stdout Servesh Muralidharan 2025-05-15 2025-07-23
34 Runtime Error: pytorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> MLSL-3729 In issue Nathan Nichols 2025-05-15 2025-10-13
28 CMake failures with SYCL No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ Abhishek Bagusetty 2025-05-09 2025-05-09
27 Build failures on PVC with Cutlass GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl Abhi 🚨 2025-05-07 2025-10-13
26 L0 memcpy bug GSD-11142, NEO-14641 I was doing the same run as QMCPACK SOW runs in the reframe Ye Luo 🚨 2025-05-06 2025-10-13
25 Compile fail in Lattice App Brian reproduced and confirms fixed in 2025.1 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx Xiao-Yong Jin 🚨 2025-05-01 2025-10-13
24 Noticeably more "ping failed" than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade JIRA is: HPCS-15331 N/A Xiao-Yong Jin, Colleen Bertoni 2025-05-01 2025-05-16
23 Apps stop running after Apr 29 upgrade due to libstdc++ dependency No response See details Ye Luo 2025-04-30 2025-05-06
22 SYCL In-order queue broken NEO-14641 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order Thomas Applencourt 🚨 2025-04-23 2025-10-13
21 Error during write with Quantum ESPRESSO No response see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash Filippo Simini 🚨 2025-04-17 2025-04-18
20 Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode ANL-283/HPE Support Case 5390607860 See below Abhishek, Nathan, Khalid 2025-04-16 2025-10-01
19 Severe CPU memory growth in MPICH No response /flare/catalyst/world_shared/zippy/reproducers/issue19 Tim Williams 2025-04-04 2025-07-31
18 Ping failures and hangs with production runs using GPT/GRID ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 /lus/flare/projects/LatticeFlavor/lehner Xiao-Yong Jin 🚨 2025-04-04 2025-12-11
16 Catastrophic memory error in context lmp_aurora_kokkos No response public LAMMPS Chris Knight 2025-04-03 2025-07-23
12 CXI alloc failed on cxi1: request exceeds ACs limits No response None Not Thomas 2025-04-01 2025-12-09
9 Multithreaded data-transfer can cause page-fault N/A Full QMCPACK Ye Luo 2025-04-01 2025-05-08
8 Lots of H2D copies produce CPU I9 error and incorrect value N/A Full QMCPACK Ye Luo 🚨 2025-04-01 2025-05-28
7 MPI_Bcast gets faster when turning off XPMEM pmodels/mpich#7334 see Issue on MPICH GitHub repo Ye Luo 2025-04-01 2025-04-24
6 MPICH memory allocation slows down at scale pmodels/mpich#7333 see MPICH issue Ye Luo 🚨 2025-04-01 2025-04-24
4 Incorrect results in receive buffer in GPU memory MPICH 7312 grid application (lattice QCD) Patrick Steinbrecher, Tim Williams 🚨 2025-03-25 2025-04-24
3 Linker error found by XGC CMPLRLLVM-66496 /home/zippy/smalltests/aurora/xgc42/fails Tim Williams 2025-03-19 2025-03-28

Update tables

The tables are updated automatically every night. To update them now, wait 10-15 s after the last change to the AuroraBugTracking issues, then run the following from any machine where gh is authenticated:

gh workflow run "Update Submodules" --repo "argonne-lcf/user-guides" && GH_FORCE_TTY=100% watch -c -n1 gh run list --repo "argonne-lcf/user-guides"
Then wait about 2 minutes, until no jobs are running.

Or execute aurora-bug-table-sync.sh to automatically run everything step-by-step and know exactly when the changes are live online.
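
As an illustration of the manual route (trigger the workflow, then wait until no jobs are running), the waiting step could be scripted roughly as below. This is only a sketch and not the contents of aurora-bug-table-sync.sh: the function names count_active_runs and wait_until_idle are hypothetical, and it assumes that gh run list supports --json status with a --jq filter and reports finished runs with the status "completed".

```shell
# Sketch only: poll the workflow runs instead of watching them by hand.
# REPO comes from the command above; the function names are hypothetical.
REPO="argonne-lcf/user-guides"

# Print the number of runs that have not completed yet.
# Assumption: `gh run list --json status` reports finished runs as "completed".
count_active_runs() {
  gh run list --repo "$REPO" --json status \
    --jq '[.[] | select(.status != "completed")] | length'
}

# Poll every $1 seconds (default 5), at most $2 times (default 60).
# Returns 0 once no runs are active, 1 if the poll limit is reached.
wait_until_idle() {
  wui_interval="${1:-5}"
  wui_max_polls="${2:-60}"
  wui_i=0
  while [ "$wui_i" -lt "$wui_max_polls" ]; do
    if [ "$(count_active_runs)" -eq 0 ]; then
      return 0
    fi
    sleep "$wui_interval"
    wui_i=$((wui_i + 1))
  done
  return 1
}

# Example use (left commented out so that sourcing this file has no side effects):
# gh workflow run "Update Submodules" --repo "$REPO" && sleep 10 && wait_until_idle 5 60
```

The short sleep before polling in the example is there because a freshly triggered run may take a few seconds to appear in the run list; without it, the loop could report "idle" before the new run has even been registered.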