Open Issues

| Internal ID | Description | Vendor ID | Reproducer Path | PoC | Priority? | ETA | Date Opened | Last Updated |
|---|---|---|---|---|---|---|---|---|
| 80 | VTune fails with "Assertion failed: tool_gtpin_support:126: (buffer) " | No response | /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/GAMESS_RI-MP2_MiniApp source/reproducers/tools/vtune_gtpin_fail in the test set | JaeHyuk Kwack | 🚨 | 2025.3 | 2025-10-10 | 2025-10-13 |
| 79 | Advisor fails with "advisor: Warning: The application returned a non-zero exit value." | ADV-10687 | source/reproducers/tools/advisor_gflop | JaeHyuk Kwack | | Fixed with advisor --version == 616302, which should be in 2025.3 | 2025-10-08 | 2025-10-10 |
| 78 | ExchCXX fails to compile with is too large for Clang to process | CMPLRLLVM-70962 | source/reproducers/dpcpp/jit_too_large_for_Clang | Abhi | | No response | 2025-10-06 | 2025-10-07 |
| 77 | [SYCL] Function pointers compilation issue | CMPLRLLVM-16317 | Reproducer below | Abhi, Patrick Steinbrecher | 🚨 | No response | 2025-10-06 | 2025-10-06 |
| 76 | Segfaults in MPICH routines in next-eval | No response | for XGC: /lus/flare/projects/catalyst/world_shared/zippy/xgc | Tim Williams | 🚨 | No response | 2025-10-01 | 2025-10-01 |
| 74 | ZES_ENABLE_SYSMAN should default to 1 in the oneapi module | No response | see Details | Tim Williams | | No response | 2025-09-29 | 2025-10-01 |
| 73 | "error: undefined reference to `old_llvm.umul.with.overflow.i64'" in newer kokkos | CMPLRLLVM-70603 | source/reproducers/dpcpp/kokkos_mdspan_umul | Daniel Arndt | | Being worked internally | 2025-09-17 | 2025-10-07 |
| 71 | RPC launch error tracking | | | | | | 2025-09-15 | 2025-09-23 |
| 70 | PALS gpu-bind, composite, envall lead to "launch failed" | DCE Case 5392152905 | applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall | Thomas Applencourt | | patch to test out and that should already be landed for a future release | 2025-09-10 | 2025-10-02 |
| 68 | warpx segfaults/hangs with OpenPMD enabled | No response | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-23 | 2025-08-23 |
| 67 | warpx Debug build crashes oneAPI compiler | CMPLRLLVM-24314 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-21 | 2025-09-03 |
| 66 | Compiling with "-g" leads to a much larger binary than without | CMPLRLLVM-69909, CMPLRLLVM-24314 | lammps + -g | Brian Holland | | No response | 2025-08-20 | 2025-10-01 |
| 65 | Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC | GSD-11510 | source/reproducers/l0/ondemand_paging/ | Colleen | | No response | 2025-08-20 | 2025-09-17 |
| 64 | E3SM fortran compile ICE | CMPLRLLVM-69862 | source/reproducers/ifx/e3sm_homme_ICE_error | Abhi | | 2025.3.0 | 2025-08-18 | 2025-10-09 |
| 63 | Kokkos kernels fails to build with kokkos built with openmp enabled | CMPLRLLVM-69908 | source/applications/kokkos-kernels | Sean Koyama / Colleen Bertoni | | gone starting with 4.19 (fixed in 2025.3 branch) | 2025-08-18 | 2025-09-16 |
| 62 | -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT | GSD-11490 | source/reproducers/general/ftarget-register-alloc-mode_flag | Steve Rangel | | Possible fix internally | 2025-08-14 | 2025-09-17 |
| 61 | Failing unit tests on PVCs with 2025.2 oneAPI SDK -- is it expected? | https://github.com/uxlfoundation/oneMath/issues/703, CMPLRLLVM-69572, ONSAM-1930, GSD-11482 | https://github.com/uxlfoundation/oneMath/issues/703 | Colleen Bertoni | | CMPLRLLVM-69572: fixed, other implemented. ETA Oct if cherry-picked | 2025-07-30 | 2025-10-01 |
| 60 | ext_oneapi_memcpy2d is significantly slower with implicit scaling than explicit and on PVC vs A100 | Really depends on GSD-11132 | source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf | Natalie Beams | | No response | 2025-07-29 | 2025-10-01 |
| 58 | kokkos inclusive and exclusive scan giving incorrect answers for 1146.10 | CMPLRLLVM-69285, GSD-11736 | source/reproducers/dpcpp/kokkos_optimization_scan | Daniel Arndt | 🚨 | Fixed internally, LTS2, 1-2 months (Nov.) | 2025-07-23 | 2025-09-17 |
| 57 | GPU segfault in gtensor_bench with 2025.2 | MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 | source/applications/gtensor_bench | Colleen Bertoni | | 2025.3 | 2025-07-22 | 2025-08-11 |
| 56 | RSBench-SYCL incorrect answers with 1146.10 | GSD-11247 | source/applications/RSBench/ | John Tramm, Colleen Bertoni | | 1146.31 | 2025-07-22 | 2025-09-17 |
| 55 | Linking in LZ causes changes in signal handling | cmplrlibs-35385, GSD-11413 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ | Thomas Applencourt, Colleen Bertoni | | No response | 2025-07-22 | 2025-07-25 |
| 54 | oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers | oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 | See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 | Riccardo Balin | 🚨 | oneCCL 2021.17, oneAPI 2025.3 | 2025-07-18 | 2025-08-26 |
| 52 | compiler segfaults linking warpx binary | GSD-11357 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx | Tim Williams | 🚨 | 2025.2 + 1146.10 | 2025-07-07 | 2025-10-13 |
| 48 | Zombie Processes | GSD-11266 | none yet | Servesh M | 🚨 | Should be within the LTS2 release (1146.12) | 2025-06-25 | 2025-08-06 |
| 47 | Non standard MPI knobs suggested for performance | ANL-291 | N/A | Servesh M | | No response | 2025-06-23 | 2025-06-27 |
| 45 | DDT issues since Aurora upgrade | No response | /lus/flare/projects/catalyst/world_shared/zippy/ddt | Tim Williams | | Linaro Forge 2025.0.1 has the workaround. GDB 2025.3 the root cause will be gone. | 2025-06-12 | 2025-08-05 |
| 43 | CMake can't find MKL::MKL_SYCL with MPI wrapper compilers | No response | https://github.com/thilinarmtb/onemkl_cmake_mpi_bug | Thilina Ratnayaka, Colleen Bertoni | | improvements will be part of the next oneMKL release, 2025.3. | 2025-06-11 | 2025-06-25 |
| 39 | Feature request for Aurora runtime to include debugging symbols | ANL-286, HPCS-15374, GSD-11427 | feature request | Ye Luo | | No response | 2025-05-29 | 2025-09-17 |
| 38 | One application in GRID consistently hangs | GSD-11441 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle | Xiao-Yong Jin | 🚨 | Fixed Internally, later October likely. Potentially in LTS2, 1146.31+ | 2025-05-27 | 2025-10-01 |
| 37 | xpu-smi reports "N/A" for GPU Utilization | RITM0428460, ANL-279, GSD-11252 | any run of xpu-smi | Kyle Felker / Colleen Bertoni | | 1146.31 | 2025-05-22 | 2025-10-08 |
| 36 | (Occasional Interruptible) hangs in applications | Possibly related to ANL-215 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel | Xiao-Yong Jin | 🚨 | No response | 2025-05-15 | 2025-07-09 |
| 33 | Crash when calling too many MPI_Probe | https://github.com/pmodels/mpich/issues/7427 | https://github.com/pmodels/mpich/issues/7427 | David--Cléris Timothée | | No response | 2025-05-15 | 2025-05-15 |
| 32 | PETSc segfaults in sparse matrix calls | IGDB-6516, GSD-10450 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ | Junchao Zhang | 🚨 | 2025.3 for part malloc_shared in MKL | 2025-05-15 | 2025-06-25 |
| 31 | GAMESS segfaults with -O0 | GSD-10393, CMPLRLIBS-35345, GSD-11035 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault | Colleen Bertoni | 🚨 | 1146.31 (Targeted for LTS2 (1146.12+), contained with the IGC 2.16 series / WW34 (2-3 weeks)) | 2025-05-14 | 2025-09-17 |
| 30 | Copy 2D/3D are broken (zeCommandListAppendMemoryCopyRegion) | NEO-14954, GSD-11132 | https://github.com/rpereira-dev/ze-zoo | Romain PEREIRA and Thomas APPLENCOURT | 🚨 | No response | 2025-05-10 | 2025-09-17 |
| 29 | Significant slowdown with LAMMPS in first run, subsequent runs much faster | No response | /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ | Christopher Knight | | No response | 2025-05-09 | 2025-08-20 |
| 18 | Ping failures and hangs with production runs using GPT/GRID | ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 | /lus/flare/projects/LatticeFlavor/lehner | Xiao-Yong Jin | 🚨 | No response | 2025-04-04 | 2025-08-18 |
| 17 | hang with MPI pipelining | https://github.com/pmodels/mpich/issues/7373 | Build and run commands are in the MPICH issue. | James Osborn | | Should be fixed in top of aurora_test | 2025-04-03 | 2025-08-20 |
| 13 | XGC hangs at scale | CMPLRTST-27836 | xgc-es-cpp-gpu app, ES_ITER test case | Tim Williams | 🚨 | No response | 2025-04-03 | 2025-09-17 |
| 12 | CXI alloc failed on cxi1: request exceeds ACs limits | No response | None | Not Thomas | | No response | 2025-04-01 | 2025-08-04 |

Closed Issues

| Internal ID | Description | Vendor ID | Reproducer Path | PoC | Priority? | Date Opened | Closed Date |
|---|---|---|---|---|---|---|---|
| 75 | "MPL_gpu_query_is_same_dev(int, int): Assertion `global_dev1 >= 0 && global_dev1 < known_ze_device_count' failed." with mpich.dbg | No response | https://github.com/pmodels/mpich/issues/7602 | Tim, JaeHyuk, Colleen | | 2025-09-30 | 2025-10-13 |
| 72 | MPI_aborts in many applications in next-eval at larger scales | No response | N/A | Brian Holland / Tim Williams | | 2025-09-16 | 2025-09-30 |
| 59 | [ISHMEM] Unit test fails with ishmem 1.4.0 | https://github.com/oneapi-src/ishmem/issues/10 | https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos | Abhi | | 2025-07-25 | 2025-07-31 |
| 53 | IFX Compiler reads and stores floating point values from a text file at single-precision | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision | Victor Anisimov | 🚨 | 2025-07-09 | 2025-07-10 |
| 51 | [SYCL] Bug from SYCL peer_access | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access | Abhi | | 2025-07-02 | 2025-10-13 |
| 50 | OpenMP Thread binding | No response | See below | Romain PEREIRA | | 2025-07-02 | 2025-07-02 |
| 49 | [E3SM] MPICH bug related to collectives tuning | https://github.com/pmodels/mpich/issues/7456 | https://github.com/pmodels/mpich/issues/7456 | Abhi | 🚨 | 2025-06-27 | 2025-10-09 |
| 44 | QMCPACK segfault in libomp | No response | Not yet created | Ye Luo | 🚨 | 2025-06-12 | 2025-07-23 |
| 42 | Linking fails with old build environment | No response | /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test | Kris Rowe | | 2025-06-06 | 2025-06-10 |
| 41 | torch.compile segfaults for >2 tiles | MLSL-3728 | /flare/Aurora_deployment/vsastry/torch_compile | Varuni Sastry | | 2025-06-06 | 2025-07-24 |
| 40 | Need SYSMAN support for all modes in recent releases | HPCS-15366, related: GSD-11104 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState | Thomas Applencourt | 🚨 | 2025-05-30 | 2025-06-17 |
| 35 | Avoid outputs exceeding few KBs to stdout/stderr from MPI ranks | RITM0425437 | First issue Large MPI writes to stdout | Servesh Muralidharan | | 2025-05-15 | 2025-07-23 |
| 34 | Runtime Error: pytorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> | MLSL-3729 | In issue | Nathan Nichols | | 2025-05-15 | 2025-10-13 |
| 28 | CMake failures with SYCL | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ | Abhishek Bagusetty | | 2025-05-09 | 2025-05-09 |
| 27 | Build failures on PVC with Cutlass | GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl | Abhi | 🚨 | 2025-05-07 | 2025-10-13 |
| 26 | L0 memcpy bug | GSD-11142, NEO-14641 | I was doing the same run as QMCPACK SOW runs in the reframe | Ye Luo | 🚨 | 2025-05-06 | 2025-10-13 |
| 25 | Compile fail in Lattice App | Brian reproduced and confirms fixed in 2025.1 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx | Xiao-Yong Jin | 🚨 | 2025-05-01 | 2025-10-13 |
| 24 | Noticeably more "ping failed" than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade | JIRA is: HPCS-15331 | N/A | Xiao-Yong Jin, Colleen Bertoni | | 2025-05-01 | 2025-05-16 |
| 23 | Apps stop running after Apr 29 upgrade due to libstdc++ dependency | No response | See details | Ye Luo | | 2025-04-30 | 2025-05-06 |
| 22 | SYCL In-order queue broken | NEO-14641 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order | Thomas Applencourt | 🚨 | 2025-04-23 | 2025-10-13 |
| 21 | Error during write with Quantum ESPRESSO | No response | see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash | Filippo Simini | 🚨 | 2025-04-17 | 2025-04-18 |
| 20 | Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode | ANL-283/HPE Support Case 5390607860 | See below | Abhishek, Nathan, Khalid | | 2025-04-16 | 2025-10-01 |
| 19 | Severe CPU memory growth in MPICH | No response | /flare/catalyst/world_shared/zippy/reproducers/issue19 | Tim Williams | | 2025-04-04 | 2025-07-31 |
| 16 | Catastrophic memory error in context lmp_aurora_kokkos | No response | public LAMMPS | Chris Knight | | 2025-04-03 | 2025-07-23 |
| 9 | Multithreaded data-transfer can cause page-fault | N/A | Full QMCPACK | Ye Luo | | 2025-04-01 | 2025-05-08 |
| 8 | Lots of H2D copies produce CPU I9 error and incorrect value | N/A | Full QMCPACK | Ye Luo | 🚨 | 2025-04-01 | 2025-05-28 |
| 7 | MPI_Bcast gets faster when turning off XPMEM | pmodels/mpich#7334 | see Issue on MPICH GitHub repo | Ye Luo | | 2025-04-01 | 2025-04-24 |
| 6 | MPICH memory allocation slows down at scale | pmodels/mpich#7333 | see MPICH issue | Ye Luo | 🚨 | 2025-04-01 | 2025-04-24 |
| 4 | Incorrect results in receive buffer in GPU memory | MPICH 7312 | grid application (lattice QCD) | Patrick Steinbrecher, Tim Williams | 🚨 | 2025-03-25 | 2025-04-24 |
| 3 | Linker error found by XGC | CMPLRLLVM-66496 | /home/zippy/smalltests/aurora/xgc42/fails | Tim Williams | | 2025-03-19 | 2025-03-28 |

Update tables

The tables are updated automatically every night. To update them now, wait 10-15 s after the last change to the AuroraBugTracking issues, then run the following command on any machine where gh is authenticated:

gh workflow run "Update Submodules" --repo "argonne-lcf/user-guides" && GH_FORCE_TTY=100% watch -c -n1 gh run list --repo "argonne-lcf/user-guides"
Then wait about 2 minutes, until no jobs are running.

Alternatively, run aurora-bug-table-sync.sh, which goes through every step automatically and lets you see exactly when the changes are live online.
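
For unattended use, the trigger-and-wait procedure can also be written as a polling loop instead of an interactive watch. The following is a minimal sketch and not the contents of aurora-bug-table-sync.sh. The function names, the polling interval, and the retry limit are illustrative assumptions; only the repository and workflow name come from the command above. It relies on the `--json` and `--jq` options of `gh run list`.

```shell
#!/bin/sh
# Sketch only: trigger the table update, then poll until no runs are active.
# REPO and WORKFLOW come from the command above; everything else is assumed.
REPO="argonne-lcf/user-guides"
WORKFLOW="Update Submodules"

# Print how many recent workflow runs in REPO have not completed yet.
active_runs() {
  gh run list --repo "$REPO" --json status \
    --jq '[.[] | select(.status != "completed")] | length'
}

# Poll every $1 seconds (default 5), at most $2 times (default 60).
# Returns 0 once nothing is running, 1 on timeout, 2 if gh itself fails.
wait_until_idle() {
  _interval="${1:-5}"
  _tries_left="${2:-60}"
  while [ "$_tries_left" -gt 0 ]; do
    _active="$(active_runs)" || return 2
    if [ "$_active" -eq 0 ]; then
      return 0
    fi
    sleep "$_interval"
    _tries_left=$((_tries_left - 1))
  done
  return 1
}

# Trigger the workflow, pause so the new run is registered, then wait.
sync_tables() {
  gh workflow run "$WORKFLOW" --repo "$REPO" || return 2
  sleep "${1:-5}"
  wait_until_idle "${1:-5}"
}
```

After sourcing the file, calling `sync_tables` triggers the workflow and returns once `gh` reports no unfinished runs. With the default values the wait is capped at roughly five minutes, comfortably above the ~2 minutes mentioned above.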