Open Issues
| Internal ID | Description | Vendor ID | Reproducer Path | PoC | Priority? | ETA | Date Opened | Last Updated |
|---|---|---|---|---|---|---|---|---|
| 95 | Memory leak in Libfabric | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/cxi_memory_lead | Rob Lathan | | No response | 2025-11-13 | 2025-11-13 |
| 94 | zeMemFree slowdown in a loop | GSD-11962 | source/reproducers/l0/zememfree_slowdown/ | Colleen | | No response | 2025-11-08 | 2025-11-12 |
| 93 | oneCCL exception with PyTorch DTensor: SYCL recv is not supported for multi-node case | MLSL-3951 | In the text body | Väinö Hatanpää | | Assigned | 2025-11-05 | 2025-11-12 |
| 92 | SYCL device info free_memory wrong on 2-stack PVC1550 GPU | CMPLRLLVM-71510, URLZA-691 | source/reproducers/dpcpp/sycl_free_flat | Jakub H | | Working on it | 2025-10-31 | 2025-11-12 |
| 91 | sycl failed malloc_device on GPU takes 20 seconds | GSD-10587 | source/reproducers/dpcpp/slow_alloc/ | Jakub H | | fixed internally/ pending verification, end of dec. / early january | 2025-10-31 | 2025-11-12 |
| 90 | Device Sanitizer + LIBOMPTARGET_DEBUG=1 issues for the GAMESS RI-MP2 mini-app | CMPLRLLVM-71455 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test | Brian | | Fixed internally, 2026.0 (next year) | 2025-10-31 | 2025-11-12 |
| 89 | Device Sanitizer breaks with MKL DGEMM call in GAMESS RI-MP2 mini-app | MKLD-19334 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test | Brian, JaeHyuk | | Fixed internally, pending verification, 2026.0 | 2025-10-31 | 2025-11-12 |
| 88 | RPATH issue when mixing and matching SDK and spack packages built by another SDK | No response | No need. reproducer attached in this ticket | Ye Luo | | No response | 2025-10-30 | 2025-10-30 |
| 87 | QUDA compile fail | cmplrllvm-70981 | source/reproducers/openmp/quda_crash | Xiayong Jin / Brian W | | In progress | 2025-10-28 | 2025-10-29 |
| 86 | omp_alloc should support pinned memory, or implement proper fallback behavior | CMPLRLIBS-35442 | /home/kweide/projects/OpenMP_VV/tests/5.1/allocate/test_omp_alloctrait_pinned.c and source/reproducers/openmp/omp_alloctrait_pinned in the test set | Klaus Weide | | In progress | 2025-10-28 | 2025-11-12 |
| 85 | zeEventQueryKernelTimestampsExt is broken with IMM command lists | GSD-11124 | source/reproducers/l0/zeEventQueryKernelTimestampsExt_clock | Thomas/John Mellor-Crummey | | In progress | 2025-10-27 | 2025-11-12 |
| 84 | Device Sanitizer is not functional with OpenMP C/Fortran codes | | /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/sanitizer and source/reproducers/tools/sanitizer and source/reproducers/tools/sanitizer_rimp2_test in the test set | JaeHyuk Kwack | 🚨 | 2025.3 | 2025-10-22 | 2025-10-31 |
| 83 | With ifx, openmp_version is missing from omp_lib | CMPLRLIBS-35365 | /home/kweide/tests/test_openmp_version.f90 and source/reproducers/openmp/omp_version in the test set | Klaus Weide | | 2025.3 | 2025-10-20 | 2025-10-24 |
| 82 | Symbol missing issue with 1.3 version onwards in SLES and Intel Datacenter Max GPU on Aurora | https://github.com/intel/xpumanager/issues/113 | https://github.com/intel/xpumanager/issues/113 | Servesh | | In progress. ETA drop after next (github fix in a few weeks, LTS end of year) | 2025-10-16 | 2025-11-12 |
| 81 | IGC_StackOverflowDetection not working | GSD-11763 | source/reproducers/openmp/stack_overflow_not_working | Brian | | In progress | 2025-10-15 | 2025-10-29 |
| 80 | VTune fails with "Assertion failed: tool_gtpin_support:126: (buffer) " | VASP-32612, GTPIN-1169 | /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/GAMESS_RI-MP2_MiniApp source/reproducers/tools/vtune_gtpin_fail in the test set | JaeHyuk Kwack | 🚨 | 2025.3 | 2025-10-10 | 2025-10-30 |
| 79 | Advisor fail with "advisor: Warning: The application returned a non-zero exit value." | ADV-10687 | source/reproducers/tools/advisor_gflop | JaeHyuk Kwack | | Fixed with advisor --version == 616302, which should be in 2025.3 | 2025-10-08 | 2025-10-30 |
| 78 | ExchCXX fails to compile with is too large for Clang to process | CMPLRLLVM-70962 | source/reproducers/dpcpp/jit_too_large_for_Clang | Abhi | 🚨 | 2026.0 ("-Xclang -fignore-overflow-error" by default for SYCL) | 2025-10-06 | 2025-11-14 |
| 77 | [SYCL] Function pointers compilation issue | CMPLRLLVM-16317 | Reproducer below and source/reproducers/dpcpp/func_pointers | Abhi, Patrick Steinbrecher | 🚨 | Under discussion | 2025-10-06 | 2025-10-15 |
| 76 | Segfaults in MPICH routines in next-eval | No response | for XGC: /lus/flare/projects/catalyst/world_shared/zippy/xgc | Tim Williams | 🚨 | No response | 2025-10-01 | 2025-10-01 |
| 74 | ZES_ENABLE_SYSMAN should default to 1 in the oneapi module | No response | see Details | Tim Williams | | No response | 2025-09-29 | 2025-10-15 |
| 73 | "error: undefined reference to `old_llvm.umul.with.overflow.i64'" in newer kokkos | CMPLRLLVM-70603 | source/reproducers/dpcpp/kokkos_mdspan_umul | Daniel Arndt | | Being worked internally | 2025-09-17 | 2025-10-14 |
| 71 | RPC launch error tracking | | | | | | 2025-09-15 | 2025-09-23 |
| 70 | PALS gpu-bind, composite, envall lead to "launch failed" | DCE Case 5392152905 | applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall | Thomas Applencourt | | patch to test out and that should already be landed for a future release | 2025-09-10 | 2025-10-02 |
| 68 | warpx segfaults/hangs with OpenPMD enabled | No response | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-23 | 2025-08-23 |
| 67 | warpx Debug build crashes oneAPI compiler | CMPLRLLVM-24314 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-21 | 2025-10-29 |
| 66 | Compiling with "-g" leads to a much larger binary than without | CMPLRLLVM-69909, CMPLRLLVM-24314 | lammps + -g | Brian Holland | | No response | 2025-08-20 | 2025-10-01 |
| 65 | Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC | GSD-11510 | source/reproducers/l0/ondemand_paging/ | Colleen | | implemented, ETA end of year. | 2025-08-20 | 2025-10-29 |
| 64 | E3SM fortran compile ICE | CMPLRLLVM-69862 | source/reproducers/ifx/e3sm_homme_ICE_error | Abhi | | 2025.3.0 | 2025-08-18 | 2025-10-09 |
| 63 | Kokkos kernels fails to build with kokkos built with openmp enabled | CMPLRLLVM-69908 | source/applications/kokkos-kernels | Sean Koyama / Colleen Bertoni | | gone starting with 4.19 (fixed in 2025.3 branch) | 2025-08-18 | 2025-09-16 |
| 62 | -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT | GSD-11490 | source/reproducers/general/ftarget-register-alloc-mode_flag | Steve Rangel | | Fixed internally, ETA end of year | 2025-08-14 | 2025-10-29 |
| 60 | ext_oneapi_memcpy2d is significantly slower with implicit scaling than explicit and on PVC vs A100 | Really depends on GSD-11132 | source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf | Natalie Beams | | No response | 2025-07-29 | 2025-10-01 |
| 58 | kokkos inclusive and exclusive scan giving incorrect answers for 1146.10 | CMPLRLLVM-69285, GSD-11736 | source/reproducers/dpcpp/kokkos_optimization_scan | Daniel Arndt | 🚨 | 1146.39 (two weeks out -- end of Nov) | 2025-07-23 | 2025-11-12 |
| 57 | GPU segfault in gtensor_bench with 2025.2 | MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 | source/applications/gtensor_bench | Colleen Bertoni | | 2025.3 | 2025-07-22 | 2025-08-11 |
| 56 | RSBench-SYCL incorrect answers with 1146.10 | GSD-11247 | source/applications/RSBench/ | John Tramm, Colleen Bertoni | | 1146.31 | 2025-07-22 | 2025-09-17 |
| 55 | Linking in LZ causes changes in signal handling | cmplrlibs-35385, GSD-11413 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ | Thomas Applencourt, Colleen Bertoni | | No response | 2025-07-22 | 2025-07-25 |
| 54 | oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers | oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 | See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 | Riccardo Balin | 🚨 | oneCCL 2021.17, oneAPI 2025.3 | 2025-07-18 | 2025-08-26 |
| 52 | compiler segfaults linking warpx binary | GSD-11357, GSD-11855 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx | Tim Williams | 🚨 | 2025.2 + 1146.10 | 2025-07-07 | 2025-11-12 |
| 47 | Non standard MPI knobs suggested for performance | ANL-291 | N/A | Servesh M | | No response | 2025-06-23 | 2025-06-27 |
| 43 | CMake can't find MKL::MKL_SYCL with MPI wrapper compilers | No response | https://github.com/thilinarmtb/onemkl_cmake_mpi_bug | Thilina Ratnayaka, Colleen Bertoni | | improvements will be part of the next oneMKL release, 2025.3. | 2025-06-11 | 2025-06-25 |
| 39 | Feature request for Aurora runtime to include debugging symbols | ANL-286, HPCS-15374, GSD-11427 | feature request | Ye Luo | | No response | 2025-05-29 | 2025-09-17 |
| 38 | One application in GRID consistently hangs | GSD-11441 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle | Xiao-Yong Jin | 🚨 | Fixed Internally, likely end of Nov. 1146.39 | 2025-05-27 | 2025-11-12 |
| 36 | (Occasional Interruptible) hangs in applications | Possibly related to ANL-215 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel | Xiao-Yong Jin | 🚨 | No response | 2025-05-15 | 2025-07-09 |
| 33 | Crash when calling too many MPI_Probe | https://github.com/pmodels/mpich/issues/7427 | https://github.com/pmodels/mpich/issues/7427 | David--Cléris Timothée | | No response | 2025-05-15 | 2025-05-15 |
| 32 | PETSc segfaults in sparse matrix calls | IGDB-6516, GSD-10450 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ | Junchao Zhang | 🚨 | 2025.3 for part malloc_shared in MKL | 2025-05-15 | 2025-06-25 |
| 31 | GAMESS segfaults with -O0 | GSD-10393, CMPLRLIBS-35345,GSD-11035 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault | Colleen Bertoni | 🚨 | 1146.31 (Targeted for LTS2 (1146.12+), contained with the IGC 2.16 series / WW34 (2-3 weeks)) | 2025-05-14 | 2025-09-17 |
| 30 | Copy 2D/3D are broken (zeCommandListAppendMemoryCopyRegion) | NEO-14954, GSD-11132 | https://github.com/rpereira-dev/ze-zoo | Romain PEREIRA and Thomas APPLENCOURT | 🚨 | No response | 2025-05-10 | 2025-09-17 |
| 29 | Significant slowdown with LAMMPS in first run, subsequent runs much faster | No response | /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ | Christopher Knight | | No response | 2025-05-09 | 2025-08-20 |
| 18 | Ping failures and hangs with production runs using GPT/GRID | ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 | /lus/flare/projects/LatticeFlavor/lehner | Xiao-Yong Jin | 🚨 | No response | 2025-04-04 | 2025-08-18 |
| 17 | hang with MPI pipelining | https://github.com/pmodels/mpich/issues/7373 | Build and run commands are in the MPICH issue. | James Osborn | | Merged in https://github.com/pmodels/mpich/pull/7622 | 2025-04-03 | 2025-10-14 |
| 13 | XGC hangs at scale | CMPLRTST-27836 | xgc-es-cpp-gpu app, ES_ITER test case | Tim Williams | 🚨 | No response | 2025-04-03 | 2025-09-17 |
| 12 | CXI alloc failed on cxi1: request exceeds ACs limits | No response | None | Not Thomas | | No response | 2025-04-01 | 2025-08-04 |
Closed Issues
| Internal ID | Description | Vendor ID | Reproducer Path | PoC | Priority? | Date Opened | Closed Date |
|---|---|---|---|---|---|---|---|
| 75 | "MPL_gpu_query_is_same_dev(int, int): Assertion `global_dev1 >= 0 && global_dev1 < known_ze_device_count' failed." with mpich.dbg | No response | https://github.com/pmodels/mpich/issues/7602 | Tim, JaeHyuk, Colleen | | 2025-09-30 | 2025-10-13 |
| 72 | MPI_aborts in many applications in next-eval at larger scales | No response | N/A | Brian Holland / Tim Williams | | 2025-09-16 | 2025-09-30 |
| 61 | Failing unit tests on PVCs with 2025.2 oneAPI SDK -- is it expected? | https://github.com/uxlfoundation/oneMath/issues/703, CMPLRLLVM-69572, ONSAM-1930, GSD-11482 | https://github.com/uxlfoundation/oneMath/issues/703 | Colleen Bertoni | | 2025-07-30 | 2025-11-12 |
| 59 | [ISHMEM] Unit test fails with ishmem 1.4.0 | https://github.com/oneapi-src/ishmem/issues/10 | https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos | Abhi | | 2025-07-25 | 2025-07-31 |
| 53 | IFX Compiler reads and stores floating point values from a text file at single-precision | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision | Victor Anisimov | 🚨 | 2025-07-09 | 2025-07-10 |
| 51 | [SYCL] Bug from SYCL peer_access | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access | Abhi | | 2025-07-02 | 2025-10-13 |
| 50 | OpenMP Thread binding | No response | See below | Romain PEREIRA | | 2025-07-02 | 2025-07-02 |
| 49 | [E3SM] MPICH bug related to collectives tuning | https://github.com/pmodels/mpich/issues/7456 | https://github.com/pmodels/mpich/issues/7456 | Abhi | 🚨 | 2025-06-27 | 2025-10-09 |
| 48 | Zombie Processes | GSD-11266 | none yet | Servesh M | 🚨 | 2025-06-25 | 2025-10-29 |
| 45 | DDT issues since Aurora upgrade | No response | /lus/flare/projects/catalyst/world_shared/zippy/ddt | Tim Williams | | 2025-06-12 | 2025-11-03 |
| 44 | QMCPACK segfault in libomp | No response | Not yet created | Ye Luo | 🚨 | 2025-06-12 | 2025-07-23 |
| 42 | Linking fails with old build environment | No response | /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test | Kris Rowe | | 2025-06-06 | 2025-06-10 |
| 41 | torch.compile segfaults for >2 tiles | MLSL-3728 | /flare/Aurora_deployment/vsastry/torch_compile | Varuni Sastry | | 2025-06-06 | 2025-07-24 |
| 40 | Need SYSMAN support for all modes in recent releases | HPCS-15366, related: GSD-11104 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState | Thomas Applencourt | 🚨 | 2025-05-30 | 2025-06-17 |
| 37 | xpu-smi reports "N/A" for GPU Utilization | RITM0428460, ANL-279, GSD-11252 | any run of xpu-smi | Kyle Felker / Colleen Bertoni | | 2025-05-22 | 2025-10-29 |
| 35 | Avoid outputs exceeding few KBs to stdout/stderr from MPI ranks | RITM0425437 First issue | Large MPI writes to stdout | Servesh Muralidharan | | 2025-05-15 | 2025-07-23 |
| 34 | Runtime Error: pytorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> | MLSL-3729 | In issue | Nathan Nichols | | 2025-05-15 | 2025-10-13 |
| 28 | CMake failures with SYCL | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ | Abhishek Bagusetty | | 2025-05-09 | 2025-05-09 |
| 27 | Build failures on PVC with Cutlass | GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl | Abhi | 🚨 | 2025-05-07 | 2025-10-13 |
| 26 | L0 memcpy bug | GSD-11142, NEO-14641 | I was doing the same run as QMCPACK SOW runs in the reframe | Ye Luo | 🚨 | 2025-05-06 | 2025-10-13 |
| 25 | Compile fail in Lattice App | Brian reproduced and confirms fixed in 2025.1 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx | Xiao-Yong Jin | 🚨 | 2025-05-01 | 2025-10-13 |
| 24 | Noticeably more "ping failed" than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade | JIRA is: HPCS-15331 | N/A | Xiao-Yong Jin Colleen Bertoni | | 2025-05-01 | 2025-05-16 |
| 23 | Apps stop running after Apr 29 upgrade due to libstdc++ dependency | No response | See details | Ye Luo | | 2025-04-30 | 2025-05-06 |
| 22 | SYCL In-order queue broken | NEO-14641 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order | Thomas Applencourt | 🚨 | 2025-04-23 | 2025-10-13 |
| 21 | Error during write with Quantum ESPRESSO | No response | see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash | Filippo Simini | 🚨 | 2025-04-17 | 2025-04-18 |
| 20 | Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode | ANL-283/HPE Support Case 5390607860 | See below | Abhishek, Nathan, Khalid | | 2025-04-16 | 2025-10-01 |
| 19 | Severe CPU memory growth in MPICH | No response | /flare/catalyst/world_shared/zippy/reproducers/issue19 | Tim Williams | | 2025-04-04 | 2025-07-31 |
| 16 | Catastrophic memory error in context lmp_aurora_kokkos | No response | public LAMMPS | Chris Knight | | 2025-04-03 | 2025-07-23 |
| 9 | Multithreaded data-transfer can cause page-fault | N/A | Full QMCPACK | Ye Luo | | 2025-04-01 | 2025-05-08 |
| 8 | Lots of H2D copies produce CPU I9 error and incorrect value | N/A | Full QMCPACK | Ye Luo | 🚨 | 2025-04-01 | 2025-05-28 |
| 7 | MPI_Bcast gets faster when turning off XPMEM | pmodels/mpich#7334 | see Issue on MPICH GitHub repo | Ye Luo | | 2025-04-01 | 2025-04-24 |
| 6 | MPICH memory allocation slows down at scale | pmodels/mpich#7333 | see MPICH issue | Ye Luo | 🚨 | 2025-04-01 | 2025-04-24 |
| 4 | Incorrect results in receive buffer in GPU memory | MPICH 7312 | grid application (lattice QCD) | Patrick Steinbrecher, Tim Williams | 🚨 | 2025-03-25 | 2025-04-24 |
| 3 | Linker error found by XGC | CMPLRLLVM-66496 | /home/zippy/smalltests/aurora/xgc42/fails | Tim Williams | | 2025-03-19 | 2025-03-28 |
Update tables
The tables are updated automatically every night. To update them now, wait 10-15 s after the last change to the AuroraBugTracking issues, then run (on any machine that has authenticated with gh):
Alternatively, execute aurora-bug-table-sync.sh, which runs every step automatically and reports when the changes are live online.