Open Issues
Internal ID | Description | Vendor ID | Reproducer Path | PoC | Priority? | ETA | Date Opened | Last Updated |
---|---|---|---|---|---|---|---|---|
80 | VTune fails with "Assertion failed: tool_gtpin_support:126: (buffer) " | No response | /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/GAMESS_RI-MP2_MiniApp source/reproducers/tools/vtune_gtpin_fail in the test set | JaeHyuk Kwack | 🚨 | 2025.3 | 2025-10-10 | 2025-10-13 |
79 | Advisor fails with "advisor: Warning: The application returned a non-zero exit value." | ADV-10687 | source/reproducers/tools/advisor_gflop | JaeHyuk Kwack | | Fixed with advisor --version == 616302, which should be in 2025.3 | 2025-10-08 | 2025-10-10 |
78 | ExchCXX fails to compile with "is too large for Clang to process" | CMPLRLLVM-70962 | source/reproducers/dpcpp/jit_too_large_for_Clang | Abhi | | No response | 2025-10-06 | 2025-10-07 |
77 | [SYCL] Function pointers compilation issue | CMPLRLLVM-16317 | Reproducer below | Abhi, Patrick Steinbrecher | 🚨 | No response | 2025-10-06 | 2025-10-06 |
76 | Segfaults in MPICH routines in next-eval | No response | for XGC: /lus/flare/projects/catalyst/world_shared/zippy/xgc | Tim Williams | 🚨 | No response | 2025-10-01 | 2025-10-01 |
74 | ZES_ENABLE_SYSMAN should default to 1 in the oneapi module | No response | see Details | Tim Williams | | No response | 2025-09-29 | 2025-10-01 |
73 | "error: undefined reference to `old_llvm.umul.with.overflow.i64'" in newer kokkos | CMPLRLLVM-70603 | source/reproducers/dpcpp/kokkos_mdspan_umul | Daniel Arndt | | Being worked internally | 2025-09-17 | 2025-10-07 |
71 | RPC launch error tracking | | | | | | 2025-09-15 | 2025-09-23 |
70 | PALS gpu-bind, composite, envall lead to "launch failed" | DCE Case 5392152905 | applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall | Thomas Applencourt | | A patch is available to test and should already have landed for a future release | 2025-09-10 | 2025-10-02 |
68 | warpx segfaults/hangs with OpenPMD enabled | No response | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-23 | 2025-08-23 |
67 | warpx Debug build crashes oneAPI compiler | CMPLRLLVM-24314 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-21 | 2025-09-03 |
66 | Compiling with "-g" leads to a much larger binary than without | CMPLRLLVM-69909, CMPLRLLVM-24314 | lammps + -g | Brian Holland | | No response | 2025-08-20 | 2025-10-01 |
65 | Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC | GSD-11510 | source/reproducers/l0/ondemand_paging/ | Colleen | | No response | 2025-08-20 | 2025-09-17 |
64 | E3SM fortran compile ICE | CMPLRLLVM-69862 | source/reproducers/ifx/e3sm_homme_ICE_error | Abhi | | 2025.3.0 | 2025-08-18 | 2025-10-09 |
63 | Kokkos kernels fails to build with kokkos built with openmp enabled | CMPLRLLVM-69908 | source/applications/kokkos-kernels | Sean Koyama / Colleen Bertoni | | gone starting with 4.19 (fixed in 2025.3 branch) | 2025-08-18 | 2025-09-16 |
62 | -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT | GSD-11490 | source/reproducers/general/ftarget-register-alloc-mode_flag | Steve Rangel | | Possible fix internally | 2025-08-14 | 2025-09-17 |
61 | Failing unit tests on PVCs with 2025.2 oneAPI SDK -- is it expected? | https://github.com/uxlfoundation/oneMath/issues/703, CMPLRLLVM-69572, ONSAM-1930, GSD-11482 | https://github.com/uxlfoundation/oneMath/issues/703 | Colleen Bertoni | | CMPLRLLVM-69572: fixed, other implemented. ETA Oct if cherry-picked | 2025-07-30 | 2025-10-01 |
60 | ext_oneapi_memcpy2d is significantly slower with implicit scaling than explicit and on PVC vs A100 | Really depends on GSD-11132 | source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf | Natalie Beams | | No response | 2025-07-29 | 2025-10-01 |
58 | kokkos inclusive and exclusive scan giving incorrect answers for 1146.10 | CMPLRLLVM-69285, GSD-11736 | source/reproducers/dpcpp/kokkos_optimization_scan | Daniel Arndt | 🚨 | Fixed internally, LTS2, 1-2 months (Nov.) | 2025-07-23 | 2025-09-17 |
57 | GPU segfault in gtensor_bench with 2025.2 | MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 | source/applications/gtensor_bench | Colleen Bertoni | | 2025.3 | 2025-07-22 | 2025-08-11 |
56 | RSBench-SYCL incorrect answers with 1146.10 | GSD-11247 | source/applications/RSBench/ | John Tramm, Colleen Bertoni | | 1146.31 | 2025-07-22 | 2025-09-17 |
55 | Linking in LZ causes changes in signal handling | cmplrlibs-35385, GSD-11413 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ | Thomas Applencourt, Colleen Bertoni | | No response | 2025-07-22 | 2025-07-25 |
54 | oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers | oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 | See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 | Riccardo Balin | 🚨 | oneCCL 2021.17, oneAPI 2025.3 | 2025-07-18 | 2025-08-26 |
52 | compiler segfaults linking warpx binary | GSD-11357 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx | Tim Williams | 🚨 | 2025.2 + 1146.10 | 2025-07-07 | 2025-10-13 |
48 | Zombie Processes | GSD-11266 | none yet | Servesh M | 🚨 | Should be within the LTS2 release (1146.12) | 2025-06-25 | 2025-08-06 |
47 | Non standard MPI knobs suggested for performance | ANL-291 | N/A | Servesh M | | No response | 2025-06-23 | 2025-06-27 |
45 | DDT issues since Aurora upgrade | No response | /lus/flare/projects/catalyst/world_shared/zippy/ddt | Tim Williams | | Linaro Forge 2025.0.1 has the workaround. With GDB 2025.3 the root cause will be gone. | 2025-06-12 | 2025-08-05 |
43 | CMake can't find MKL::MKL_SYCL with MPI wrapper compilers | No response | https://github.com/thilinarmtb/onemkl_cmake_mpi_bug | Thilina Ratnayaka, Colleen Bertoni | | Improvements will be part of the next oneMKL release, 2025.3. | 2025-06-11 | 2025-06-25 |
39 | Feature request for Aurora runtime to include debugging symbols | ANL-286, HPCS-15374, GSD-11427 | feature request | Ye Luo | | No response | 2025-05-29 | 2025-09-17 |
38 | One application in GRID consistently hangs | GSD-11441 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle | Xiao-Yong Jin | 🚨 | Fixed Internally, later October likely. Potentially in LTS2, 1146.31+ | 2025-05-27 | 2025-10-01 |
37 | xpu-smi reports "N/A" for GPU Utilization | RITM0428460, ANL-279, GSD-11252 | any run of xpu-smi | Kyle Felker / Colleen Bertoni | | 1146.31 | 2025-05-22 | 2025-10-08 |
36 | (Occasional Interruptible) hangs in applications | Possibly related to ANL-215 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel | Xiao-Yong Jin | 🚨 | No response | 2025-05-15 | 2025-07-09 |
33 | Crash when calling too many MPI_Probe | https://github.com/pmodels/mpich/issues/7427 | https://github.com/pmodels/mpich/issues/7427 | David--Cléris Timothée | | No response | 2025-05-15 | 2025-05-15 |
32 | PETSc segfaults in sparse matrix calls | IGDB-6516, GSD-10450 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ | Junchao Zhang | 🚨 | 2025.3 for part malloc_shared in MKL | 2025-05-15 | 2025-06-25 |
31 | GAMESS segfaults with -O0 | GSD-10393, CMPLRLIBS-35345, GSD-11035 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault | Colleen Bertoni | 🚨 | 1146.31 (Targeted for LTS2 (1146.12+), contained with the IGC 2.16 series / WW34 (2-3 weeks)) | 2025-05-14 | 2025-09-17 |
30 | Copy 2D/3D are broken (zeCommandListAppendMemoryCopyRegion) | NEO-14954, GSD-11132 | https://github.com/rpereira-dev/ze-zoo | Romain PEREIRA and Thomas APPLENCOURT | 🚨 | No response | 2025-05-10 | 2025-09-17 |
29 | Significant slowdown with LAMMPS in first run, subsequent runs much faster | No response | /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ | Christopher Knight | | No response | 2025-05-09 | 2025-08-20 |
18 | Ping failures and hangs with production runs using GPT/GRID | ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 | /lus/flare/projects/LatticeFlavor/lehner | Xiao-Yong Jin | 🚨 | No response | 2025-04-04 | 2025-08-18 |
17 | hang with MPI pipelining | https://github.com/pmodels/mpich/issues/7373 | Build and run commands are in the MPICH issue. | James Osborn | | Should be fixed in top of aurora_test | 2025-04-03 | 2025-08-20 |
13 | XGC hangs at scale | CMPLRTST-27836 | xgc-es-cpp-gpu app, ES_ITER test case | Tim Williams | 🚨 | No response | 2025-04-03 | 2025-09-17 |
12 | CXI alloc failed on cxi1: request exceeds ACs limits | No response | None | Not Thomas | | No response | 2025-04-01 | 2025-08-04 |
Closed Issues
Internal ID | Description | Vendor ID | Reproducer Path | PoC | Priority? | Date Opened | Closed Date |
---|---|---|---|---|---|---|---|
75 | "MPL_gpu_query_is_same_dev(int, int): Assertion `global_dev1 >= 0 && global_dev1 < known_ze_device_count' failed." with mpich.dbg | No response | https://github.com/pmodels/mpich/issues/7602 | Tim, JaeHyuk, Colleen | | 2025-09-30 | 2025-10-13 |
72 | MPI_aborts in many applications in next-eval at larger scales | No response | N/A | Brian Holland / Tim Williams | | 2025-09-16 | 2025-09-30 |
59 | [ISHMEM] Unit test fails with ishmem 1.4.0 | https://github.com/oneapi-src/ishmem/issues/10 | https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos | Abhi | | 2025-07-25 | 2025-07-31 |
53 | IFX Compiler reads and stores floating point values from a text file at single-precision | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision | Victor Anisimov | 🚨 | 2025-07-09 | 2025-07-10 |
51 | [SYCL] Bug from SYCL peer_access | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access | Abhi | | 2025-07-02 | 2025-10-13 |
50 | OpenMP Thread binding | No response | See below | Romain PEREIRA | | 2025-07-02 | 2025-07-02 |
49 | [E3SM] MPICH bug related to collectives tuning | https://github.com/pmodels/mpich/issues/7456 | https://github.com/pmodels/mpich/issues/7456 | Abhi | 🚨 | 2025-06-27 | 2025-10-09 |
44 | QMCPACK segfault in libomp | No response | Not yet created | Ye Luo | 🚨 | 2025-06-12 | 2025-07-23 |
42 | Linking fails with old build environment | No response | /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test | Kris Rowe | | 2025-06-06 | 2025-06-10 |
41 | torch.compile segfaults for >2 tiles | MLSL-3728 | /flare/Aurora_deployment/vsastry/torch_compile | Varuni Sastry | | 2025-06-06 | 2025-07-24 |
40 | Need SYSMAN support for all modes in recent releases | HPCS-15366, related: GSD-11104 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState | Thomas Applencourt | 🚨 | 2025-05-30 | 2025-06-17 |
35 | Avoid outputs exceeding a few KBs to stdout/stderr from MPI ranks | RITM0425437 First issue | Large MPI writes to stdout | Servesh Muralidharan | | 2025-05-15 | 2025-07-23 |
34 | Runtime Error: pytorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> | MLSL-3729 | In issue | Nathan Nichols | | 2025-05-15 | 2025-10-13 |
28 | CMake failures with SYCL | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ | Abhishek Bagusetty | | 2025-05-09 | 2025-05-09 |
27 | Build failures on PVC with Cutlass | GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl | Abhi | 🚨 | 2025-05-07 | 2025-10-13 |
26 | L0 memcpy bug | GSD-11142, NEO-14641 | I was doing the same run as QMCPACK SOW runs in the reframe | Ye Luo | 🚨 | 2025-05-06 | 2025-10-13 |
25 | Compile fail in Lattice App | Brian reproduced and confirms fixed in 2025.1 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx | Xiao-Yong Jin | 🚨 | 2025-05-01 | 2025-10-13 |
24 | Noticeably more "ping failed" than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade | JIRA is: HPCS-15331 | N/A | Xiao-Yong Jin, Colleen Bertoni | | 2025-05-01 | 2025-05-16 |
23 | Apps stop running after Apr 29 upgrade due to libstdc++ dependency | No response | See details | Ye Luo | | 2025-04-30 | 2025-05-06 |
22 | SYCL In-order queue broken | NEO-14641 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order | Thomas Applencourt | 🚨 | 2025-04-23 | 2025-10-13 |
21 | Error during write with Quantum ESPRESSO | No response | see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash | Filippo Simini | 🚨 | 2025-04-17 | 2025-04-18 |
20 | Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode | ANL-283/HPE Support Case 5390607860 | See below | Abhishek, Nathan, Khalid | | 2025-04-16 | 2025-10-01 |
19 | Severe CPU memory growth in MPICH | No response | /flare/catalyst/world_shared/zippy/reproducers/issue19 | Tim Williams | | 2025-04-04 | 2025-07-31 |
16 | Catastrophic memory error in context lmp_aurora_kokkos | No response | public LAMMPS | Chris Knight | | 2025-04-03 | 2025-07-23 |
9 | Multithreaded data-transfer can cause page-fault | N/A | Full QMCPACK | Ye Luo | | 2025-04-01 | 2025-05-08 |
8 | Lots of H2D copies produce CPU I9 error and incorrect value | N/A | Full QMCPACK | Ye Luo | 🚨 | 2025-04-01 | 2025-05-28 |
7 | MPI_Bcast gets faster when turning off XPMEM | pmodels/mpich#7334 | see Issue on MPICH GitHub repo | Ye Luo | | 2025-04-01 | 2025-04-24 |
6 | MPICH memory allocation slows down at scale | pmodels/mpich#7333 | see MPICH issue | Ye Luo | 🚨 | 2025-04-01 | 2025-04-24 |
4 | Incorrect results in receive buffer in GPU memory | MPICH 7312 | grid application (lattice QCD) | Patrick Steinbrecher, Tim Williams | 🚨 | 2025-03-25 | 2025-04-24 |
3 | Linker error found by XGC | CMPLRLLVM-66496 | /home/zippy/smalltests/aurora/xgc42/fails | Tim Williams | | 2025-03-19 | 2025-03-28 |
Update tables
The tables are updated automatically every night. To update them now, wait 10-15 s after the last change to the AuroraBugTracking Issues, then run the following (from any machine that has authenticated with gh):
Alternatively, execute aurora-bug-table-sync.sh, which runs every step automatically and reports exactly when the changes are live online.
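For readers curious how issue records could map onto the table layout used on this page, here is a minimal sketch. It is not the contents of aurora-bug-table-sync.sh, which are not shown here. It assumes the issues are available as JSON records with the fields number, title, createdAt and updatedAt, as `gh issue list --json number,title,createdAt,updatedAt` emits them. The function names render_row and render_table are hypothetical, and the time-of-day in the sample record is invented for illustration. Columns that cannot be derived from those fields are left empty rather than guessed.

```python
import json

# Column layout of the Open Issues table on this page.
COLUMNS = ["Internal ID", "Description", "Vendor ID", "Reproducer Path",
           "PoC", "Priority?", "ETA", "Date Opened", "Last Updated"]


def render_row(issue):
    """Render one issue record as a pipe-delimited table row.

    Only the columns derivable from the assumed JSON fields are filled;
    every other cell stays empty so the columns remain aligned.
    """
    cells = {
        "Internal ID": str(issue["number"]),
        # Escape literal pipes so a title cannot break the row.
        "Description": issue["title"].replace("|", "\\|"),
        # ISO 8601 timestamps start with YYYY-MM-DD.
        "Date Opened": issue["createdAt"][:10],
        "Last Updated": issue["updatedAt"][:10],
    }
    return " | ".join(cells.get(col, "") for col in COLUMNS) + " |"


def render_table(issues):
    """Render header, separator, and rows, newest issue number first."""
    header = " | ".join(COLUMNS) + " |"
    separator = "|".join(["---"] * len(COLUMNS)) + "|"
    ordered = sorted(issues, key=lambda i: i["number"], reverse=True)
    return "\n".join([header, separator, *(render_row(i) for i in ordered)])


# Hypothetical sample record; the dates match issue 71 above,
# the time-of-day is made up.
sample = json.loads("""
[{"number": 71, "title": "RPC launch error tracking",
  "createdAt": "2025-09-15T00:00:00Z", "updatedAt": "2025-09-23T00:00:00Z"}]
""")

print(render_table(sample))
```

Keeping one cell per column, even when it is empty, is what prevents the date columns from sliding left when an issue has no priority or ETA.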