Open Issues
| Internal ID | Description | Vendor ID | Reproducer Path | PoC | High Priority | ETA | Date Opened | Last Updated |
|---|---|---|---|---|---|---|---|---|
| 134 | [Advisor] Incorrect lower precision data | ADV-7898 | https://github.com/jkwack/JKBench_Tools/tree/Low_Precision/Low_Precision | JaeHyuk Kwack, Riccardo Balin | 🚨 | No response | 2026-04-13 | 2026-04-14 |
| 132 | [DPC++] Print not working at -O0 | CMPLRLLVM-74689 | source/reproducers/dpcpp/print_bug/ | Thomas | | No response | 2026-04-07 | 2026-04-07 |
| 131 | [Runtime] MAGMA test failing | GSD-12314 | source/reproducers/dpcpp/magama_crash | Natalie Beams, Colleen | | Under investigation | 2026-03-18 | 2026-03-18 |
| 130 | [LZ] Timestamps with in-order queues lead to wrong answers | GSD-12468 | source/reproducers/l0/timestamp_wrong_answer | Colleen | | No response | 2026-03-14 | 2026-03-16 |
| 129 | Very long JIT times | GSD-12462 | https://github.com/wavefunction91/ExchCXX and source/reproducers/dpcpp/very_long_jit | Abhi, Colleen | | In triage; picked up by the Intel team | 2026-03-13 | 2026-03-18 |
| 128 | [VTune] EnvironmentSize: 1002: Environment length too long, not supported | VASP-33414, PINT-6768 | /lus/flare/projects/Tools/jkwack-tools-reproducer/Tim/EnvironmentSize_1002 | Tim Williams, JaeHyuk Kwack | 🚨 | Fixed internally in Pin 4.2 (targeting VTune 2026.2) | 2026-03-06 | 2026-04-15 |
| 127 | next-eval generates bad code from a warpx kernel with -O3 | GSD-12419 | /flare/catalyst/world_shared/zippy/warpx | Tim Williams | | IGC working on it | 2026-03-05 | 2026-03-18 |
| 126 | Incorrect answers for OpenMP + SIMD code | CMPLRLLVM-73901 | source/reproducer/icx/simd_omp | Lehman Garrison / Thomas | | Reproducer confirmed; likely to be fixed in oneAPI 2026.1 | 2026-03-03 | 2026-03-23 |
| 125 | [Frameworks] vLLM async scheduler failure | No response | see issue | Nathan and Khalid | | No response | 2026-03-02 | 2026-03-02 |
| 124 | [LZ] Clarification about zeCommandListHostSynchronize and multiple IMM in-order queues | GSD-12406 | source/reproducers/l0/why_event_not_ready/ | Colleen | 🚨 | No response | 2026-03-02 | 2026-03-02 |
| 122 | [IntelPython] Bug in dpctl support for the order parameter of dpt.asnumpy | No response | https://github.com/IntelPython/dpctl/issues/2138 and source/reproducers_frameworks/asnumpy/ | Abhi | | No response | 2026-02-23 | 2026-03-20 |
| 121 | [IntelPython] Feature request for subclass support in dpnp arrays | No response | https://github.com/IntelPython/dpnp/issues/2764 and source/reproducers_frameworks/ndarray_subclass | Abhi | | Fixed (see GitHub issue) | 2026-02-23 | 2026-03-20 |
| 120 | [IntelPython] dpnp array .data.ptr on array views ignores USM offset | No response | https://github.com/IntelPython/dpnp/issues/2781 and source/reproducers_frameworks/data_ptr | Abhi | 🚨 | Fixed (see GitHub issue) | 2026-02-23 | 2026-03-20 |
| 119 | [IntelPython] Indexing bug with dpnp ndarray | No response | https://github.com/IntelPython/dpnp/issues/2783 and /reproducers_frameworks/wrong_inplace_4D | Abhi | 🚨 | A fix is most likely needed in dpctl, but that work is blocked by the ongoing migration of the tensor code to dpnp; work will resume once the migration is complete. | 2026-02-23 | 2026-04-15 |
| 118 | Incorrect RUNPATH in libimf.so and libirng.so | CMPLRLLVM-44505 | Embedded and source/reproducers/general/runpath_libimf_libirng/ | Ye Luo | | Intel agrees it is a nice-to-have feature; ETA oneAPI 2026.1 | 2026-02-19 | 2026-03-18 |
| 117 | Fortran ICE with module + intent(in) | CMPLRLLVM-73523 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/ice_module_in | Thomas / Victor | | 2026.0 | 2026-02-18 | 2026-03-18 |
| 116 | PCIe counters have a regression in 1.3.X for Intel Data Center GPU Max | https://github.com/intel/xpumanager/issues/119 | https://github.com/intel/xpumanager/issues/119 | Servesh | | Needs further discussion within Intel; update expected soon | 2026-02-18 | 2026-03-18 |
| 115 | [Frameworks] Flash attention and fused_moe/test_grouped_gemm tests fail in vLLM | GSD-12291, GSD-11768, GSD-12248, CUTLASS9-510 | source/reproducers/frameworks/vllm/ | Servesh / Nathan | 🚨 | No response | 2026-02-16 | 2026-04-03 |
| 114 | Offline debugging issues | No response | N/A | Servesh | | 3 of 5 in the next release (~1146.56-8); 2 of 5 still under analysis | 2026-02-16 | 2026-03-18 |
| 113 | Engineering version of vtune-backend is extremely slow | VASP-33498 | /tmp/rcaddy/tmp on Aurora head node 11 | Robert Caddy | | No response | 2026-02-11 | 2026-03-18 |
| 111 | [Frameworks] alltoallv with zero-sized buffer from PyTorch | https://github.com/uxlfoundation/oneCCL/issues/190, MLSL-4075 | https://github.com/argonne-lcf/nekRS-ML/blob/alcf4/3rd_party/dist-gnn/run_all2all_bench.sh | Riccardo Balin | | 2022.0 (a version number, not the year) | 2026-02-05 | 2026-03-18 |
| 110 | [Frameworks] Degraded Ptycho_Vit performance vs. A100 | No response | https://github.com/SYNAPS-I/ptycho-vit/tree/aurora_port | Varuni Katti Sastry | | No response | 2026-02-03 | 2026-02-04 |
| 109 | Global MPI rank issue with STAT | HPE ticket CPE-13691 | /home/jkwack/Tools/STAT/Multi-node_test on Sunspot | JaeHyuk Kwack | 🚨 | No response | 2026-02-02 | 2026-02-02 |
| 106 | [LZ] Hang in zeEventPoolDestroy when called before a signal on an unrelated event from a different pool | GSD-12152 | source/reproducers/l0/multi_event_pools_hang | Colleen, Paulius | | Reproduced internally | 2026-01-07 | 2026-03-18 |
| 104 | [LZ] Crash with UseKMDMigration | GSD-12102 | source/reproducers/dpcpp/supercontext | Thomas | | Under investigation | 2025-12-17 | 2026-01-07 |
| 102 | [Frameworks][Triton] "No device of requested type available" when ONEAPI_DEVICE_SELECTOR="level_zero:gpu" | PYTORCHDCQ-7882 | source/reproducers/frameworks/triton_get_device | Nathan Nichols | | Workaround: ONEAPI_DEVICE_SELECTOR="*:gpu"; see https://github.com/intel/intel-xpu-backend-for-triton/pull/5745 | 2025-12-17 | 2026-01-06 |
| 101 | Signaling a user event with clSetUserEventStatus does not wake a barrier that depends on it in in-order queues | GSD-12087 | source/reproducers/opencl/user_event_in_order | Paulius Velesko | | No response | 2025-12-11 | 2026-03-18 |
| 100 | Level Zero event used between an in-order immediate command list and an out-of-order regular command list results in ZE_RESULT_ERROR_INVALID_ARGUMENT | GSD-12085 | source/reproducers/l0/inorder_outorder_event/ | Paulius Velesko | | No response | 2025-12-11 | 2026-03-18 |
| 99 | Advisor tripcounts analysis fails with a PyTorch example | ADV-10735 | /flare/Performance/jkwack/Tools/Roofline/SC25_tutorial/ai_ml_profiling/reproducer or /lus/flare/projects/Tools/jkwack-tools-reproducer/JaeHyuk/advisor_pytorch/reproducer or source/reproducers/tools/pytorch_advisor | JaeHyuk Kwack | 🚨 | Under investigation; possibly an Advisor/Python compatibility issue. Works on SP7, but overhead is high (~66x) | 2025-12-09 | 2026-04-15 |
| 98 | Hanging OpenCL code when one command queue waits on an event from another command queue | CMPLRLLVM-72048 / GSD-12075 | source/reproducers/opencl/hanging_marker | Colleen | | Under investigation | 2025-12-02 | 2026-03-18 |
| 97 | SHMEM on Aurora: unit test wait_until_all-on_queue-2 hangs | https://github.com/oneapi-src/ishmem/issues/15 | source/applications/ishmem_sis | Colleen / Abhi | | Actively working on it | 2025-11-21 | 2025-12-10 |
| 96 | Sporadic libze_intel_gpu.so segmentation fault when running QMCPACK | GSD-12033 | See attached reproducer | Ye Luo | 🚨 | Intel working on reproducing | 2025-11-17 | 2026-02-18 |
| 95 | Memory leak in libfabric | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/cxi_memory_lead | Rob Lathan | | Fixed by https://github.com/ofiwg/libfabric/pull/11334 (thanks, Rob!); expected in SHS 13.1.0, on Aurora by end of March | 2025-11-13 | 2026-02-18 |
| 94 | zeMemFree slowdown in a loop | GSD-11962, NEO-17411 | source/reproducers/l0/zememfree_slowdown/ | Colleen | 🚨 | Under investigation | 2025-11-08 | 2026-02-18 |
| 92 | SYCL device info free_memory is wrong on 2-stack PVC 1550 GPU | CMPLRLLVM-71510, GSD-12043 | source/reproducers/dpcpp/sycl_free_flat | Jakub H | | Under investigation | 2025-10-31 | 2026-03-18 |
| 91 | A failed SYCL malloc_device on GPU takes 20 seconds | GSD-10587 | source/reproducers/dpcpp/slow_alloc/ | Jakub H | | Fixed internally post-1146.40 and in a newer release, but not cherry-picked; why it was not cherry-picked is under investigation | 2025-10-31 | 2026-03-18 |
| 90 | Device Sanitizer + LIBOMPTARGET_DEBUG=1 issues for the GAMESS RI-MP2 mini-app | CMPLRLLVM-71455 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test (w60 ones) | Brian | | Fixed internally; 2026.0 (end of April) | 2025-10-31 | 2026-02-18 |
| 89 | Device Sanitizer breaks with an MKL DGEMM call in the GAMESS RI-MP2 mini-app | MKLD-19334 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test (w30 ones) | Brian, JaeHyuk | | Fixed internally; 2026.0 (end of April) | 2025-10-31 | 2026-02-18 |
| 87 | QUDA compile failure | CMPLRLLVM-70981 | source/reproducers/openmp/quda_crash | Xiao-Yong Jin / Brian W | | 2026.0 (end of April) | 2025-10-28 | 2026-01-07 |
| 86 | omp_alloc should support pinned memory or implement proper fallback behavior | CMPLRLIBS-35442 | /home/kweide/projects/OpenMP_VV/tests/5.1/allocate/test_omp_alloctrait_pinned.c and source/reproducers/openmp/omp_alloctrait_pinned in the test set | Klaus Weide | | Fixed internally (correct error message); likely in 2026.1 | 2025-10-28 | 2026-02-18 |
| 85 | zeEventQueryKernelTimestampsExt is broken with IMM command lists | GSD-11124 | source/reproducers/l0/zeEventQueryKernelTimestampsExt_clock | Thomas/John Mellor-Crummey | | In progress | 2025-10-27 | 2025-11-12 |
| 81 | IGC_StackOverflowDetection not working | GSD-11763 | source/reproducers/openmp/stack_overflow_not_working | Brian | | In progress | 2025-10-15 | 2025-10-29 |
| 77 | [SYCL] Function pointers compilation issue | CMPLRLLVM-16317 | Reproducer below and source/reproducers/dpcpp/func_pointers | Abhi, Patrick Steinbrecher | 🚨 | Under discussion, heavy lift | 2025-10-06 | 2026-03-18 |
| 76 | Segfaults in MPICH routines in next-eval | No response | for XGC: /lus/flare/projects/catalyst/world_shared/zippy/xgc | Tim Williams | 🚨 | No response | 2025-10-01 | 2025-10-01 |
| 74 | ZES_ENABLE_SYSMAN should default to 1 in the oneapi module | No response | see Details | Tim Williams | | No response | 2025-09-29 | 2025-10-15 |
| 73 | "error: undefined reference to `old_llvm.umul.with.overflow.i64'" in newer Kokkos | CMPLRLLVM-70603, GSD-12239 | source/reproducers/dpcpp/kokkos_mdspan_umul | Daniel Arndt | | Fixed on the compiler side; waiting on an Agama fix | 2025-09-17 | 2026-02-18 |
| 71 | RPC launch error tracking | | | | | | 2025-09-15 | 2025-09-23 |
| 70 | PALS gpu-bind, composite, and envall lead to "launch failed" | DCE Case 5392152905 | applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall | Thomas Applencourt | | Fixed in USS-1.5 (March 2026) | 2025-09-10 | 2025-12-09 |
| 68 | warpx segfaults/hangs with openPMD enabled | No response | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-23 | 2026-01-08 |
| 67 | warpx Debug build crashes the oneAPI compiler | CMPLRLLVM-24314 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ | Tim Williams | | No response | 2025-08-21 | 2025-10-29 |
| 65 | Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC | GSD-11510 | source/reproducers/l0/ondemand_paging/ | Colleen | | Implemented post-1146.41; expected ~January (1146.58) | 2025-08-20 | 2026-02-18 |
| 62 | -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT | GSD-11490 | source/reproducers/general/ftarget-register-alloc-mode_flag | Steve Rangel | | Fixed internally, 1146.58 | 2025-08-14 | 2026-02-18 |
| 60 | ext_oneapi_memcpy2d is significantly slower with implicit scaling than with explicit scaling, and slower on PVC than on A100 | GSD-11132, GSD-12277 | source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf | Natalie Beams | | No response | 2025-07-29 | 2026-02-03 |
| 55 | Linking against LZ changes signal handling | CMPLRLIBS-35385, GSD-11413 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ | Thomas Applencourt, Colleen Bertoni | | Fixed internally, still in vetting | 2025-07-22 | 2025-12-10 |
| 52 | Compiler segfaults when linking warpx binary | GSD-11357, GSD-11855 | /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx | Tim Williams | 🚨 | 2025.2 + 1146.10 with the IGC_OverrideOCLMaxParamSize=4096 workaround, but this gives wrong answers | 2025-07-07 | 2026-04-15 |
| 47 | Non-standard MPI knobs suggested for performance | ANL-291 | N/A | Servesh M | | No response | 2025-06-23 | 2025-06-27 |
| 39 | Feature request for Aurora runtime to include debugging symbols | ANL-286, HPCS-15374, GSD-11427 | feature request | Ye Luo | | 1146.40 drop | 2025-05-29 | 2026-03-24 |
| 38 | One application in GRID consistently hangs | GSD-11441 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle | Xiao-Yong Jin | 🚨 | Internal investigation, testing a patch, ~1146.58 | 2025-05-27 | 2026-02-18 |
| 36 | Occasional (interruptible) hangs in applications | Possibly related to ANL-215 | /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel | Xiao-Yong Jin | 🚨 | No response | 2025-05-15 | 2025-07-09 |
| 33 | Crash when calling MPI_Probe too many times | https://github.com/pmodels/mpich/issues/7427 | https://github.com/pmodels/mpich/issues/7427 | David--Cléris Timothée | | No response | 2025-05-15 | 2025-05-15 |
| 30 | 2D/3D copies are broken (zeCommandListAppendMemoryCopyRegion) | NEO-14954, GSD-11132 | https://github.com/rpereira-dev/ze-zoo and source/reproducers/l0/copyRegionPitch | Romain PEREIRA and Thomas APPLENCOURT | 🚨 | Backport fixes from newer runtime to LTS | 2025-05-10 | 2026-03-18 |
| 29 | Significant LAMMPS slowdown on the first run; subsequent runs are much faster | No response | /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ | Christopher Knight | | No response | 2025-05-09 | 2026-01-06 |
| 13 | XGC hangs at scale | CMPLRTST-27836 | xgc-es-cpp-gpu app, ES_ITER test case | Tim Williams | 🚨 | No response | 2025-04-03 | 2026-01-07 |
Closed Issues
| Internal ID | Description | Vendor ID | Reproducer Path | PoC | High Priority | Date Opened | Closed Date |
|---|---|---|---|---|---|---|---|
| 133 | Advisor slowdown with SP7 vs. SP4 | No response | source/reproducers/tools/advisor_gflop | JaeHyuk | | 2026-04-13 | 2026-04-13 |
| 123 | Various MPI crashes in PyTorch at larger scales | No response | below | Khalid | | 2026-02-26 | 2026-03-05 |
| 112 | [MPI] MPI_Probe crashes with hardware event overflow | CAST-39582 | in the issue and source/reproducers/mpi/mpi_probe | Colleen | | 2026-02-09 | 2026-02-13 |
| 108 | [LZ] Hang on event with multiple immediate command lists | No response | source/reproducers/l0/synch_hang_multi_imm | Paulius Velesko | | 2026-01-27 | 2026-03-13 |
| 107 | VTune times out even when run with collection paused | VASP-33391 | /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue or /lus/flare/projects/Tools/jkwack-tools-reproducer/Robert_Caddy/vtune_issue | Robert Caddy | | 2026-01-07 | 2026-03-05 |
| 105 | PCIe counters not working on LTS2 2523.31 and xpu-smi 1.2.X | https://github.com/intel/xpumanager/issues/114, GSD-12079 | In issue | Servesh | | 2026-01-06 | 2026-03-13 |
| 103 | [Frameworks][PyTorch][IPEX] PyTorch complex matmul support without IPEX | No response | /lus/flare/projects/datasets/softwares/testing/ptychi_tests/complex.py; in the test set at source/reproducers/frameworks/pytorch_matmul_ipex | Khalid Hossain | | 2025-12-17 | 2026-03-13 |
| 93 | oneCCL exception with PyTorch DTensor: SYCL recv is not supported for multi-node case | MLSL-3951 | In the issue body; also source/reproducers/frameworks/pytorch_93 (single node only; the 2-node case must be tested by hand) | Väinö Hatanpää | | 2025-11-05 | 2026-03-13 |
| 88 | RPATH issue when mixing SDK and Spack packages built by another SDK | No response | Not needed; reproducer attached to the ticket | Ye Luo | | 2025-10-30 | 2026-02-18 |
| 84 | Device Sanitizer is not functional with OpenMP C/Fortran codes | | /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/sanitizer and source/reproducers/tools/sanitizer | JaeHyuk Kwack | 🚨 | 2025-10-22 | 2026-03-13 |
| 83 | With ifx, openmp_version is missing from omp_lib | CMPLRLIBS-35365 | /home/kweide/tests/test_openmp_version.f90 and source/reproducers/openmp/omp_version in the test set | Klaus Weide | | 2025-10-20 | 2026-03-13 |
| 82 | Missing symbol from version 1.3 onwards on SLES with Intel Data Center GPU Max on Aurora | https://github.com/intel/xpumanager/issues/113 | https://github.com/intel/xpumanager/issues/113 | Servesh | | 2025-10-16 | 2026-02-18 |
| 80 | VTune fails with "Assertion failed: tool_gtpin_support:126: (buffer) " | VASP-32612, GTPIN-1169 | /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/GAMESS_RI-MP2_MiniApp and source/reproducers/tools/vtune_gtpin_fail in the test set | JaeHyuk Kwack | 🚨 | 2025-10-10 | 2026-03-13 |
| 79 | Advisor fails with "advisor: Warning: The application returned a non-zero exit value." | ADV-10687 | source/reproducers/tools/advisor_gflop | JaeHyuk Kwack | | 2025-10-08 | 2026-03-13 |
| 78 | Applications fail to compile with "is too large for Clang to process" or generate significantly larger executables with "-g" | CMPLRLLVM-70962 (general and related: CMPLRLLVM-53145, CMPLRLLVM-69909, CMPLRLLVM-24314) | source/reproducers/dpcpp/jit_too_large_for_Clang | Abhi | 🚨 | 2025-10-06 | 2026-01-06 |
| 75 | "MPL_gpu_query_is_same_dev(int, int): Assertion `global_dev1 >= 0 && global_dev1 < known_ze_device_count' failed." with mpich.dbg | No response | https://github.com/pmodels/mpich/issues/7602 | Tim, JaeHyuk, Colleen | | 2025-09-30 | 2025-10-13 |
| 72 | MPI_Abort in many applications in next-eval at larger scales | No response | N/A | Brian Holland / Tim Williams | | 2025-09-16 | 2025-09-30 |
| 66 | Compiling with "-g" leads to a much larger binary than without | CMPLRLLVM-69909, CMPLRLLVM-24314 (similar JIRAs) | LAMMPS + -g | Brian Holland | | 2025-08-20 | 2026-01-06 |
| 64 | E3SM Fortran compile ICE | CMPLRLLVM-69862 | source/reproducers/ifx/e3sm_homme_ICE_error | Abhi | | 2025-08-18 | 2026-03-13 |
| 63 | Kokkos Kernels fails to build against Kokkos built with OpenMP enabled | CMPLRLLVM-69908 | source/applications/kokkos-kernels | Sean Koyama / Colleen Bertoni | | 2025-08-18 | 2026-03-13 |
| 61 | Failing unit tests on PVC with the 2025.2 oneAPI SDK: is this expected? | https://github.com/uxlfoundation/oneMath/issues/703, CMPLRLLVM-69572, ONSAM-1930, GSD-11482 | https://github.com/uxlfoundation/oneMath/issues/703 | Colleen Bertoni | | 2025-07-30 | 2025-11-12 |
| 59 | [ISHMEM] Unit test fails with ishmem 1.4.0 | https://github.com/oneapi-src/ishmem/issues/10 | https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos | Abhi | | 2025-07-25 | 2025-07-31 |
| 58 | Kokkos inclusive and exclusive scans give incorrect answers with 1146.10 | CMPLRLLVM-69285, GSD-11736 | source/reproducers/dpcpp/kokkos_optimization_scan | Daniel Arndt | 🚨 | 2025-07-23 | 2026-03-13 |
| 57 | GPU segfault in gtensor_bench with 2025.2 | MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 | source/applications/gtensor_bench | Colleen Bertoni | | 2025-07-22 | 2026-03-13 |
| 56 | RSBench-SYCL incorrect answers with 1146.10 | GSD-11247 | source/applications/RSBench/ | John Tramm, Colleen Bertoni | | 2025-07-22 | 2026-03-13 |
| 54 | oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers | oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 | See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 and source/reproducers/mpi/oneccl_174 | Riccardo Balin | 🚨 | 2025-07-18 | 2026-03-13 |
| 53 | ifx reads and stores floating-point values from a text file at single precision | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision | Victor Anisimov | 🚨 | 2025-07-09 | 2025-07-10 |
| 51 | [SYCL] Bug in SYCL peer_access | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access | Abhi | | 2025-07-02 | 2025-10-13 |
| 50 | OpenMP thread binding | No response | See below | Romain PEREIRA | | 2025-07-02 | 2025-07-02 |
| 49 | [E3SM] MPICH bug related to collective tuning | https://github.com/pmodels/mpich/issues/7456 | https://github.com/pmodels/mpich/issues/7456 | Abhi | 🚨 | 2025-06-27 | 2025-10-09 |
| 48 | Zombie processes | GSD-11266 | none yet | Servesh M | 🚨 | 2025-06-25 | 2025-10-29 |
| 45 | DDT issues since Aurora upgrade | No response | /lus/flare/projects/catalyst/world_shared/zippy/ddt | Tim Williams | | 2025-06-12 | 2025-11-03 |
| 44 | QMCPACK segfault in libomp | No response | Not yet created | Ye Luo | 🚨 | 2025-06-12 | 2025-07-23 |
| 43 | CMake can't find MKL::MKL_SYCL with MPI wrapper compilers | No response | https://github.com/thilinarmtb/onemkl_cmake_mpi_bug | Thilina Ratnayaka, Colleen Bertoni | | 2025-06-11 | 2026-03-13 |
| 42 | Linking fails with old build environment | No response | /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test | Kris Rowe | | 2025-06-06 | 2025-06-10 |
| 41 | torch.compile segfaults for >2 tiles | MLSL-3728 | /flare/Aurora_deployment/vsastry/torch_compile | Varuni Sastry | | 2025-06-06 | 2025-07-24 |
| 40 | Need SYSMAN support for all modes in recent releases | HPCS-15366, related: GSD-11104 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState | Thomas Applencourt | 🚨 | 2025-05-30 | 2025-06-17 |
| 37 | xpu-smi reports "N/A" for GPU Utilization | RITM0428460, ANL-279, GSD-11252 | any run of xpu-smi | Kyle Felker / Colleen Bertoni | | 2025-05-22 | 2025-10-29 |
| 35 | Avoid writing more than a few KB to stdout/stderr from MPI ranks | RITM0425437 (first issue) | Large MPI writes to stdout | Servesh Muralidharan | | 2025-05-15 | 2025-07-23 |
| 34 | Runtime error: PyTorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> | MLSL-3729 | In issue | Nathan Nichols | | 2025-05-15 | 2025-10-13 |
| 32 | PETSc segfaults in sparse matrix calls | IGDB-6516, GSD-10450 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ | Junchao Zhang | 🚨 | 2025-05-15 | 2026-03-13 |
| 31 | GAMESS segfaults with -O0 | GSD-10393, CMPLRLIBS-35345, GSD-11035 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault | Colleen Bertoni | 🚨 | 2025-05-14 | 2026-03-13 |
| 28 | CMake failures with SYCL | No response | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ | Abhishek Bagusetty | | 2025-05-09 | 2025-05-09 |
| 27 | Build failures on PVC with Cutlass | GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl | Abhi | 🚨 | 2025-05-07 | 2025-10-13 |
| 26 | L0 memcpy bug | GSD-11142, NEO-14641 | Same run as the QMCPACK SOW runs in ReFrame | Ye Luo | 🚨 | 2025-05-06 | 2025-10-13 |
| 25 | Compile fail in Lattice App | Brian reproduced and confirms fixed in 2025.1 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx | Xiao-Yong Jin | 🚨 | 2025-05-01 | 2025-10-13 |
| 24 | Noticeably more "ping failed" errors than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade | HPCS-15331 | N/A | Xiao-Yong Jin, Colleen Bertoni | | 2025-05-01 | 2025-05-16 |
| 23 | Apps stop running after Apr 29 upgrade due to libstdc++ dependency | No response | See details | Ye Luo | | 2025-04-30 | 2025-05-06 |
| 22 | SYCL In-order queue broken | NEO-14641 | /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order | Thomas Applencourt | 🚨 | 2025-04-23 | 2025-10-13 |
| 21 | Error during write with Quantum ESPRESSO | No response | see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash | Filippo Simini | 🚨 | 2025-04-17 | 2025-04-18 |
| 20 | Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode | ANL-283/HPE Support Case 5390607860 | See below | Abhishek, Nathan, Khalid | | 2025-04-16 | 2025-10-01 |
| 19 | Severe CPU memory growth in MPICH | No response | /flare/catalyst/world_shared/zippy/reproducers/issue19 | Tim Williams | | 2025-04-04 | 2025-07-31 |
| 18 | Ping failures and hangs with production runs using GPT/GRID | ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 | /lus/flare/projects/LatticeFlavor/lehner | Xiao-Yong Jin | 🚨 | 2025-04-04 | 2025-12-11 |
| 17 | Hang with MPI pipelining | https://github.com/pmodels/mpich/issues/7373 | Build and run commands are in the MPICH issue. | James Osborn | | 2025-04-03 | 2026-03-13 |
| 16 | Catastrophic memory error in context lmp_aurora_kokkos | No response | public LAMMPS | Chris Knight | | 2025-04-03 | 2025-07-23 |
| 12 | CXI alloc failed on cxi1: request exceeds ACs limits | No response | None | Not Thomas | | 2025-04-01 | 2025-12-09 |
| 9 | Multithreaded data transfer can cause page faults | N/A | Full QMCPACK | Ye Luo | | 2025-04-01 | 2025-05-08 |
| 8 | Lots of H2D copies produce CPU I9 error and incorrect value | N/A | Full QMCPACK | Ye Luo | 🚨 | 2025-04-01 | 2025-05-28 |
| 7 | MPI_Bcast gets faster when turning off XPMEM | pmodels/mpich#7334 | see issue on the MPICH GitHub repo | Ye Luo | | 2025-04-01 | 2025-04-24 |
| 6 | MPICH memory allocation slows down at scale | pmodels/mpich#7333 | see MPICH issue | Ye Luo | 🚨 | 2025-04-01 | 2025-04-24 |
| 4 | Incorrect results in receive buffer in GPU memory | MPICH 7312 | grid application (lattice QCD) | Patrick Steinbrecher, Tim Williams | 🚨 | 2025-03-25 | 2025-04-24 |
| 3 | Linker error found by XGC | CMPLRLLVM-66496 | /home/zippy/smalltests/aurora/xgc42/fails | Tim Williams | | 2025-03-19 | 2025-03-28 |
Update tables
The tables are updated automatically every night. To update them immediately, wait 10-15 s after the last change to the AuroraBugTracking issues, then run the following on any machine where `gh` is authenticated:
Alternatively, run `aurora-bug-table-sync.sh`, which performs every step automatically and reports exactly when the changes are live online.