Open Issues

Internal ID Description Vendor ID Reproducer Path PoC Priority? ETA Date Opened Last Updated
146 PMIx_Fence_nb results in "shepherd died from signal 11" No response None Hui Zhou No response 2026-06-04 2026-06-04
145 MSAN Host fails with a simple SYCL code CMPLRLLVM-75894, CMPLRLLVM-75881 /lus/flare/projects/Tools/jkwack-tools-reproducer/JaeHyuk/SAN_host_syclQ source/reproducers/tools/msan_sycl_host JaeHyuk Kwack, Renzo Bustamante 🚨 No response 2026-05-28 2026-06-08
144 [OpenMP] Macro + Printf + do_while : "PluginInterface" error CMPLRLLVM-75813 source/reproducers/openmp/printf_macro Thomas No response 2026-05-27 2026-05-27
143 [Frameworks] XPU/XCCL all_gather + torch.xpu.empty_cache leaks device-global memory https://github.com/intel/torch-xpu-ops/issues/3744 See issue source/reproducers_frameworks/allgather_hidden_temp_memleak Nathan Nichols 🚨 Intel can update internally 2026-05-22 2026-05-27
142 Cutlass fails to build GSD-12541 source/reproducers/dpcpp/cutlass-sycl Colleen Fixed, 1146.74 2026-05-21 2026-05-27
141 ASAN HOST got SEGV on multiple nodes CMPLRLLVM-75815 /lus/flare/projects/Tools/jkwack-tools-reproducer/JaeHyuk/ASAN_host_multinodes source/reproducers/tools/asan_multinode/ JaeHyuk Kwack, Saumil Patel, Wendy Wu 🚨 Intel can reproduce internally 2026-05-15 2026-06-05
140 qsig is useless on Aurora HPE Support Case 5403115599 /flare/catalyst/world_shared/zippy/AuroraBugTracking/reproducers/issue140 Tim Williams No response 2026-05-14 2026-05-28
139 libkokkoskernels.so: PC-relative offset overflow in PLT entry CMPLRLLVM-75321 source/application/kokkos-kernels + -g Junchao Zhang Looking at it 2026-05-04 2026-05-27
138 DAOS write from sycl::malloc_host and sycl::malloc_shared No response /lus/flare/projects/gpu_hack/oracle-erf/mbuehlmann/daos_repro Michael Buehlmann trying to get fix into the 2.6 branch. If not, we have to wait until 2.8 at end of year 2026-04-30 2026-05-12
137 Non-bit-for-bit output with some ifx flags CMPLRLLVM-75238 /flare/E3SMinput/world-shared/azamat/intel-sqrt-flags source/reproducers/ifx/bit_for_bit Abhishek Bagusetty Fixed in oneAPI 2026.0 2026-04-29 2026-05-12
136 Unexplained slowdown in isend/irecv simple reproducer (from INCITE code) No response Simplified procedurally generated communication pattern - reproduces at 2 nodes - runs at any node scale /lus/flare/projects/Aurora_deployment/reproducers/bug136/reproducer_2node_simplified.tar.gz [Original Reproducer - From User - 128 node & 1 node - fixed input files for communication pattern] /lus/flare/projects/Aurora_deployment/reproducers/bug136/simple_reproducer.tar.gz Brian Holland 🚨 workaround in mpich for next Aurora update (maybe in June) 2026-04-28 2026-05-26
135 Miscompile: OpSubgroupShuffleINTEL with OpUConvert ushort<->uint and OpBranchConditional consumer GSD-12741 Attached in the body of the issue source/reproducers/igc/miscompile/ Paulius Velesko Root caused 2026-04-23 2026-05-27
134 [Advisor] Incorrect lower precision data ADV-7898 https://github.com/jkwack/JKBench_Tools/tree/Low_Precision/Low_Precision JaeHyuk Kwack, Riccardo Balin 🚨 Fix in place, ETA: estimate is oneAPI 2026.2 2026-04-13 2026-05-27
132 [DPC++] Print not working at O0 CMPLRLLVM-74689 source/reproducers/dpcpp/print_bug/ Thomas Under investigation 2026-04-07 2026-05-27
131 [Runtime] magma test failing GSD-12314 source/reproducers/dpcpp/magama_crash Natalie Beams, Colleen Fixed in newer, ETA: ww22, end of May, 1146.74 2026-03-18 2026-05-27
130 [LZ] Timestamp and in-order queues lead to wrong answers GSD-12468 source/reproducers/l0/timestamp_wrong_answer Colleen Fix implemented, under verification. Looking to backport, hopefully 1146.74 (or after) 2026-03-14 2026-05-27
129 Very long JIT times GSD-12462 https://github.com/wavefunction91/ExchCXX and source/reproducers/dpcpp/very_long_jit Abhi, Colleen In triage, picked up by the Intel team, time down to 30 min 2026-03-13 2026-06-03
128 [VTune] EnvironmentSize: 1002: Environment length too long, not supported VASP-33414, PINT-6768 /lus/flare/projects/Tools/jkwack-tools-reproducer/Tim/EnvironmentSize_1002 Tim Williams, JaeHyuk Kwack 🚨 Fixed internally. Fixed in 4.2 pin version (target VTune 2026.2) 2026-03-06 2026-04-15
127 next-eval generates bad code from warpx kernel with -O3 GSD-12419 /flare/catalyst/world_shared/zippy/warpx Tim Williams Fix implemented in IGC, needs to be verified and backported. targeted for COE build (end of May-ish) 2026-03-05 2026-05-27
126 Incorrect answers for OpenMP + SIMD code CMPLRLLVM-73901 source/reproducer/icx/simd_omp Lehman Garrison / Thomas Reproducer confirmed, Likely will be fixed in oneAPI 2026.2 2026-03-03 2026-05-27
125 [Frameworks] vLLM async scheduler fail No response see issue Nathan and Khalid reproduced internally, vLLM issue 2026-03-02 2026-05-27
124 [LZ] Clarification about zeCommandListHostSynchronize and multiple IMM in-order queues GSD-12406 source/reproducers/l0/why_event_not_ready/ Colleen 🚨 reproduced internally 2026-03-02 2026-05-27
122 [IntelPython] Bug in DPCTL to support for order parameter for dpt.asnumpy No response https://github.com/IntelPython/dpnp/issues/2884 and source/reproducers_frameworks/asnumpy/ Abhi No response 2026-02-23 2026-03-20
121 [IntelPython] Feature request for sub-class support in dpnp arrays No response https://github.com/IntelPython/dpnp/issues/2764 and source/reproducers_frameworks/ndarray_subclass Abhi Fixed in github issue 2026-02-23 2026-03-20
120 [IntelPython] dpnp array .data.ptr on array views ignores USM offset No response https://github.com/IntelPython/dpnp/issues/2781 and source/reproducers_frameworks/data_ptr Abhi 🚨 Fixed in github issue 2026-02-23 2026-03-20
119 [IntelPython] Indexing bug with dpnp nd-array No response https://github.com/IntelPython/dpnp/issues/2783 and /reproducers_frameworks/wrong_inplace_4D Abhi 🚨 Fixed in github issue. 2026-02-23 2026-05-26
118 Incorrect RUNPATH in libimf.so and libirng.so CMPLRLLVM-44505 Embedded and source/reproducers/general/runpath_libimf_libirng/ Ye Luo - Intel Agree, it's nice to have feature. ETA oneAPI 2026.2 2026-02-19 2026-05-27
117 Fortran ICE module + input)in) CMPLRLLVM-73523 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/ice_module_in Thomas / Victor 2026.0 2026-02-18 2026-03-18
115 [Frameworks] flash attn and fused_moe/test_grouped_gemm tests fail in the VLLM framework GSD-12291, GSD-11768, GSD-12248, CUTLASS9-510 source/reproducers/frameworks/vllm/ Servesh / Nathan 🚨 WW21 (May 22nd for LTS release which should have fixed) GSD-12291: fixed internally (~2026.1) GSD-11768, GSD-12248, CUTLASS9-510 end of May-ish. preview of COE build soon. 2026-02-16 2026-05-27
114 Offline debugging issues No response N/A Servesh 3/5 next release (~1146.56-8), 2/5 still under analysis Internal build is testing patches. 1146.74 has many fixed, still at least one remaining 2026-02-16 2026-05-27
111 [Frameworks] alltoallv with zero-sized buffer from pytorch https://github.com/uxlfoundation/oneCCL/issues/190 MLSL-4075 https://github.com/argonne-lcf/nekRS-ML/blob/alcf4/3rd_party/dist-gnn/run_all2all_bench.sh Riccardo Balin 2022.0 (Not the year!) 2026.0 full release. will test when we get a frameworks module. 2026-02-05 2026-05-29
110 [Frameworks] degraded Ptycho_Vit performance Vs A100 No response https://github.com/SYNAPS-I/ptycho-vit/tree/aurora_port Varuni Katti Sastry No response 2026-02-03 2026-02-04
109 Global MPI rank issue with STAT HPE ticket CPE-13691 /home/jkwack/Tools/STAT/Multi-node_test on Sunspot JaeHyuk Kwack 🚨 No response 2026-02-02 2026-02-02
106 [LZ] Hang on zeEventPoolDestroy when called before a non-related non-same-pool signal GSD-12152 source/reproducers/l0/multi_event_pools_hang Colleen, Paulius Have been reproduced internally 2026-01-07 2026-05-27
104 [LZ] Crashing with UseKMDMigration GSD-12102 source/reproducers/dpcpp/supercontext Thomas Triaged, GPU driver team 2025-12-17 2026-04-15
102 [Frameworks][Triton] "No device of requested type available" when ONEAPI_DEVICE_SELECTOR="level_zero:gpu" PYTORCHDCQ-7882 source/reproducers/frameworks/triton_get_device Nathan Nichols WA: ONEAPI_DEVICE_SELECTOR="*:gpu" https://github.com/intel/intel-xpu-backend-for-triton/pull/5745 merged (3.6.1) 2025-12-17 2026-04-15
101 Signalling a clSetUserEventStatus does not wake up a barrier depending on it for in-order queues. GSD-12087 source/reproducers/opencl/user_event_in_order Paulius Velesko Reproduced internally, root causing 2025-12-11 2026-05-27
100 Level Zero event used between an in-order immediate command list and out-of-order regular command list results in ZE_RESULT_ERROR_INVALID_ARGUMENT GSD-12085 source/reproducers/l0/inorder_outorder_event/ Paulius Velesko Triaged, possible WA 2025-12-11 2026-04-15
99 Advisor tripcounts analysis fails with a PyTorch example. ADV-10735 /flare/Performance/jkwack/Tools/Roofline/SC25_tutorial/ai_ml_profiling/reproducer or /lus/flare/projects/Tools/jkwack-tools-reproducer/JaeHyuk/advisor_pytorch/reproducer or source/reproducers/tools/pytorch_advisor JaeHyuk Kwack 🚨 Under investigation, Advisor and python compatibility issues maybe. works on SP7 but overhead is high (~66x) 2025-12-09 2026-04-16
98 Hanging OpenCL code when one command queue waits on an event from another command queue CMPLRLLVM-72048 / GSD-12075 source/reproducers/opencl/hanging_marker Colleen Under investigation 2025-12-02 2026-03-18
97 SHMEM on Aurora: Unit test wait_until_all-on_queue-2 hanging https://github.com/oneapi-src/ishmem/issues/15 source/applications/ishmem_sis Colleen / Abhi Actively working on it 2025-11-21 2025-12-10
96 Sporadic libze_intel_gpu.so segmentation fault when running QMCPACK GSD-12033 See attached reproducer and source/reproducers/l0/cpu_lock_segfault Ye Luo 🚨 Intel has internal reproducer, still investigating 2025-11-17 2026-05-27
95 Memory leak in Libfabric No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/cxi_memory_lead Rob Lathan Fixed by https://github.com/ofiwg/libfabric/pull/11334, thanks Rob! expect in SHS 13.1.0, on aurora end of March 2025-11-13 2026-04-16
94 zeMemFree slowdown in a loop GSD-11962, NEO-17411 source/reproducers/l0/zememfree_slowdown/ Colleen being investigated, not sure if KMD or user-space 2025-11-08 2026-04-15
92 SYCL device info free_memory wrong on 2-stack PVC1550 GPU CMPLRLLVM-71510, GSD-12043 source/reproducers/dpcpp/sycl_free_flat Jakub H Under investigation, Assigned to SYSMAN people new API in core instead of SYSMAN. backporting into LTS ~2026.1 or 2026.1 1146.75+ 2025-10-31 2026-05-27
91 sycl failed malloc_device on GPU takes 20 seconds GSD-10587 source/reproducers/dpcpp/slow_alloc/ Jakub H - post-1146.40, fixed internally, but under investigation on what to cherry - Fixed in newer release but not cherry picked. - Being Re-worked / re-verified 2025-10-31 2026-05-27
90 Device Sanitizer + LIBOMPTARGET_DEBUG=1 issues for the GAMESS RI-MP2 mini-app CMPLRLLVM-71455 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test (w60 ones) Brian Fixed internally, 2026.0 (end of April) 2025-10-31 2026-02-18
89 Device Sanitizer breaks with MKL DGEMM call in GAMESS RI-MP2 mini-app MKLD-19334 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/tools/sanitizer_rimp2_test (w30 ones) Brian, JaeHyuk Fixed internally, 2026.0 (end of April) 2025-10-31 2026-02-18
87 QUDA compile fail CMPLRLLVM-70981 source/reproducers/openmp/quda_crash Xiao-Yong Jin / Brian W 2026.0 (end of April) 2025-10-28 2026-01-07
86 omp_alloc should support pinned memory, or implement proper fallback behavior CMPLRLIBS-35442 /home/kweide/projects/OpenMP_VV/tests/5.1/allocate/test_omp_alloctrait_pinned.c and source/reproducers/openmp/omp_alloctrait_pinned in the test set Klaus Weide fixed internally -- correct error message. likely 2026.2 2025-10-28 2026-05-27
85 zeEventQueryKernelTimestampsExt is broken with IMM command lists GSD-11124 source/reproducers/l0/zeEventQueryKernelTimestampsExt_clock Thomas/John Mellor-Crummey In progress, updating specification 2025-10-27 2026-05-27
81 IGC_StackOverflowDetection not working GSD-11763 source/reproducers/openmp/stack_overflow_not_working Brian 🚨 Implemented internally. 2025-10-15 2026-05-27
77 [SYCL] Function pointers compilation issue CMPLRLLVM-16317 Reproducer below and source/reproducers/dpcpp/func_pointers Abhi, Patrick Steinbrecher 🚨 Under discussion, heavy lift, possibly next year 2025-10-06 2026-05-27
76 Segfaults in MPICH routines in next-eval No response for XGC: /lus/flare/projects/catalyst/world_shared/zippy/xgc Tim Williams 🚨 No response 2025-10-01 2025-10-01
74 ZES_ENABLE_SYSMAN should default to 1 in the oneapi module No response see Details Tim Williams No response 2025-09-29 2025-10-15
73 "error: undefined reference to `old_llvm.umul.with.overflow.i64'" in newer kokkos CMPLRLLVM-70603, GSD-12239 source/reproducers/dpcpp/kokkos_mdspan_umul Daniel Arndt Compiler-side fixed, COE agama release (1st or 2nd release) 2025-09-17 2026-05-27
71 RPC launch error tracking 2025-09-15 2025-09-23
70 PALS gpu-bind, composite, envall lead to "launch failed" DCE Case 5392152905 applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mpi/envall Thomas Applencourt Fixed in USS-1.5 (March '26) 2025-09-10 2025-12-09
68 warpx segfaults/hangs with OpenPMD enabled No response /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ Tim Williams No response 2025-08-23 2026-05-29
65 Clarification requested about ZE_DEVICE_PROPERTY_FLAG_ONDEMANDPAGING on PVC GSD-11510 source/reproducers/l0/ondemand_paging/ Colleen implemented, post-1146.41+, ~ Jan. (1146.58) 2025-08-20 2026-02-18
62 -ftarget-register-alloc-mode=pvc:large and "-device 12.60.7" for AOT GSD-11490 source/reproducers/general/ftarget-register-alloc-mode_flag Steve Rangel Fixed internally, 1146.58 2025-08-14 2026-02-18
60 ext_oneapi_memcpy2d is significantly slower with implicit scaling than explicit and on PVC vs A100 GSD-11132, GSD-12277 source/reproducers/dpcpp/ext_oneapi_memcpy2d_perf Natalie Beams No response 2025-07-29 2026-02-03
55 Linking in LZ causes changes in signal handling cmplrlibs-35385, GSD-11413 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/signal_handler/ Thomas Applencourt, Colleen Bertoni Fixed internally, still in vetting 2025-07-22 2025-12-10
47 Non standard MPI knobs suggested for performance ANL-291 N/A Servesh M No response 2025-06-23 2025-06-27
38 One application in GRID consistently hangs GSD-11441 /lus/flare/projects/Aurora_deployment/xyjin/W/test_grid_g5r5_paboyle Xiao-Yong Jin 🚨 Fixed internally with 1146.59 (Patrick S.) 2025-05-27 2026-04-15
36 (Occasional Interruptible) hangs in applications Possibly related to ANL-215 /lus/flare/projects/Aurora_deployment/xyjin/W/test_example_detar.skel Xiao-Yong Jin 🚨 No response 2025-05-15 2025-07-09
33 Crash when calling too many MPI_Probe https://github.com/pmodels/mpich/issues/7427 https://github.com/pmodels/mpich/issues/7427 David--Cléris Timothée No response 2025-05-15 2025-05-15
30 Copy 2D/3D are broken (zeCommandListAppendMemoryCopyRegion) NEO-14954, GSD-11132 https://github.com/rpereira-dev/ze-zoo also source/reproducers/l0/copyRegionPitch Romain PEREIRA and Thomas APPLENCOURT 🚨 Backport fixes from newer runtime to LTS (ETA: ww22, end of May) ~1146.74 2025-05-10 2026-05-27
13 XGC hangs at scale CMPLRTST-27836 xgc-es-cpp-gpu app, ES_ITER test case Tim Williams 🚨 No response 2025-04-03 2026-05-28

Closed Issues

Internal ID Description Vendor ID Reproducer Path PoC Priority? Date Opened Closed Date
133 Advisor slowdown with SP7 vs SP4 No response source/reproducers/tools/advisor_gflop JaeHyuk 2026-04-13 2026-04-13
123 Various MPI crashes in pytorch at larger scales No response below Khalid 2026-02-26 2026-03-05
116 PCIe counters has a regression on 1.3.X for Datacenter Max GPUs https://github.com/intel/xpumanager/issues/119 https://github.com/intel/xpumanager/issues/119 Servesh 2026-02-18 2026-04-15
113 Engineering version of vtune-backend is extremely slow VASP-33498 /tmp/rcaddy/tmp on Aurora head node 11 Robert Caddy 2026-02-11 2026-04-20
112 [MPI] MPI_probe crashing with H/W event overflow CAST-39582 in the issue and source/reproducers/mpi/mpi_probe Colleen 2026-02-09 2026-02-13
108 [LZ] Hanging on event with multiple immediate command lists No response source/reproducers/l0/synch_hang_multi_imm Paulius Velesko 2026-01-27 2026-03-13
107 Vtune times out even when run with collection paused VASP-33391 /lus/flare/projects/CoreCollapseModel/rcaddy/vtune_issue /lus/flare/projects/Tools/jkwack-tools-reproducer/Robert_Caddy/vtune_issue Robert Caddy 2026-01-07 2026-03-05
105 PCIe counters not working on LTS2 2523.31 and xpu-smi 1.2.X https://github.com/intel/xpumanager/issues/114 GSD-12079 in issue Servesh 2026-01-06 2026-03-13
103 [Frameworks][PyTorch][IPEX] PyTorch Complex Matmul support W/O IPEX No response /lus/flare/projects/datasets/softwares/testing/ptychi_tests/complex.py in test set at: source/reproducers/frameworks/pytorch_matmul_ipex Khalid Hossain 2025-12-17 2026-03-13
93 oneCCL exception with PyTorch DTensor: SYCL recv is not supported for multi-node case MLSL-3951 In the text body source/reproducers/frameworks/pytorch_93 (note only for single node, we must test by hand for 2 nodes) Väinö Hatanpää 2025-11-05 2026-03-13
88 RPATH issue when mixing and matching SDK and spack packages built by another SDK No response No need. Reproducer attached in this ticket Ye Luo 2025-10-30 2026-02-18
84 Device Sanitizer is not functional with OpenMP C/Fortran codes /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/sanitizer and source/reproducers/tools/sanitizer JaeHyuk Kwack 🚨 2025-10-22 2026-03-13
83 With ifx, openmp_version is missing from omp_lib CMPLRLIBS-35365 /home/kweide/tests/test_openmp_version.f90 and source/reproducers/openmp/omp_version in the test set Klaus Weide 2025-10-20 2026-03-13
82 Symbol missing issue with 1.3 version onwards in SLES and Intel Datacenter Max GPU on Aurora https://github.com/intel/xpumanager/issues/113 https://github.com/intel/xpumanager/issues/113 Servesh 2025-10-16 2026-02-18
80 VTune fails with "Assertion failed: tool_gtpin_support:126: (buffer) " VASP-32612, GTPIN-1169 /lus/flare/projects/Aurora_deployment/jkwack/JK_AT_Tools/Apps/GAMESS_RI-MP2_MiniApp source/reproducers/tools/vtune_gtpin_fail in the test set JaeHyuk Kwack 🚨 2025-10-10 2026-03-13
79 Advisor fail with "advisor: Warning: The application returned a non-zero exit value." ADV-10687 source/reproducers/tools/advisor_gflop JaeHyuk Kwack 2025-10-08 2026-03-13
78 Applications failing to compile with is too large for Clang to process or generating significantly larger exes with "-g" CMPLRLLVM-70962, (general and related: CMPLRLLVM-53145, CMPLRLLVM-69909, CMPLRLLVM-24314) source/reproducers/dpcpp/jit_too_large_for_Clang Abhi 🚨 2025-10-06 2026-01-06
75 "MPL_gpu_query_is_same_dev(int, int): Assertion `global_dev1 >= 0 && global_dev1 < known_ze_device_count' failed." with mpich.dbg No response https://github.com/pmodels/mpich/issues/7602 Tim, JaeHyuk, Colleen 2025-09-30 2025-10-13
72 MPI_aborts in many applications in next-eval at larger scales No response N/A Brian Holland / Tim Williams 2025-09-16 2025-09-30
67 warpx Debug build crashes oneAPI compiler CMPLRLLVM-24314 /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/ Tim Williams 2025-08-21 2026-05-28
66 Compiling with "-g" leads to a much larger binary than without CMPLRLLVM-69909, CMPLRLLVM-24314 (similar JIRAs) lammps + -g Brian Holland 2025-08-20 2026-01-06
64 E3SM fortran compile ICE CMPLRLLVM-69862 source/reproducers/ifx/e3sm_homme_ICE_error Abhi 2025-08-18 2026-03-13
63 Kokkos kernels fails to build with kokkos built with openmp enabled CMPLRLLVM-69908 source/applications/kokkos-kernels Sean Koyama / Colleen Bertoni 2025-08-18 2026-03-13
61 Failing unit tests on PVCs with 2025.2 oneAPI SDK -- is it expected? https://github.com/uxlfoundation/oneMath/issues/703, CMPLRLLVM-69572, ONSAM-1930, GSD-11482 https://github.com/uxlfoundation/oneMath/issues/703 Colleen Bertoni 2025-07-30 2025-11-12
59 [ISHMEM] Unit test fails with ishmem 1.4.0 https://github.com/oneapi-src/ishmem/issues/10 https://github.com/oneapi-src/ishmem/issues/10 and source/applications/ishmem_sos Abhi 2025-07-25 2025-07-31
58 kokkos inclusive and exclusive scan giving incorrect answers for 1146.10 CMPLRLLVM-69285, GSD-11736 source/reproducers/dpcpp/kokkos_optimization_scan Daniel Arndt 🚨 2025-07-23 2026-03-13
57 GPU segfault in gtensor_bench with 2025.2 MKLD-18276, CMPLRLIBS-35326, CMPLRLLVM-68696 source/applications/gtensor_bench Colleen Bertoni 2025-07-22 2026-03-13
56 RSBench-SYCL incorrect answers with 1146.10 GSD-11247 source/applications/RSBench/ John Tramm, Colleen Bertoni 2025-07-22 2026-03-13
54 oneCCL zeMemGetAddressRange error with alltoallv and zero-sized buffers oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174, MLSL-3764 See instructions on oneCCL GitHub Issue: https://github.com/uxlfoundation/oneCCL/issues/174 and source/reproducers/mpi/oneccl_174 Riccardo Balin 🚨 2025-07-18 2026-03-13
53 IFX Compiler reads and stores floating point values from a text file at single-precision No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/ifx/fp_precision Victor Anisimov 🚨 2025-07-09 2025-07-10
52 compiler segfaults linking warpx binary GSD-11357, GSD-11855 /lus/flare/projects/catalyst/world_shared/zippy/reproducers/issue52/warpx Tim Williams 🚨 2025-07-07 2026-06-01
51 [SYCL] Bug from SYCL peer_access No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/sycl_peer_access Abhi 2025-07-02 2025-10-13
50 OpenMP Thread binding No response See below Romain PEREIRA 2025-07-02 2025-07-02
49 [E3SM] MPICH bug related to collectives tuning https://github.com/pmodels/mpich/issues/7456 https://github.com/pmodels/mpich/issues/7456 Abhi 🚨 2025-06-27 2025-10-09
48 Zombie Processes GSD-11266 none yet Servesh M 🚨 2025-06-25 2025-10-29
45 DDT issues since Aurora upgrade No response /lus/flare/projects/catalyst/world_shared/zippy/ddt Tim Williams 2025-06-12 2025-11-03
44 QMCPACK segfault in libomp No response Not yet created Ye Luo 🚨 2025-06-12 2025-07-23
43 CMake can't find MKL::MKL_SYCL with MPI wrapper compilers No response https://github.com/thilinarmtb/onemkl_cmake_mpi_bug Thilina Ratnayaka, Colleen Bertoni 2025-06-11 2026-03-13
42 Linking fails with old build environment No response /lus/flare/projects/PHASTA_aesp_CNDA/jrwrigh/petsc_build_test Kris Rowe 2025-06-06 2025-06-10
41 torch.compile segfaults for >2 tiles MLSL-3728 /flare/Aurora_deployment/vsastry/torch_compile Varuni Sastry 2025-06-06 2025-07-24
40 Need SYSMAN support for all modes in recent releases HPCS-15366, related: GSD-11104 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/l0/leak_zesMemoryGetState Thomas Applencourt 🚨 2025-05-30 2025-06-17
39 Feature request for Aurora runtime to include debugging symbols ANL-286, HPCS-15374, GSD-11427 feature request Ye Luo 2025-05-29 2026-04-23
37 xpu-smi reports "N/A" for GPU Utilization RITM0428460, ANL-279, GSD-11252 any run of xpu-smi Kyle Felker / Colleen Bertoni 2025-05-22 2025-10-29
35 Avoid outputs exceeding few KBs to stdout/stderr from MPI ranks RITM0425437 First issue Large MPI writes to stdout Servesh Muralidharan 2025-05-15 2025-07-23
34 Runtime Error: pytorch DDP with CCL_BCAST=<"double_tree, direct, naive, maybe others?"> MLSL-3729 In issue Nathan Nichols 2025-05-15 2025-10-13
32 PETSc segfaults in sparse matrix calls IGDB-6516, GSD-10450 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/mkl/csr_gemv_usm/ Junchao Zhang 🚨 2025-05-15 2026-03-13
31 GAMESS segfaults with -O0 GSD-10393, CMPLRLIBS-35345,GSD-11035 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/openmp/gamess_O0_page_fault Colleen Bertoni 🚨 2025-05-14 2026-03-13
29 Significant slowdown with LAMMPS in first run, subsequent runs much faster No response /flare/catalyst/proj_shared/knight/projects/ExtremeCarbon/snap-carbon-scaling/1B/ Christopher Knight 2025-05-09 2026-04-15
28 CMake failures with SYCL No response /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/ Abhishek Bagusetty 2025-05-09 2025-05-09
27 Build failures on PVC with Cutlass GSD-11099, https://github.com/codeplaysoftware/cutlass-sycl/issues/329 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/cutlass-sycl Abhi 🚨 2025-05-07 2025-10-13
26 L0 memcpy bug GSD-11142, NEO-14641 I was doing the same run as QMCPACK SOW runs in the reframe Ye Luo 🚨 2025-05-06 2025-10-13
25 Compile fail in Lattice App Brian reproduced and confirms fixed in 2025.1 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/bug_cgpt_icpx Xiao-Yong Jin 🚨 2025-05-01 2025-10-13
24 Noticeably more "ping failed" than before the 2025.1 SDK + 1099.12 UMD/KMD upgrade JIRA is: HPCS-15331 N/A Xiao-Yong Jin, Colleen Bertoni 2025-05-01 2025-05-16
23 Apps stop running after Apr 29 upgrade due to libstdc++ dependency No response See details Ye Luo 2025-04-30 2025-05-06
22 SYCL In-order queue broken NEO-14641 /lus/flare/projects/Aurora_deployment/applications.hpc.argonne-national-lab.aurora.anl-testing/source/reproducers/dpcpp/in-order Thomas Applencourt 🚨 2025-04-23 2025-10-13
21 Error during write with Quantum ESPRESSO No response see .zip file attached below, also /lus/flare/projects/matml_aesp_CNDA/dir_io_QE_crash Filippo Simini 🚨 2025-04-17 2025-04-18
20 Issue with gpu-bind for mpiexec under ZE_FLAT_DEVICE_HIERARCHY=FLAT mode ANL-283/HPE Support Case 5390607860 See below Abhishek, Nathan, Khalid 2025-04-16 2025-10-01
19 Severe CPU memory growth in MPICH No response /flare/catalyst/world_shared/zippy/reproducers/issue19 Tim Williams 2025-04-04 2025-07-31
18 Ping failures and hangs with production runs using GPT/GRID ANL-251, RITM0404147, RITM0404148, RITM0405730, GSD-11441 /lus/flare/projects/LatticeFlavor/lehner Xiao-Yong Jin 🚨 2025-04-04 2025-12-11
17 hang with MPI pipelining https://github.com/pmodels/mpich/issues/7373 Build and run commands are in the MPICH issue. James Osborn 2025-04-03 2026-03-13
16 Catastrophic memory error in context lmp_aurora_kokkos No response public LAMMPS Chris Knight 2025-04-03 2025-07-23
12 CXI alloc failed on cxi1: request exceeds ACs limits No response None Not Thomas 2025-04-01 2025-12-09
9 Multithreaded data-transfer can cause page-fault N/A Full QMCPACK Ye Luo 2025-04-01 2025-05-08
8 Lots of H2D copies produce CPU I9 error and incorrect value N/A Full QMCPACK Ye Luo 🚨 2025-04-01 2025-05-28
7 MPI_Bcast gets faster when turning off XPMEM pmodels/mpich#7334 see Issue on MPICH GitHub repo Ye Luo 2025-04-01 2025-04-24
6 MPICH memory allocation slows down at scale pmodels/mpich#7333 see MPICH issue Ye Luo 🚨 2025-04-01 2025-04-24
4 Incorrect results in receive buffer in GPU memory MPICH 7312 grid application (lattice QCD) Patrick Steinbrecher, Tim Williams 🚨 2025-03-25 2025-04-24
3 Linker error found by XGC CMPLRLLVM-66496 /home/zippy/smalltests/aurora/xgc42/fails Tim Williams 2025-03-19 2025-03-28

Update tables

The tables are updated automatically every night. To update them now, wait 10-15 s after the last change to the AuroraBugTracking issues, then run the following on any machine where gh is authenticated:

gh workflow run "Update Submodules" --repo "argonne-lcf/user-guides" && GH_FORCE_TTY=100% watch -c -n1 gh run list --repo "argonne-lcf/user-guides"
Then wait about 2 minutes, until no jobs are running.

Or execute aurora-bug-table-sync.sh to automatically run everything step-by-step and know exactly when the changes are live online.
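
For reference, the manual trigger-and-wait steps could be scripted roughly as below. This is only a sketch and is not the contents of aurora-bug-table-sync.sh. The function names `active_runs` and `sync_tables`, the default delays, and the choice to poll `gh run list --json status` instead of using `watch` are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: trigger the "Update Submodules" workflow and block until no runs are active.
# Assumes gh is installed and authenticated; all names below are hypothetical.

REPO="argonne-lcf/user-guides"
WORKFLOW="Update Submodules"

# Print the number of recent workflow runs that have not completed yet.
active_runs() {
  gh run list --repo "$REPO" --json status \
    --jq '[.[] | select(.status != "completed")] | length'
}

# Usage: sync_tables [settle_seconds] [poll_seconds]
sync_tables() {
  local settle="${1:-15}" poll="${2:-10}" n

  # Give the last change to the issues time to settle (10-15 s per the text above).
  sleep "$settle"

  gh workflow run "$WORKFLOW" --repo "$REPO" || return 1

  # Give the newly dispatched run time to show up in the run list.
  sleep "$poll"

  # Poll until no run is queued or in progress.
  while true; do
    n=$(active_runs) || return 1
    [ "$n" -eq 0 ] && break
    sleep "$poll"
  done
  echo "No workflow runs active; the tables should be live."
}
```

To use it, source the file and call `sync_tables`, optionally passing shorter delays such as `sync_tables 10 5`.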