Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds#1286
Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds#1286sbryngelson merged 28 commits intoMFlowCode:masterfrom
Conversation
…: add -Oipa0 m_phase_change.fpp triggers the same CCE 19.0.0 bring_routine_resident SIGSEGV during IPA as m_bubbles_EL. Caller-side !DIR$ NOINLINE directives (commit 628a046) were insufficient. Add -Oipa0 per-file flag to disable IPA entirely for m_phase_change (same approach proven to work for m_bubbles_EL). Consolidate both files in one set_source_files_properties call. See PR MFlowCode#1286. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three distinct CCE 19.0.0 compiler bugs required fixes: Bug 1: InstCombine ICE in matmul() in m_phase_change.fpp - Replace matmul() with explicit 2x2 arithmetic Bug 2: IPA bring_routine_resident SIGSEGV in m_phase_change.fpp - Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only) - Use cray_noinline=True on 4 GPU_ROUTINE calls in m_phase_change.fpp and 4 in m_variables_conversion.fpp Bug 3: IPA castIsValid ICE in m_bubbles_EL.fpp - Change proc_bubble_counts from VLA to allocatable - Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only) Bug 4: m_chemistry.fpp VLA ICE in case-optimized pre_process builds - Guard 4 dimension(num_species) local arrays with USING_CCE Bug 5: Pyrometheus GPU_ROUTINE macro missing !acc routine seq on Cray+ACC - Post-process generated m_thermochem.f90 in toolchain/mfc/run/input.py to replace the broken Cray INLINEALWAYS-only macro with plain #define GPU_ROUTINE(name) !acc routine seq Also fix uninitialized FT in s_TSat (use huge(1.0_wp) not huge(FT)). See PR MFlowCode#1286.
…unrelated to CCE fix)
6f97c9f to
1aa4cf5
Compare
Claude Code ReviewHead SHA: 61924d8
Summary
Findings1.
|
Claude Code ReviewHead SHA: ac28127 Summary
Findings1. 2. 3. Pyrometheus patch in 4. Caller-side 5. Phoenix and AMD CI disabled at PR open Minor / Non-blocking
Overall this is a well-targeted and well-documented set of compiler workarounds. The code changes are minimal and surgical. Main ask: confirm Phoenix and AMD CI are re-enabled before merge. |
Claude Code ReviewHead SHA: 4631c9b Changed files
Summary
Findings[Medium] The diff removes the line declaring two VLA local arrays that were on the same declaration line as - integer, dimension(num_procs) :: part_order, part_ord_mpi
- integer, dimension(num_procs) :: proc_bubble_counts
+ integer, allocatable :: proc_bubble_counts(:)
[Low] The guard is: if (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray" AND NOT MFC_OpenMP)This disables IPA for [Low] Duplicated The limit is defined independently in:
Comments note they must match. This is a maintenance risk: if someone changes one without the other, the Python check and Fortran PROHIBIT diverge silently for CPU builds (only GPU builds raise a hard error in Python). Consider exporting this constant from a single location (e.g., a shared config or the case.fpp/Fypp variable passed down), or at minimum add a test that asserts the two values agree. The current state is workable but fragile. [Info] Pyrometheus patch in The string-replacement approach is fragile but well-defended: it raises [Info] This large refactor removes the progress-bar infrastructure and is unrelated to CCE bug fixes. Bundling it here makes the diff harder to review. It should ideally be a separate PR, but if the functionality is preserved and CI is green it is acceptable. Just note that this changes the OverallThe Fortran-side fixes are correct and well-reasoned. The 🤖 Generated with Claude Code |
Claude Code ReviewHead SHA: Files changed: 16 Summary
Findings1. Duplicate
The PR acknowledges this with a comment, and the Fortran side has 2. 3. 4. Pyrometheus patch in 5. No blocking issues found. The implementation is technically sound, well-documented, and the workarounds are appropriately scoped (per-file
|
eaecef1 to
ac28127
Compare
Claude Code ReviewHead SHA: ac28127 Summary
Findings1. Untracked removal of The diff removes two VLA declarations ( 2. Duplicated The comment says "Must match the Fypp constant" but there is no build-time or CI check to enforce this. A future patch to one location that misses the other will silently diverge. Consider extracting this value to a single source of truth (e.g. a shared config, or at minimum a clearer cross-file comment with a test that compares them). As-is, this is a maintainability risk. 3. Temporarily disabled CI must be re-enabled before merge Phoenix GT (all 3 test jobs + 3 bench jobs) and Frontier AMD (3 test + 1 bench jobs) are disabled with a comment marker. The PR description says these will be re-enabled before merge, but they are currently absent from the matrix in this PR. If they are not restored before merge, the CI coverage gap becomes permanent until a follow-up PR explicitly re-adds them. 4. The comment in the CMakeLists block says "Applied to Cray+OpenACC and Cray CPU" but does not explain why Cray CPU builds need 5. Pyrometheus string-patch fragility (low severity — handled correctly) The macro-substitution approach is inherently fragile (whitespace/line-ending sensitivity), but the code correctly detects all three cases (patch applied, already correct, unexpected format) and raises Minor/Nits
Overall this is a well-documented, well-scoped set of compiler bug workarounds. The most important pre-merge items are #1 (verify |
Restore all runners that were temporarily disabled to speed up initial CI validation of the CCE fix. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 13
Summary
Findings1. Potential undeclared variable in The diff removes both declaration lines for - integer, dimension(num_procs) :: part_order, part_ord_mpi
- integer, dimension(num_procs) :: proc_bubble_counts
+ integer, allocatable :: proc_bubble_counts(:)
2. Duplicated magic constant The value
A comment says these must stay in sync, but there is no mechanical enforcement. If one is updated without the other, the Python early-error check and the Fortran 3. Minor: spurious blank line in
Improvement Opportunities
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 14
Summary
Findings1.
|
…only 1,500+ stress-test rounds on CCE 19.0.0 showed zero ICEs with plain dimension(num_species) local arrays in m_chemistry.fpp. Remove all #:if USING_CCE fixed-size array guards, the CCE_MAX_SPECIES Fypp constant, the @:PROHIBIT runtime checks, and the matching Python-side species-count validation in input.py. Simplify the compound AMD guard from (not MFC_CASE_OPTIMIZATION and USING_AMD) or USING_CCE to not MFC_CASE_OPTIMIZATION and USING_AMD with a literal dimension(10) — the AMD workaround is preserved. The -Oipa0 per-file CMake flags for m_bubbles_EL and m_phase_change are kept; those ICEs are confirmed required by 20/20 positive-control rounds and GitHub CI history. Temporarily remove non-CCE jobs from CI (GitHub runners, Phoenix, Frontier-AMD) to focus test bandwidth on the CCE fix branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 878fddb Files changed: 15
Summary
Findings1. Bug 6 /
|
Re-add the jobs that were temporarily removed for CCE testing: - test.yml: github job (gfortran + Intel ifx), Phoenix matrix entries, Frontier AMD matrix entries - bench.yml: Phoenix NVHPC entries, Frontier AMD entry Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 14
Summary
Findings1. The PR replaces the careful diff-based approach with a plain source /opt/intel/oneapi/setvars.sh
printenv >> $GITHUB_ENVThe PR description acknowledges this is a known risk: shell internals with special characters (newlines in values, embedded 2. The PR body explicitly notes that the 3. In the #elif MFC_OpenMP
$:omp_directive
#else
$:cray_noinline_directive ! emits !DIR$ NOINLINEOn Cray CPU-only builds ( 4. The change converts 5. old_macro = (
"#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
"#else\n#define GPU_ROUTINE(name) ! routine seq\n#endif"
)The fallback logic is good (raises 6. A spurious blank line was added before the Overall AssessmentThe changes are well-reasoned, carefully scoped, and contain clear explanatory comments for every compiler workaround. The |
When the Cray+ACC GPU_ROUTINE macro patch successfully replaces the broken _CRAYFTN/#ifdef block in pyrometheus-generated m_thermochem.f90, emit a yellow warning so engineers can tell the workaround is active and will notice when pyrometheus eventually fixes the upstream issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: c17653f Changed files
Summary
Findings[CONCERN — The old code deliberately avoided The new code reverts to the simpler: source /opt/intel/oneapi/setvars.sh
printenv >> $GITHUB_ENVThis is the pattern the old code intentionally worked around (github/actions/runner#1189). If Intel ifx CI on ubuntu is re-enabled before merge (as planned), this may trigger GITHUB_ENV parse failures. Recommend either keeping the diff-based filter or verifying this is no longer a problem with the current oneAPI version. [MINOR — On Cray+OpenACC, [MINOR — integer, dimension(num_procs) :: part_order, part_ord_mpi ! still VLA
integer, allocatable :: proc_bubble_counts(:) ! convertedOnly [INFO — if patched == thermochem_code:
if new_macro in thermochem_code:
pass # pyrometheus already emits the correct form; no patch needed
else:
raise common.MFCException(...)The Overall the Fortran-side fixes are technically sound and well-reasoned. The CMake per-file |
…rruption Plain `printenv >> $GITHUB_ENV` after sourcing setvars.sh can corrupt GITHUB_ENV parsing due to shell internals with special characters. Restore the diff-based filter from 2c3590c that was accidentally clobbered by 878fddb when slimming CI for CCE testing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 24ea0cb Summary
Findings[BLOCKER] [MINOR] integer, dimension(num_procs) :: part_order, part_ord_mpi ! still VLAs
integer, allocatable :: proc_bubble_counts(:) ! fixedOnly [MINOR] [MINOR] Positive Observations
|
- case-optimization: restore Phoenix (acc+omp) and Frontier AMD (omp) entries that were incorrectly narrowed to CCE-only - self: restore Frontier AMD GPU OMP sharding (1/2, 2/2) that was unintentionally removed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: f881888 Changed files
Summary
Findings1.
|
- self/case-optimization: clean: true on checkout - self: build retry max_attempts 3→2, on_retry rm -rf build - self/case-optimization: add Cancel SLURM Jobs step on cancellation - case-optimization: drop retry wrapper on Pre-Build (login node) - CMakeLists.txt: skip -march=native for gcov builds; add -mno-avx512fp16 to avoid binutils <2.38 assembler failures on Granite Rapids CPUs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Changed files
Summary
Findings1.
|
Claude Code ReviewHead SHA: Summary
Findings1. Medium —
|
Resolve conflicts with master PRs MFlowCode#1295 (CI robustness refactor) and MFlowCode#1286 (CCE 19.0.0 workarounds). Keep coverage pruning features while adopting master's submit-job.sh split, build cache, Frontier continue-on-error, and improved SLURM monitoring. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
CCE 19.0.0 has six distinct compiler bugs triggered by MFC's Cray+OpenACC GPU builds, plus one pre-existing correctness issue in the `GPU_ROUTINE` macro that IPA was silently masking. All are worked around without modifying the numerical algorithms or GPU execution model.
Bug 1 — InstCombine ICE in `matmul()` (`m_phase_change.fpp`)
CCE 19.0.0's InstCombine pass crashes on `matmul()` inside a GPU kernel.
Fix: Replace `matmul()` with explicit 2×2 scalar arithmetic.
Bug 2 — Uninitialized `FT` in `s_TSat` (`m_phase_change.fpp`)
`huge(FT)` before `FT` was declared caused undefined behavior caught by CCE.
Fix: Use `huge(1.0_wp)` instead.
Bug 3 — IPA `bring_routine_resident` SIGSEGV (`m_phase_change.fpp`)
CCE 19.0.0's interprocedural analysis crashes when processing phase-change kernel routines.
Two sub-approaches combined:
Applies `cray_noinline=True` to 4 routines in `m_phase_change.fpp` and 4 in `m_variables_conversion.fpp`.
Bug 4 — IPA `castIsValid` ICE (`m_bubbles_EL.fpp`)
Complex GPU loops combined with a `dimension(num_procs)` VLA trigger an InstCombine PHI crash during IPA.
Fix: Change `proc_bubble_counts` from VLA to `allocatable` + apply `-Oipa0` per-file for `m_bubbles_EL.fpp` in `CMakeLists.txt` (Cray+OpenACC only).
Bug 5 — Pyrometheus-generated `m_thermochem.f90` missing `!$acc routine seq` on Cray+OpenACC
Pyrometheus emits `!DIR$ INLINEALWAYS name` for Cray but omits `!$acc routine seq`, so thermochem routines are not registered as OpenACC device routines → GPU memory access fault at time step 1 for all chemistry tests.
Fix: Post-process the generated code in `toolchain/mfc/run/input.py` to replace the broken Cray `#ifdef` block with `#define GPU_ROUTINE(name) !$acc routine seq`.
Bug 6 — VLA `dimension(num_species)` ICE in `m_chemistry.fpp`
An earlier version of this PR guarded the 4 VLA locations in `m_chemistry.fpp` with `#:if USING_CCE` to use `dimension(10)` fixed-size arrays. After 1,500+ staging-delete rebuild rounds across all backends (CPU, Cray+OpenACC, Cray+OpenMP) and ±case-optimization on CCE 19.0.0, zero ICEs were observed with plain `dimension(num_species)` arrays.
Fix: Removed all `#:if USING_CCE` guards from `m_chemistry.fpp` (reverting to plain `dimension(num_species)`), removed the `CCE_MAX_SPECIES` Fypp constant and `@:PROHIBIT` runtime checks, and removed the matching Python-side species-count validation in `input.py`. The compound AMD guard `(not MFC_CASE_OPTIMIZATION and USING_AMD) or USING_CCE` is simplified to `not MFC_CASE_OPTIMIZATION and USING_AMD` (the AMD workaround is preserved with a literal `dimension(10)`).
Bug 7 — `cray_inline=True` in `GPU_ROUTINE` was broken on Cray+OpenACC (latent correctness bug)
Before this PR, `cray_inline=True` on Cray+OpenACC emitted only `!DIR$ INLINEALWAYS name` with no `!$acc routine seq`. This means 33 routines across 8 files (`m_bubbles.fpp`, `m_bubbles_EL_kernels.fpp`, `m_compute_cbc.fpp`, `m_sim_helpers.fpp`, `m_qbmm.fpp`, `m_bubbles_EL.fpp`, `m_boundary_common.fpp`, `m_chemistry.fpp`) were not registered as OpenACC device routines on Cray. This worked in practice because Cray's IPA aggressively inlined these routines at call sites. With `-Oipa0` disabled for Bug 4, this inlining path breaks.
Fix: The `cray_inline=True` branch in `GPU_ROUTINE` now correctly emits `!$acc routine seq` on Cray+OpenACC (same as the `#else` non-Cray path), and reserves `!DIR$ INLINEALWAYS` for Cray CPU-only builds. This is the correct behavior per the OpenACC spec.
Files changed
Testing
All 6 previously-failing tests confirmed passing on Frontier with CCE 19.0.0 + OpenACC (SLURM job 4172615):
Performance (CCE 19.0.0 + OpenACC, Frontier)
No measurable regressions from the `-Oipa0` per-file flags and `cray_inline` fix. Benchmark grind times vs master (all differences ≤ 2%, within GPU run-to-run noise of ~5–10%):
Frontier CCE CI fully passing. Phoenix + Frontier AMD CI temporarily disabled due to pre-existing infrastructure failures unrelated to these changes — to be re-enabled before merge.
🤖 Generated with Claude Code