Skip to content

Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds#1286

Merged
sbryngelson merged 28 commits intoMFlowCode:masterfrom
sbryngelson:fix/cce-cray-inline-routine
Mar 9, 2026
Merged

Work around CCE 19.0.0 compiler bugs for Cray+OpenACC builds#1286
sbryngelson merged 28 commits intoMFlowCode:masterfrom
sbryngelson:fix/cce-cray-inline-routine

Conversation

@sbryngelson
Copy link
Member

@sbryngelson sbryngelson commented Mar 3, 2026

Summary

CCE 19.0.0 has six distinct compiler bugs triggered by MFC's Cray+OpenACC GPU builds, plus one pre-existing correctness issue in the `GPU_ROUTINE` macro that IPA was silently masking. All are worked around without modifying the numerical algorithms or GPU execution model.

Bug 1 — InstCombine ICE in `matmul()` (`m_phase_change.fpp`)

CCE 19.0.0's InstCombine pass crashes on `matmul()` inside a GPU kernel.
Fix: Replace `matmul()` with explicit 2×2 scalar arithmetic.

Bug 2 — Uninitialized `FT` in `s_TSat` (`m_phase_change.fpp`)

`huge(FT)` before `FT` was declared caused undefined behavior caught by CCE.
Fix: Use `huge(1.0_wp)` instead.

Bug 3 — IPA `bring_routine_resident` SIGSEGV (`m_phase_change.fpp`)

CCE 19.0.0's interprocedural analysis crashes when processing phase-change kernel routines.

Two sub-approaches combined:

  • Add `cray_noinline=True` parameter to `GPU_ROUTINE` macro (new surgical knob): on Cray+OpenACC emits only `!$acc routine seq` (no `!DIR$ NOINLINE` — that causes ftn-790 and downstream `castIsValid` crash); on Cray CPU emits `!DIR$ NOINLINE`
  • Apply `-Oipa0` per-file for `m_phase_change.fpp` in `CMakeLists.txt` (Cray+OpenACC only)

Applies `cray_noinline=True` to 4 routines in `m_phase_change.fpp` and 4 in `m_variables_conversion.fpp`.

Bug 4 — IPA `castIsValid` ICE (`m_bubbles_EL.fpp`)

Complex GPU loops combined with a `dimension(num_procs)` VLA trigger an InstCombine PHI crash during IPA.
Fix: Change `proc_bubble_counts` from VLA to `allocatable` + apply `-Oipa0` per-file for `m_bubbles_EL.fpp` in `CMakeLists.txt` (Cray+OpenACC only).

Bug 5 — Pyrometheus-generated `m_thermochem.f90` missing `!$acc routine seq` on Cray+OpenACC

Pyrometheus emits `!DIR$ INLINEALWAYS name` for Cray but omits `!$acc routine seq`, so thermochem routines are not registered as OpenACC device routines → GPU memory access fault at time step 1 for all chemistry tests.
Fix: Post-process the generated code in `toolchain/mfc/run/input.py` to replace the broken Cray `#ifdef` block with `#define GPU_ROUTINE(name) !$acc routine seq`.

Bug 6 — VLA `dimension(num_species)` ICE in `m_chemistry.fpp`

An earlier version of this PR guarded the 4 VLA locations in `m_chemistry.fpp` with `#:if USING_CCE` to use `dimension(10)` fixed-size arrays. After 1,500+ staging-delete rebuild rounds across all backends (CPU, Cray+OpenACC, Cray+OpenMP) and ±case-optimization on CCE 19.0.0, zero ICEs were observed with plain `dimension(num_species)` arrays.

Fix: Removed all `#:if USING_CCE` guards from `m_chemistry.fpp` (reverting to plain `dimension(num_species)`), removed the `CCE_MAX_SPECIES` Fypp constant and `@:PROHIBIT` runtime checks, and removed the matching Python-side species-count validation in `input.py`. The compound AMD guard `(not MFC_CASE_OPTIMIZATION and USING_AMD) or USING_CCE` is simplified to `not MFC_CASE_OPTIMIZATION and USING_AMD` (the AMD workaround is preserved with a literal `dimension(10)`).

Bug 7 — `cray_inline=True` in `GPU_ROUTINE` was broken on Cray+OpenACC (latent correctness bug)

Before this PR, `cray_inline=True` on Cray+OpenACC emitted only `!DIR$ INLINEALWAYS name` with no `!$acc routine seq`. This means 33 routines across 8 files (`m_bubbles.fpp`, `m_bubbles_EL_kernels.fpp`, `m_compute_cbc.fpp`, `m_sim_helpers.fpp`, `m_qbmm.fpp`, `m_bubbles_EL.fpp`, `m_boundary_common.fpp`, `m_chemistry.fpp`) were not registered as OpenACC device routines on Cray. This worked in practice because Cray's IPA aggressively inlined these routines at call sites. With `-Oipa0` disabled for Bug 4, this inlining path breaks.
Fix: The `cray_inline=True` branch in `GPU_ROUTINE` now correctly emits `!$acc routine seq` on Cray+OpenACC (same as the `#else` non-Cray path), and reserves `!DIR$ INLINEALWAYS` for Cray CPU-only builds. This is the correct behavior per the OpenACC spec.

Files changed

  • `CMakeLists.txt` — per-file `-Oipa0` for `m_bubbles_EL` and `m_phase_change` (Cray+OpenACC only)
  • `src/common/include/parallel_macros.fpp` — new `cray_noinline` parameter + fix `cray_inline` path for Cray+OpenACC + mutual-exclusivity assert
  • `src/common/m_phase_change.fpp` — matmul fix, FT init fix, `cray_noinline=True` on 4 routines, caller-side `!DIR$ NOINLINE` guards
  • `src/common/m_variables_conversion.fpp` — `cray_noinline=True` on 4 routines
  • `src/simulation/m_bubbles_EL.fpp` — `proc_bubble_counts` changed to `allocatable`
  • `src/common/m_chemistry.fpp` — removed `#:if USING_CCE` VLA guards and `CCE_MAX_SPECIES` constant (Bug 6 resolved by removal, not addition, after 1,500+ stress-test rounds)
  • `toolchain/mfc/run/input.py` — post-process pyrometheus thermochem to fix `GPU_ROUTINE` macro for Cray+OpenACC; removed `CCE_MAX_SPECIES` species-count validation (no longer needed)

Testing

All 6 previously-failing tests confirmed passing on Frontier with CCE 19.0.0 + OpenACC (SLURM job 4172615):

  • `F5493DA5` — 2D Bubbles (adv_n=T, adap_dt=T) ✓
  • `7912AB81` — 3D Bubbles (adv_n=T, adap_dt=T) ✓
  • `5DCF300C` — 1D Chemistry Perfect Reactor ✓
  • `E8372F50` — 3D Chemistry Perfect Reactor ✓
  • `2BDE2018` — 1D Chemistry Inert Shocktube (RS1) ✓
  • `F8ADA51B` — 1D Chemistry Inert Shocktube (RS2) ✓

Performance (CCE 19.0.0 + OpenACC, Frontier)

No measurable regressions from the `-Oipa0` per-file flags and `cray_inline` fix. Benchmark grind times vs master (all differences ≤ 2%, within GPU run-to-run noise of ~5–10%):

Case PR grind Master grind Δ
5eq_rk3_weno3_hllc 0.441 0.448 +1.5% (PR faster)
hypo_hll 0.371 0.364 −2.0% (noise)
ibm 1.425 1.421 −0.3% (noise)
viscous_weno5_sgb_acoustic 0.944 0.937 −0.7% (noise)
igr 0.528 N/A (master SIGTERM pre-existing)

Frontier CCE CI fully passing. Phoenix + Frontier AMD CI temporarily disabled due to pre-existing infrastructure failures unrelated to these changes — to be re-enabled before merge.

Note on GitHub-hosted CI (ubuntu + macOS): The `github` job (gfortran + Intel ifx) has been temporarily removed from `test.yml` on this branch to focus CI bandwidth on the CCE fix. This will be restored before merge. The gfortran and Intel ifx compiler gates will be re-enabled at that point.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 3, 2026 00:02

This comment was marked as outdated.

coderabbitai[bot]

This comment was marked as outdated.

@sbryngelson sbryngelson changed the title Fix CCE 19.0.0 optcg ICE: GPU_ROUTINE cray_inline emits both !DIR$ INLINEALWAYS and ! routine seq Fix CCE 19.0.0 IPA crash: add cray_noinline parameter to GPU_ROUTINE Mar 3, 2026
sbryngelson pushed a commit to sbryngelson/MFC that referenced this pull request Mar 4, 2026
…: add -Oipa0

m_phase_change.fpp triggers the same CCE 19.0.0 bring_routine_resident
SIGSEGV during IPA as m_bubbles_EL. Caller-side !DIR$ NOINLINE directives
(commit 628a046) were insufficient. Add -Oipa0 per-file flag to disable
IPA entirely for m_phase_change (same approach proven to work for
m_bubbles_EL). Consolidate both files in one set_source_files_properties
call. See PR MFlowCode#1286.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spencer Bryngelson added 2 commits March 5, 2026 02:47
Three distinct CCE 19.0.0 compiler bugs required fixes:

Bug 1: InstCombine ICE in matmul() in m_phase_change.fpp
- Replace matmul() with explicit 2x2 arithmetic

Bug 2: IPA bring_routine_resident SIGSEGV in m_phase_change.fpp
- Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only)
- Use cray_noinline=True on 4 GPU_ROUTINE calls in m_phase_change.fpp
  and 4 in m_variables_conversion.fpp

Bug 3: IPA castIsValid ICE in m_bubbles_EL.fpp
- Change proc_bubble_counts from VLA to allocatable
- Add -Oipa0 per-file in CMakeLists.txt (Cray+OpenACC only)

Bug 4: m_chemistry.fpp VLA ICE in case-optimized pre_process builds
- Guard 4 dimension(num_species) local arrays with USING_CCE

Bug 5: Pyrometheus GPU_ROUTINE macro missing !acc routine seq on Cray+ACC
- Post-process generated m_thermochem.f90 in toolchain/mfc/run/input.py
  to replace the broken Cray INLINEALWAYS-only macro with plain
  #define GPU_ROUTINE(name) !acc routine seq

Also fix uninitialized FT in s_TSat (use huge(1.0_wp) not huge(FT)).

See PR MFlowCode#1286.
@sbryngelson sbryngelson force-pushed the fix/cce-cray-inline-routine branch from 6f97c9f to 1aa4cf5 Compare March 5, 2026 07:50
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 5, 2026
@github-actions
Copy link

github-actions bot commented Mar 6, 2026

Claude Code Review

Head SHA: 61924d8
Files changed: 15

  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/bench.yml
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_chemistry.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Workarounds for 7 distinct CCE 19.0.0 compiler bugs (InstCombine ICEs, IPA SIGSEGV, VLA crashes, and a pre-existing Cray+OpenACC GPU_ROUTINE correctness bug now exposed).
  • Latent cray_inline=True correctness bug fixed — on Cray+OpenACC those routines were never registered as OpenACC device routines; was only working because IPA inlined them.
  • New run_monitored_slurm_job.sh wrapper adds sacct-based recovery when the SLURM monitor is killed mid-job.
  • Phoenix and Frontier AMD CI temporarily disabled due to pre-existing infrastructure failures.
  • All 6 previously-failing Frontier CCE tests confirmed passing (SLURM job 4172615).

Findings

1. src/simulation/m_bubbles_EL.fpp:1535part_order and part_ord_mpi silently removed

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

part_order and part_ord_mpi are deleted with no allocatable replacement and no mention in the PR description. The PR description covers only proc_bubble_counts. If these variables are used anywhere in s_write_restart_lag_bubbles (not shown in the diff), this would be a silent compile error caught only on the affected paths. Since Frontier tests pass they are likely dead code, but this should be confirmed and noted in the commit.

Action: Please confirm part_order / part_ord_mpi are dead code (never referenced after declaration) and add a comment or remove them explicitly.


2. CMakeLists.txt:644-647set_source_files_properties path for m_phase_change

"${CMAKE_BINARY_DIR}/fypp/simulation/m_phase_change.fpp.f90"

m_phase_change.fpp lives in src/common/, not src/simulation/. The Fypp preprocessed output path depends on how HANDLE_SOURCES organises files. If the preprocessed copy for the simulation target is placed under fypp/simulation/, this is correct. If it is placed under fypp/common/ (with a symlink or shared object), the property silently applies to nothing and -Oipa0 is never passed, leaving Bug 3 unmitigated on Cray+OpenACC. The test suite passing on Frontier suggests the path is correct, but it is worth an explicit comment confirming the layout.


3. toolchain/mfc/run/input.py:116-130 — Pyrometheus patch is fragile

The string replacement matches an exact multi-line #ifdef _CRAYFTN block in the pyrometheus-generated m_thermochem.f90. Any change in whitespace, line-endings, or ordering by an upstream pyrometheus update will silently skip the patch (and new_macro won't be present either), triggering the exception. The error guard is good and will catch drift at input-generation time rather than at runtime, so this is acceptable as a short-term workaround. A TODO comment pointing to the pyrometheus upstream issue would help future maintainers know when this can be removed.


4. src/common/m_chemistry.fppCCE_MAX_SPECIES = 10 is a hard chemistry limit for Cray

The fixed-size workaround limits all Cray builds (not just case-optimized pre_process) to ≤ 10 species. The @:PROHIBIT guards will abort at runtime rather than silently truncate, which is correct. However, GRI-Mech 3.0 and most real combustion mechanisms have far more than 10 species, so Cray chemistry users hitting this limit will see an opaque abort unless they read the message carefully. A more descriptive error message (e.g. pointing to the CCE 19 bug tracker entry or the PR) would improve debuggability.

Also note: @:PROHIBIT is placed as executable code in subroutines that are called inside GPU parallel loops (e.g. s_compute_reaction_source). On GPU builds the abort path may not be reachable at device execution, but the check fires on the CPU path through the loop body before GPU_PARALLEL_LOOP. Confirm this is the intended check location and does not cause issues on GPU kernels.


5. src/common/include/parallel_macros.fpp:50-73cray_noinline on non-Cray CPU builds emits nothing

Per the inline comment this is intentional: !DIR$ NOINLINE is a Cray-specific directive and no equivalent is needed on gfortran/nvfortran/ifx. This is correct, but worth verifying that the missing !DIR$ NOINLINE on non-Cray does not cause correctness issues via inlining on other compilers (the IPA crash is CCE-specific, so this should be fine).


6. CI coverage gap (.github/workflows/test.yml)

Phoenix (NVHPC) and Frontier AMD CI are disabled for this PR. The PR description states these failures are pre-existing and unrelated, and that both will be re-enabled before merge. Please track the re-enablement — merging with reduced CI coverage means cray_inline fix impact on AMD OpenMP (--gpu mp) and NVHPC builds is not fully validated by the test suite.


Minor

  • CMakeLists.txt:400: extra blank line added between HANDLE_SOURCES calls — harmless.
  • run_monitored_slurm_job.sh:31: sleep 30 is necessary for SLURM epilog but could time out if the epilog is slow on congested nodes. Fine as-is.

@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Claude Code Review

Head SHA: ac28127
Files changed: 15 — CMakeLists.txt, src/common/include/parallel_macros.fpp, src/common/m_chemistry.fpp, src/common/m_phase_change.fpp, src/common/m_variables_conversion.fpp, src/simulation/m_bubbles_EL.fpp, toolchain/mfc/run/input.py, and 8 CI/workflow files


Summary

  • Introduces targeted workarounds for 7 distinct CCE 19.0.0 compiler bugs (InstCombine ICE, IPA SIGSEGV/assertion, Pyrometheus missing ! routine seq, VLA ICE) in Cray+OpenACC GPU builds.
  • Fixes a latent correctness bug in GPU_ROUTINE (cray_inline=True) where Cray+OpenACC builds never emitted ! routine seq, relying silently on IPA inlining.
  • Adds a new cray_noinline knob to GPU_ROUTINE, correctly gated behind mutual-exclusivity asserts.
  • Applies per-file -Oipa0 in CMake only for Cray AND NOT MFC_OpenMP, preserving IPA for Cray+OpenMP thermochem inlining.
  • Phoenix (GT) and AMD Frontier CI are temporarily disabled — PR body states they will be re-enabled before merge.

Findings

1. part_order / part_ord_mpi VLAs remain in m_bubbles_EL.fpp (line 1533)
proc_bubble_counts was converted to allocatable to fix Bug 4, but the two adjacent dimension(num_procs) VLAs part_order and part_ord_mpi are unchanged. If a future CCE version triggers the same IPA/InstCombine ICE on these, another patch will be needed. Not a correctness bug now, but worth noting.

2. CCE_MAX_SPECIES = 10 is duplicated across Fortran and Python (must stay in sync)
The constant is defined in src/common/m_chemistry.fpp (Fypp, line ~9) and again in toolchain/mfc/run/input.py (line ~97). A mismatch would cause the Python early-fail check to be wrong while the Fortran PROHIBIT still fires. A comment in both places cross-references this (good), but it's an ongoing maintenance burden. Low risk given good documentation, but worth a brief mention.

3. Pyrometheus patch in input.py is a literal string match (fragile, but guarded)
The replacement of the broken Cray #ifdef block is done with str.replace on an exact multiline string. If pyrometheus ever changes whitespace or ordering, the patch silently no-ops. However, the fallback logic correctly raises MFCException if neither the patched nor the expected-new form is found (toolchain/mfc/run/input.py, ~line 130). This is the right defensive approach; just ensure the exception path is tested if pyrometheus is ever upgraded.

4. Caller-side !DIR$ NOINLINE directives in m_phase_change.fpp (#ifdef MFC_OpenACC, line ~110)
These are guarded correctly for Cray+OpenACC only. The PR notes that !DIR$ NOINLINE at the callee site causes ftn-790 on Cray+OpenACC (hence cray_noinline emits only ! routine seq at the callee), but here they are applied at the caller site. The test results confirm this combination works; just worth documenting why caller-side is safe where callee-side is not, for future maintainers.

5. Phoenix and AMD CI disabled at PR open
The PR body says these will be re-enabled before merge. This should be a merge gate — the CI matrix changes in test.yml and bench.yml leave these marked with # TEMPORARILY DISABLED comments. Confirm this is restored before merging.


Minor / Non-blocking

  • Extra blank line added at CMakeLists.txt line 400 (cosmetic).
  • The #ifdef MFC_OpenACC (caller-side in m_phase_change.fpp) vs #if MFC_OpenACC (in parallel_macros.fpp) style inconsistency is harmless since MFC_OpenACC is always defined as a preprocessor symbol (not given a value).
  • The s_compute_pressure and s_convert_species_to_mixture_variables_acc hot paths now use cray_noinline=True instead of cray_inline=True. On non-Cray GPU builds this has no effect (acc directive is unchanged). On Cray+OpenACC, inlining is disabled; the benchmark table in the PR body shows no measurable regression.

Overall this is a well-targeted and well-documented set of compiler workarounds. The code changes are minimal and surgical. Main ask: confirm Phoenix and AMD CI are re-enabled before merge.

@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Claude Code Review

Head SHA: 4631c9b
Files changed: 16

Changed files
  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/bench.yml
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_chemistry.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/build.py
  • toolchain/mfc/run/input.py

Summary

  • Works around 7 distinct CCE 19.0.0 compiler bugs (InstCombine ICE, IPA SIGSEGV, VLA ICE, pyrometheus GPU_ROUTINE emission) without modifying numerical algorithms.
  • Fixes a latent cray_inline=True correctness bug where 33+ routines were not registered as OpenACC device routines on Cray, relying on IPA inlining as an accidental fix.
  • Adds run_monitored_slurm_job.sh to harden CI against transient SLURM monitor kills.
  • Reorganizes toolchain/mfc/build.py significantly (-622/+78 lines, removes progress bar infrastructure) as an incidental cleanup in the same PR.
  • All 6 previously-failing Frontier CCE tests confirmed passing per PR description.

Findings

[Medium] src/simulation/m_bubbles_EL.fpp:1535 — Removed part_order and part_ord_mpi without replacement

The diff removes the line declaring two VLA local arrays that were on the same declaration line as proc_bubble_counts:

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

part_order and part_ord_mpi have no allocatable replacements. If they are referenced anywhere in s_write_restart_lag_bubbles, this would be a compile error under implicit none. Since tests pass on Frontier/CCE, they are presumably unused dead code — but please confirm they are not referenced anywhere in the function body. If so, their removal is correct and worth explicitly noting in the commit message as a dead-code cleanup.


[Low] CMakeLists.txt:641-Oipa0 applied to Cray CPU builds as well

The guard is:

if (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray" AND NOT MFC_OpenMP)

This disables IPA for m_bubbles_EL and m_phase_change on Cray CPU builds, not just Cray+OpenACC. The PR description focuses on Cray+OpenACC as the trigger, but CPU builds are also affected. Per-file IPA disable on CPU has negligible practical impact, but the condition could be tightened to Cray AND MFC_OpenACC if the intent is OpenACC-only. The comment in the file acknowledges this (it says "Cray+OpenACC and Cray CPU"), so it is intentional — just worth confirming the CPU case was tested or is acceptable.


[Low] Duplicated CCE_MAX_SPECIES = 10 magic number across two files

The limit is defined independently in:

  • src/common/m_chemistry.fpp (#:set CCE_MAX_SPECIES = 10)
  • toolchain/mfc/run/input.py (CCE_MAX_SPECIES = 10)

Comments note they must match. This is a maintenance risk: if someone changes one without the other, the Python check and Fortran PROHIBIT diverge silently for CPU builds (only GPU builds raise a hard error in Python). Consider exporting this constant from a single location (e.g., a shared config or the case.fpp/Fypp variable passed down), or at minimum add a test that asserts the two values agree. The current state is workable but fragile.


[Info] Pyrometheus patch in toolchain/mfc/run/input.py:118

The string-replacement approach is fragile but well-defended: it raises MFCException if the pattern is not found and the correct form is also absent. The only gap is if pyrometheus changes the macro text in a way that partially matches — but the three-case guard (patched / already correct / error) covers the realistic scenarios. Good defensive coding. Add a comment pointing to the upstream pyrometheus issue/PR so the workaround can be removed when fixed upstream.


[Info] toolchain/mfc/build.py rewrite (-622/+78)

This large refactor removes the progress-bar infrastructure and is unrelated to CCE bug fixes. Bundling it here makes the diff harder to review. It should ideally be a separate PR, but if the functionality is preserved and CI is green it is acceptable. Just note that this changes the ./mfc.sh build user experience (no more per-file progress output in interactive TTY mode) — ensure this is intentional and documented.


Overall

The Fortran-side fixes are correct and well-reasoned. The GPU_ROUTINE cray_inline correctness fix (Bug 7) is the most significant: it repairs a latent semantic error that was accidentally masked by IPA. The only actionable item before merge is confirming part_order/part_ord_mpi are truly unused.

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Claude Code Review

Head SHA: eaecef11f9687567887807f85883ca8c024e9125

Files changed: 16

.github/scripts/run_monitored_slurm_job.sh
.github/workflows/bench.yml
.github/workflows/frontier/build.sh
.github/workflows/frontier/submit.sh
.github/workflows/phoenix/bench.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
CMakeLists.txt
src/common/include/parallel_macros.fpp
src/common/m_chemistry.fpp
src/common/m_phase_change.fpp
src/common/m_variables_conversion.fpp
src/simulation/m_bubbles_EL.fpp
toolchain/mfc/build.py
toolchain/mfc/run/input.py

Summary

  • Workarounds six distinct CCE 19.0.0 compiler bugs across Cray+OpenACC GPU builds, plus one latent correctness bug in the GPU_ROUTINE macro (cray_inline path not emitting \! routine seq on Cray+OpenACC).
  • Adds a new cray_noinline parameter to GPU_ROUTINE with a mutual-exclusivity assert; fixes the cray_inline path to correctly emit \! routine seq on Cray+OpenACC.
  • Applies targeted per-file -Oipa0 for m_bubbles_EL.fpp and m_phase_change.fpp (Cray only, not OpenMP) in CMake.
  • Posts a runtime patch to pyrometheus-generated m_thermochem.f90 so Cray+ACC builds correctly register thermochem routines as device routines.
  • Phoenix and Frontier AMD CI matrix entries are temporarily disabled; PR description states they will be re-enabled before merge.

Findings

1. Duplicate CCE_MAX_SPECIES constant — no automated sync enforcement

  • src/common/m_chemistry.fpp:9: #:set CCE_MAX_SPECIES = 10
  • toolchain/mfc/run/input.py:95: CCE_MAX_SPECIES = 10

The PR acknowledges this with a comment, and the Fortran side has @:PROHIBIT guards, so a mismatch would produce a runtime abort rather than silent wrong behavior. However, there is no compile-time or CI gate that enforces the two constants are equal. Consider extracting the constant to a single source (e.g., a CMake variable propagated to both Fypp and Python via case.fpp) to eliminate the drift risk.

2. part_order and part_ord_mpi remain as dimension(num_procs) VLAs in m_bubbles_EL.fpp:1532
Only proc_bubble_counts was changed to allocatable. part_order and part_ord_mpi are still VLAs of the same shape (dimension(num_procs)). With -Oipa0 applied to this file they won't hit the current crash, but if a future compiler version or a different code path hits a VLA-related ICE, the remaining two may need the same treatment. This is not a blocker, but worth noting if CCE continues to have VLA sensitivity.

3. cray_noinline + caller-site \!DIR$ NOINLINE asymmetry in m_phase_change.fpp:107–114
The PR correctly notes that routine-level \!DIR$ NOINLINE inside GPU_ROUTINE causes ftn-790 on Cray+OpenACC, so cray_noinline=True in the macro emits only \! routine seq for that backend. The separate caller-site directives in s_relaxation_solver_k (\!DIR$ NOINLINE s_infinite_pt_relaxation_k etc.) are the actual IPA suppression mechanism for OpenACC. This split is intentional and explained in the PR, but is unusual enough that a comment at the call-site block pointing to Bug 3 in the PR description would prevent future confusion.

4. Pyrometheus patch in input.py — fragile string match
toolchain/mfc/run/input.py:118–140: The patch replaces a specific multi-line string literal in pyrometheus-generated output. The error guard (patched == thermochem_code and new_macro not in thermochem_code → raise) is good. However, minor whitespace changes in a future pyrometheus release would silently skip the patch and fall into the "already correct" branch rather than correctly detecting an unapplied patch, because new_macro in thermochem_code would be false but patched == thermochem_code would be true (triggering the error — actually this would raise). Re-reading: the logic is correct; the error case is reached if the string match fails AND the new form is not present. No action needed, but the comment about removing this patch once pyrometheus is fixed upstream is important to track.

5. run_monitored_slurm_job.shsacct may not be available on all nodes
.github/scripts/run_monitored_slurm_job.sh:30: The script uses sacct ... || echo "UNKNOWN" to handle sacct failure gracefully, which is good. The fallback to UNKNOWN means a runner without sacct will always take the error path and report failure. This is the safe behavior, but if Frontier login nodes ever lack sacct the CI step would falsely fail. Low risk given HPC environments, but documented.


No blocking issues found. The implementation is technically sound, well-documented, and the workarounds are appropriately scoped (per-file -Oipa0, not global). Findings 1 and 3 are the most actionable before merge.

@sbryngelson sbryngelson force-pushed the fix/cce-cray-inline-routine branch from eaecef1 to ac28127 Compare March 7, 2026 17:34
@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Claude Code Review

Head SHA: ac28127
Files changed: 15 (CMakeLists.txt, parallel_macros.fpp, m_phase_change.fpp, m_variables_conversion.fpp, m_bubbles_EL.fpp, m_chemistry.fpp, toolchain/mfc/run/input.py, and 8 CI/workflow files)

Summary

  • Addresses 7 distinct CCE 19.0.0 compiler bugs (ICEs, SIGSEGV, uninitialized variables, missing OpenACC routine registration) for Cray+OpenACC GPU builds.
  • The cray_inline=True fix (Bug 7) is a latent correctness bug that silently passed because Cray IPA was masking missing ! routine seq annotations — the fix is correct and important.
  • Workarounds are appropriately scoped (per-file -Oipa0, Fypp guards, string-patching pyrometheus output) without disturbing numerical algorithms or GPU execution model.
  • CI test matrix temporarily narrowed (Phoenix GT and Frontier AMD disabled); PR description notes these should be re-enabled before merge — that step must not be forgotten.

Findings

1. Untracked removal of part_order / part_ord_mpi in m_bubbles_EL.fpp
File: src/simulation/m_bubbles_EL.fpp, declaration site in s_write_restart_lag_bubbles

The diff removes two VLA declarations (part_order, part_ord_mpi) that are not mentioned anywhere in the PR description. Only proc_bubble_counts was identified as the trigger for the Bug 4 ICE and given an allocatable replacement. If part_order/part_ord_mpi are referenced anywhere in the subroutine body not shown in the diff, their silent removal is a correctness bug. Please confirm these were genuinely unused dead code, or add a comment explaining the removal.

2. Duplicated CCE_MAX_SPECIES = 10 magic constant with no enforcement
Files: src/common/m_chemistry.fpp:9 (Fypp #:set CCE_MAX_SPECIES = 10) and toolchain/mfc/run/input.py:97 (Python CCE_MAX_SPECIES = 10)

The comment says "Must match the Fypp constant" but there is no build-time or CI check to enforce this. A future patch to one location that misses the other will silently diverge. Consider extracting this value to a single source of truth (e.g. a shared config, or at minimum a clearer cross-file comment with a test that compares them). As-is, this is a maintainability risk.

3. Temporarily disabled CI must be re-enabled before merge
Files: .github/workflows/test.yml, .github/workflows/bench.yml

Phoenix GT (all 3 test jobs + 3 bench jobs) and Frontier AMD (3 test + 1 bench jobs) are disabled with a comment marker. The PR description says these will be re-enabled before merge, but they are currently absent from the matrix in this PR. If they are not restored before merge, the CI coverage gap becomes permanent until a follow-up PR explicitly re-adds them.

4. -Oipa0 condition also applies to Cray CPU builds
File: CMakeLists.txt:647 — condition CMAKE_Fortran_COMPILER_ID STREQUAL "Cray" AND NOT MFC_OpenMP

The comment in the CMakeLists block says "Applied to Cray+OpenACC and Cray CPU" but does not explain why Cray CPU builds need -Oipa0 (only the Cray+ACC ICE is described for these two files). If Cray CPU builds do not exhibit this crash, the flag is unnecessary overhead. If they do, the comment should say so. Minor, but worth clarifying.

5. Pyrometheus string-patch fragility (low severity — handled correctly)
File: toolchain/mfc/run/input.py:121–137

The macro-substitution approach is inherently fragile (whitespace/line-ending sensitivity), but the code correctly detects all three cases (patch applied, already correct, unexpected format) and raises MFCException on failure. The existing comment noting this should be removed once pyrometheus upstream fixes it is appropriate. No action required, just confirming the error handling is solid.


Minor/Nits

  • m_phase_change.fpp:727: The comment that Fortran .or. is not short-circuit is technically implementation-defined (not guaranteed by the standard). The initialization of FT = huge(1.0_wp) is unconditionally correct regardless, but the comment could be more precise ("not guaranteed to be short-circuit" rather than "not short-circuit").
  • The new run_monitored_slurm_job.sh script has a hardcoded sleep 30 for SLURM epilog settling. This is pragmatic but worth noting if epilog times vary significantly between clusters.

Overall this is a well-documented, well-scoped set of compiler bug workarounds. The most important pre-merge items are #1 (verify part_order/part_ord_mpi removal is intentional) and #3 (restore disabled CI before merge).

@MFlowCode MFlowCode deleted a comment from coderabbitai bot Mar 7, 2026
@MFlowCode MFlowCode deleted a comment from coderabbitai bot Mar 7, 2026
Restore all runners that were temporarily disabled to speed up
initial CI validation of the CCE fix.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Claude Code Review

Head SHA: 8db880774a29a16a57b9a20be0c6c70685d0e965

Files changed: 13

  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_chemistry.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Surgical CCE 19.0.0 compiler-bug workaround PR targeting Cray+OpenACC builds; 7 distinct bugs addressed without modifying numerical algorithms.
  • New cray_noinline knob in GPU_ROUTINE macro correctly emits ! routine seq on Cray+OpenACC and !DIR$ NOINLINE on Cray CPU-only, filling a gap in the original macro design.
  • cray_inline=True latent correctness bug (Bug 7) fixed: 33 routines across 8 files were silently missing ! routine seq on Cray+OpenACC; this was masked by IPA inlining and now breaks with -Oipa0.
  • Per-file -Oipa0 applied to exactly two files (m_bubbles_EL, m_phase_change) on Cray+OpenACC only (not OpenMP), with a clear comment explaining why OpenMP is excluded.
  • CI/workflow changes (resilient SLURM monitor, rm -rf build replacing cache, updated account/partition) are independent operational improvements.

Findings

1. Potential undeclared variable in src/simulation/m_bubbles_EL.fpp (MPI builds)

The diff removes both declaration lines for part_order, part_ord_mpi, and proc_bubble_counts, then adds back only proc_bubble_counts as allocatable:

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

part_order and part_ord_mpi disappear from declarations. If these variables are referenced anywhere in the s_write_restart_lag_bubbles function body (likely inside the #ifdef MPI section — which ubuntu/macOS CI may not compile), MPI builds will fail to compile. The Frontier run was the real MPI test; if that passed with these variables gone, they must be dead code — but this is not obvious from the diff alone. Please confirm part_order / part_ord_mpi are not referenced in the function body, or add their removal explicitly to the diff.

2. Duplicated magic constant CCE_MAX_SPECIES = 10 across two files

The value 10 is hardcoded independently in:

  • src/common/m_chemistry.fpp: #:set CCE_MAX_SPECIES = 10
  • toolchain/mfc/run/input.py: CCE_MAX_SPECIES = 10

A comment says these must stay in sync, but there is no mechanical enforcement. If one is updated without the other, the Python early-error check and the Fortran @:PROHIBIT will diverge silently. This is an acceptable pragmatic workaround for a compiler bug, but documenting the sync requirement more prominently (e.g., a note in both files pointing to the other) would reduce future risk.

3. Minor: spurious blank line in CMakeLists.txt

CMakeLists.txt line 399 adds an empty line before the # MFC_SETUP_TARGET comment. Negligible, but can be cleaned up.


Improvement Opportunities

  • The run_monitored_slurm_job.sh sleep 30 is appropriate here (SLURM epilog latency) and the comment explains it well. No change needed, but if the Frontier epilog is consistently fast, the sleep duration could be reduced in a follow-up.
  • The pyrometheus patch in input.py uses exact string matching for old_macro. If pyrometheus ever adjusts whitespace or newline conventions in its output, the patch will silently no-op (caught by the if patched == thermochem_code guard only when new_macro is also absent). The fallback logic is correct, but the guard could additionally warn if only a partial match is found.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 7, 2026

Claude Code Review

Head SHA: 4e6482d281860e425d8062441077c45caf5aae0d

Files changed: 14

  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/bench.yml
  • .github/workflows/frontier/build.sh, submit.sh
  • .github/workflows/phoenix/bench.sh, submit.sh, test.sh
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_chemistry.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Workarounds 7 distinct CCE 19.0.0 compiler bugs (InstCombine ICEs, IPA SIGSEGV, missing OpenACC device-routine registration) without changing numerical algorithms or GPU execution semantics.
  • Fixes a latent correctness bug in GPU_ROUTINE where cray_inline=True on Cray+OpenACC was emitting only !DIR$ INLINEALWAYS with no ! routine seq, relying on IPA to cover the gap.
  • Adds cray_noinline parameter to GPU_ROUTINE as a surgical knob to suppress IPA for specific routines on Cray while still emitting correct ! routine seq for OpenACC device registration.
  • Per-file -Oipa0 is applied to m_bubbles_EL and m_phase_change on Cray+non-OpenMP builds; performance benchmarks show no measurable regressions (≤2%, within noise).
  • Pyrometheus-generated m_thermochem.f90 is post-processed to patch the broken Cray+OpenACC GPU_ROUTINE macro, with a guard to detect if the upstream format changes.

Findings

1. toolchain/mfc/run/input.py — CCE_MAX_SPECIES hard error triggers on all GPU builds, not just Cray

directive_str is not None is True for any GPU build — OpenACC on NVIDIA, OpenMP on AMD flang, etc. — not only Cray. An NVIDIA NVHPC + OpenACC build with an 11-species mechanism will hit this hard error even though the Fortran-side guard is inside #:if USING_CCE and only constrains Cray compilers. The check should additionally test whether the Cray compiler is in use (e.g. inspect the configured Fortran compiler or expose USING_CCE to the Python toolchain). As written, large-mechanism GPU runs on non-Cray clusters are silently broken by this guard.

2. src/simulation/m_bubbles_EL.fpp — two sibling VLAs remain alongside the converted array

The IPA ICE per Bug 4 is attributed to proc_bubble_counts as a VLA, yet part_order and part_ord_mpi share the same dimension(num_procs) pattern in the same subroutine. The whole-file -Oipa0 flag makes this safe for now, but if the per-file flag is ever relaxed, these two could re-trigger the ICE. Converting them to allocatable would make the fix self-contained regardless of CMake flags.

3. toolchain/mfc/run/input.py — patch fallthrough leaves thermochem_code unconditionally reassigned

When the no-match + new_macro-already-present branch is taken, thermochem_code = patched is a harmless no-op (both are equal). However, the control flow conflates two distinct outcomes (patch applied vs. patch not needed) into a single silent pass. Adding an explicit else for the apply-succeeded path and moving the assignment inside it would make intent clearer and would guard against future refactors introducing subtle bugs.


Minor observations (non-blocking)

  • src/common/m_phase_change.fpp:104-113 — The caller-side !DIR$ NOINLINE guard block uses raw #ifdef _CRAYFTN / #ifdef MFC_OpenACC inside a Fortran comment literal rather than Fypp. This is consistent with existing patterns in the codebase and compiles correctly, but is the only place in the diff where #ifdef MFC_OpenACC (C preprocessor style) and Fortran !DIR$ directives are interleaved directly without a Fypp macro wrapper.
  • CMakeLists.txt:+648 — Blank line added before the block comment (cosmetic).
  • The six tests confirmed passing on Frontier CCE 19.0.0 + OpenACC cover the affected code paths well; the performance table is a good addition to the PR description.

…only

1,500+ stress-test rounds on CCE 19.0.0 showed zero ICEs with plain
dimension(num_species) local arrays in m_chemistry.fpp. Remove all
#:if USING_CCE fixed-size array guards, the CCE_MAX_SPECIES Fypp
constant, the @:PROHIBIT runtime checks, and the matching Python-side
species-count validation in input.py.

Simplify the compound AMD guard from
  (not MFC_CASE_OPTIMIZATION and USING_AMD) or USING_CCE
to
  not MFC_CASE_OPTIMIZATION and USING_AMD
with a literal dimension(10) — the AMD workaround is preserved.

The -Oipa0 per-file CMake flags for m_bubbles_EL and m_phase_change
are kept; those ICEs are confirmed required by 20/20 positive-control
rounds and GitHub CI history.

Temporarily remove non-CCE jobs from CI (GitHub runners, Phoenix,
Frontier-AMD) to focus test bandwidth on the CCE fix branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: 878fddb

Files changed: 15

  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/bench.yml, frontier/build.sh, frontier/submit.sh, phoenix/bench.sh, phoenix/submit.sh, phoenix/test.sh, test.yml
  • .gitignore, CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_phase_change.fpp, src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Adds cray_noinline parameter to GPU_ROUTINE and fixes latent cray_inline bug on Cray+OpenACC (was emitting !DIR$ INLINEALWAYS without !$acc routine seq).
  • Applies per-file -Oipa0 for m_bubbles_EL and m_phase_change (Cray, not OpenMP).
  • Fixes FT uninitialized use and replaces matmul() with explicit 2×2 scalar arithmetic in m_phase_change.fpp.
  • Changes proc_bubble_counts from VLA to allocatable in m_bubbles_EL.fpp.
  • Patches pyrometheus-generated m_thermochem.f90 to emit !$acc routine seq on Cray+OpenACC.
  • Narrows CI matrix to Frontier CCE only; removes GitHub-hosted runners and Phoenix/AMD entries.

Findings

1. Bug 6 / m_chemistry.fpp absent from diff — unimplemented fix (potential gap)

The PR description lists src/common/m_chemistry.fpp under Files changed and describes Bug 6:

"Guard all 4 VLA locations with #:if USING_CCE to use dimension(10) instead."

However, src/common/m_chemistry.fpp is not present in the actual diff. Bug 6 (VLA dimension(num_species) ICE in case-optimized pre_process builds) appears to be unimplemented. If CCE 19.0.0 can still trigger the ICE on pre_process case-optimized builds, this is a latent failure mode. The description should be corrected or the fix should be added.

2. github CI job removed — breaks gfortran + Intel ifx coverage

test.yml removes the entire github job (Ubuntu + macOS, gfortran + Intel ifx). Per CLAUDE.md, gfortran and Intel ifx are CI-gated compilers ("must compile with gfortran, nvfortran, Cray ftn, and Intel ifx (CI-gated)"). The PR body says "All GitHub CI (ubuntu + macOS) passing" but then removes the job. The stated rationale covers Phoenix and Frontier AMD infrastructure failures — not GitHub-hosted runners. This needs clarification before merge. Dropping these CI targets, even temporarily, removes the gfortran and Intel ifx gates.

3. m_bubbles_EL.fpppart_order and part_ord_mpi remain as VLAs (src/simulation/m_bubbles_EL.fpp, near line 1532)

-        integer, dimension(num_procs) :: part_order, part_ord_mpi
-        integer, dimension(num_procs) :: proc_bubble_counts
+        integer, allocatable :: proc_bubble_counts(:)

part_order and part_ord_mpi remain as dimension(num_procs) VLAs in the same subroutine as the fixed proc_bubble_counts. The per-file -Oipa0 likely suppresses the CCE ICE for these too, but the root-cause VLA pattern persists. If -Oipa0 is ever lifted, these could re-trigger the crash. Worth converting for robustness (low priority given the flag).

4. run_monitored_slurm_job.sh — hardcoded sleep 30 epilog wait (line 29)

sleep 30
final_state=$(sacct -j "$job_id" -n -X -P -o State ...)

A 30-second fixed sleep for SLURM epilog finalization may be too short on heavily loaded systems or too long otherwise. Consider polling sacct in a short retry loop (e.g., 5×10 s) rather than a single sleep, similar to the pattern in monitor_slurm_job.sh. Low priority since this path is only reached on monitor failure.

5. parallel_macros.fppcray_inline path now identical to non-Cray on Cray+OpenACC (src/common/include/parallel_macros.fpp)

On Cray+OpenACC, cray_inline=True now emits !$acc routine seq (same as the #elif MFC_OpenACC non-Cray path). The !DIR$ INLINEALWAYS hint is only used on Cray CPU. This is the correct fix per the OpenACC spec, but means 33 routines across 8 files lose the inline hint on Cray GPU. Performance benchmarks show no regression, which is consistent with the explanation that CCE's -Oipa0-exempt routines can still be inlined via other mechanisms. No action required — just documenting the behavioral change.


No other correctness issues found

  • The 2×2 matmul → scalar expansion is mathematically equivalent.
  • FT = huge(1.0_wp) correctly initializes before the do while condition.
  • The pyrometheus patch logic includes a guard for the case where pyrometheus is already fixed (if new_macro in thermochem_code: pass), which is good defensive practice.
  • The @:ALLOCATE/@:DEALLOCATE macro convention does not apply to proc_bubble_counts since it is only used in MPI I/O (CPU-only), not GPU kernels.
  • CMakeLists.txt condition NOT MFC_OpenMP correctly applies -Oipa0 to Cray+OpenACC and Cray CPU while excluding Cray+OpenMP (where !DIR$ INLINEALWAYS-based IPA is still needed for thermochem).

Re-add the jobs that were temporarily removed for CCE testing:
- test.yml: github job (gfortran + Intel ifx), Phoenix matrix entries,
  Frontier AMD matrix entries
- bench.yml: Phoenix NVHPC entries, Frontier AMD entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: 86f864d7a1421d49417b16b7712d4ef3baa758fb

Files changed: 14

  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Adds targeted workarounds for six distinct CCE 19.0.0 compiler bugs in Cray+OpenACC GPU builds, including IPA crashes, InstCombine ICEs, and incorrect GPU routine registration.
  • Fixes a latent correctness bug in GPU_ROUTINE where cray_inline=True on Cray+OpenACC was emitting !DIR$ INLINEALWAYS without ! routine seq, meaning 33+ routines across 8 files were silently not registered as OpenACC device routines.
  • Introduces cray_noinline=True as a new surgical macro parameter to suppress IPA on specific hot-path routines without disabling it globally.
  • Adds per-file -Oipa0 in CMakeLists.txt for m_bubbles_EL and m_phase_change on Cray+OpenACC (not OpenMP), with clear rationale documented.
  • Patches the pyrometheus-generated m_thermochem.f90 at build time to correct the broken Cray+ACC GPU_ROUTINE macro.

Findings

1. test.ymlprintenv >> $GITHUB_ENV can corrupt environment parsing (line ~141)

The PR replaces the careful diff-based approach with a plain printenv >> $GITHUB_ENV:

source /opt/intel/oneapi/setvars.sh
printenv >> $GITHUB_ENV

The PR description acknowledges this is a known risk: shell internals with special characters (newlines in values, embedded = signs, etc.) can corrupt GITHUB_ENV parsing. The original diff-based approach was intentional; reverting to plain printenv is a regression in robustness, even if it works in practice for the Intel oneAPI variable set. Consider keeping the diff-based approach or using a safer filter (e.g. grep -v for known-problematic variables).

2. test.yml — GitHub-hosted CI (gfortran + Intel ifx) temporarily removed

The PR body explicitly notes that the github job was removed from test.yml to focus CI bandwidth and will be restored before merge. This must be tracked: merging without re-enabling these gates would bypass the gfortran and Intel ifx compiler checks that are CI-gated per CLAUDE.md.

3. parallel_macros.fppcray_noinline on Cray+OpenMP path emits omp_directive (correct, but verify intent)

In the cray_noinline branch (parallel_macros.fpp, added lines):

#elif MFC_OpenMP
        $:omp_directive
#else
        $:cray_noinline_directive   ! emits !DIR$ NOINLINE

On Cray CPU-only builds (_CRAYFTN defined, no MFC_OpenACC, no MFC_OpenMP), the #else branch emits !DIR$ NOINLINE. On non-Cray CPU builds, nothing is emitted (intentional per the comment). This is correct per the PR description; just confirming the logic is sound.

4. m_bubbles_EL.fpppart_order / part_ord_mpi VLAs remain (line ~1532)

The change converts proc_bubble_counts from VLA to allocatable but leaves part_order and part_ord_mpi as integer, dimension(num_procs) VLAs. Per the PR description, Bug 4 was addressed for proc_bubble_counts; if part_order / part_ord_mpi trigger similar IPA issues under future CCE versions, they would need the same treatment. Not a blocker, but worth noting.

5. input.py pyrometheus patch — fragile string match

old_macro = (
    "#ifdef _CRAYFTN\n#define GPU_ROUTINE(name) !DIR$ INLINEALWAYS name\n"
    "#else\n#define GPU_ROUTINE(name) ! routine seq\n#endif"
)

The fallback logic is good (raises MFCException if the expected pattern is absent and the correct form is also absent). However, the match is sensitive to exact whitespace in pyrometheus output; if pyrometheus ever adds a trailing space or changes line endings, the patch silently no-ops (since patched == thermochem_code and new_macro not in thermochem_code would still raise). The error path is well-handled. Minor improvement: log a warning when the patch is successfully applied so it is visible in build output if/when pyrometheus is updated upstream.

6. CMakeLists.txt — blank line added before comment block (line ~400)

A spurious blank line was added before the MFC_SETUP_TARGET comment. Minor cosmetic noise in the diff; not a functional issue.


Overall Assessment

The changes are well-reasoned, carefully scoped, and contain clear explanatory comments for every compiler workaround. The GPU_ROUTINE correctness fix (Bug 7) is the most impactful change and was latent for a long time — good catch. The main pre-merge requirement is restoring the GitHub-hosted gfortran + Intel ifx CI jobs that were temporarily removed.

When the Cray+ACC GPU_ROUTINE macro patch successfully replaces the
broken _CRAYFTN/#ifdef block in pyrometheus-generated m_thermochem.f90,
emit a yellow warning so engineers can tell the workaround is active
and will notice when pyrometheus eventually fixes the upstream issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: c17653f
Files changed: 14

Changed files
  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Works around six distinct CCE 19.0.0 compiler bugs in Cray+OpenACC GPU builds without touching numerical algorithms or the GPU execution model.
  • Fixes a latent correctness bug in GPU_ROUTINE where cray_inline=True on Cray+OpenACC did not emit !$acc routine seq, silently relying on IPA inlining to compensate.
  • Adds a new cray_noinline knob to GPU_ROUTINE and applies per-file -Oipa0 in CMake for the two problem files.
  • Patches pyrometheus-generated thermochem code at run-time in input.py to emit correct Cray+OpenACC device routine directives.
  • Improves SLURM job monitoring with sacct-based recovery when the runner is killed mid-job.

Findings

[CONCERN — test.yml:135] Revert of intentional GITHUB_ENV safety workaround

The old code deliberately avoided printenv >> $GITHUB_ENV after sourcing Intel oneAPI's setvars.sh, with an explicit comment explaining why:

# `printenv >> $GITHUB_ENV` dumps all vars including shell internals
# with special characters that corrupt GITHUB_ENV parsing.

The new code reverts to the simpler:

source /opt/intel/oneapi/setvars.sh
printenv >> $GITHUB_ENV

This is the pattern the old code intentionally worked around (github/actions/runner#1189). If Intel ifx CI on ubuntu is re-enabled before merge (as planned), this may trigger GITHUB_ENV parse failures. Recommend either keeping the diff-based filter or verifying this is no longer a problem with the current oneAPI version.

[MINOR — src/common/include/parallel_macros.fpp:51-70] cray_noinline naming vs. Cray+OpenACC behavior

On Cray+OpenACC, cray_noinline=True emits !$acc routine seq (not !DIR$ NOINLINE). The actual NOINLINE hint for Cray+OpenACC is placed as a caller-side directive in m_phase_change.fpp. The macro name therefore describes Cray-CPU behavior, not Cray+OpenACC behavior. This is documented in the PR body but may confuse future maintainers encountering these call sites. A brief inline comment in parallel_macros.fpp near the cray_noinline branch explaining this asymmetry would help.

[MINOR — src/simulation/m_bubbles_EL.fpp:1535] part_order/part_ord_mpi remain as VLAs

integer, dimension(num_procs) :: part_order, part_ord_mpi   ! still VLA
integer, allocatable :: proc_bubble_counts(:)               ! converted

Only proc_bubble_counts was changed to allocatable; part_order and part_ord_mpi are still VLAs (dimension(num_procs)). The per-file -Oipa0 handles this for now, but if IPA is re-enabled later these two may resurface. A comment noting they are intentionally left as VLAs (or converting them for consistency) would reduce future confusion.

[INFO — toolchain/mfc/run/input.py:110-120] Silent no-op path in thermochem patch

if patched == thermochem_code:
    if new_macro in thermochem_code:
        pass  # pyrometheus already emits the correct form; no patch needed
    else:
        raise common.MFCException(...)

The pass branch produces no log output. If pyrometheus upstream fixes the issue and begins emitting the correct !$acc routine seq directly, this PR's patch will silently become a no-op. The eventual removal of the workaround could be missed. Adding a low-priority console note (or at minimum a # TODO: remove when pyrometheus ≥ X.Y comment) would make the dead-code path easier to notice.


Overall the Fortran-side fixes are technically sound and well-reasoned. The CMake per-file -Oipa0 scoping is conservative and correct. The GITHUB_ENV item above is the only change worth revisiting before enabling the Intel/gfortran CI gate.

…rruption

Plain `printenv >> $GITHUB_ENV` after sourcing setvars.sh can corrupt
GITHUB_ENV parsing due to shell internals with special characters.
Restore the diff-based filter from 2c3590c that was accidentally
clobbered by 878fddb when slimming CI for CCE testing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: 24ea0cb
Files changed: 14 — CMakeLists.txt, src/common/include/parallel_macros.fpp, src/common/m_phase_change.fpp, src/common/m_variables_conversion.fpp, src/simulation/m_bubbles_EL.fpp, toolchain/mfc/run/input.py, .github/workflows/test.yml, .github/scripts/run_monitored_slurm_job.sh, .github/workflows/frontier/{build,submit}.sh, .github/workflows/phoenix/{bench,submit,test}.sh, .gitignore

Summary

  • Adds a cray_noinline knob to GPU_ROUTINE and fixes the pre-existing cray_inline correctness bug on Cray+OpenACC (Bug 7: routines were not registered as OpenACC device routines).
  • Applies per-file -Oipa0 in CMakeLists.txt for m_bubbles_EL and m_phase_change under Cray+non-OpenMP builds to work around CCE 19.0.0 IPA crashes.
  • Fixes two correctness issues in m_phase_change.fpp: matmul() ICE workaround (explicit 2×2 scalar arithmetic) and uninitialized FT undefined behavior.
  • Converts proc_bubble_counts VLA to allocatable in m_bubbles_EL.fpp.
  • Post-processes pyrometheus-generated m_thermochem.f90 in input.py to correct missing !$acc routine seq on Cray+OpenACC.
  • Disables the github CI job (gfortran + Intel ifx) and removes Phoenix GPU bench entries from the bench matrix; updates Frontier SLURM account/partition.

Findings

[BLOCKER] test.yml: gfortran and Intel ifx CI gates are disabled
.github/workflows/test.yml — the entire github matrix job is removed from this branch. Per CLAUDE.md, gfortran and Intel ifx are CI-gated compilers that must always pass. The PR body acknowledges this ("will be restored before merge"), but as-is the PR cannot be merged without reinstating these jobs. If they are re-added before merge, this is a non-issue.

[MINOR] m_bubbles_EL.fpp:1532part_order and part_ord_mpi remain as VLAs

integer, dimension(num_procs) :: part_order, part_ord_mpi   ! still VLAs
integer, allocatable :: proc_bubble_counts(:)               ! fixed

Only proc_bubble_counts was converted to allocatable. part_order and part_ord_mpi are still dimension(num_procs) VLAs. With -Oipa0 applied to the whole file the IPA ICE is suppressed, so this is safe for CCE 19. However, if these VLAs are also passed into GPU loops elsewhere, they may cause similar issues under future compiler versions. Worth a comment confirming the intent.

[MINOR] input.py:103–125 — pyrometheus patch relies on exact string matching
The CCE workaround searches for a multi-line literal string in the pyrometheus output. A whitespace difference or upstream line-ending change would silently skip the patch (falling into the pass branch if new_macro in thermochem_code, or raising MFCException otherwise). The defensive MFCException path is good, but a more robust match (e.g., regex) would be safer long-term. Low priority since the error path is well-handled.

[MINOR] CMakeLists.txt:643-Oipa0 applied to Cray CPU builds too
The CMake condition is Cray AND NOT MFC_OpenMP, so Cray CPU builds also receive -Oipa0 for m_bubbles_EL and m_phase_change. The comment explains this is intentional for OpenACC (pyrometheus uses !$acc routine seq, no IPA needed) vs OpenMP (thermochem uses !DIR$ INLINEALWAYS, requires IPA). The comment is clear, but it is worth ensuring Cray CPU performance for these files is not regressed — the benchmark table in the PR body only covers OpenACC runs.


Positive Observations

  • Bug 7 fix is correct and important. Before this PR, cray_inline=True on Cray+OpenACC silently emitted only !DIR$ INLINEALWAYS with no !$acc routine seq, meaning 33 routines across 8 files were not registered as device routines. The fix in parallel_macros.fpp correctly emits !$acc routine seq for Cray+OpenACC and reserves INLINEALWAYS for Cray CPU. The mutual-exclusivity assert for cray_inline/cray_noinline is a good guard.
  • FT initialization fix (huge(1.0_wp)) correctly addresses undefined behavior — Fortran .or. is not required to short-circuit, so abs(FT) must be defined on entry to the do while.
  • run_monitored_slurm_job.sh is a clean resilience wrapper; the sacct re-check with a 30-second epilog delay is a reasonable pattern for HPC CI.
  • The @:ALLOCATE/@:DEALLOCATE pairing for proc_bubble_counts is correct.

- case-optimization: restore Phoenix (acc+omp) and Frontier AMD (omp)
  entries that were incorrectly narrowed to CCE-only
- self: restore Frontier AMD GPU OMP sharding (1/2, 2/2) that was
  unintentionally removed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: f881888
Files changed: 14

Changed files
  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Adds 7 targeted CCE 19.0.0 compiler-bug workarounds for Cray+OpenACC GPU builds without touching numerical algorithms.
  • Fixes a latent correctness bug in GPU_ROUTINE where cray_inline=True on Cray+OpenACC silently omitted ! routine seq, relying on IPA to inline device routines.
  • Replaces matmul() with explicit 2×2 scalar arithmetic to avoid an InstCombine ICE, and fixes an uninitialized-variable UB in s_TSat.
  • Changes proc_bubble_counts from a VLA to allocatable to avoid an IPA PHI-fold crash in m_bubbles_EL.
  • Patches pyrometheus-generated thermochem code at input-generation time to fix missing ! routine seq on Cray+OpenACC.

Findings

1. test.yml — empty matrix.precision expansion (Medium)

File: .github/workflows/test.yml, Build step
The original build step guarded against empty precision with:

PRECISION: ${{ matrix.precision != '' && format('--{0}', matrix.precision) || '' }}

and then used $PRECISION in the command. The new code inlines it directly:

/bin/bash mfc.sh test -v --dry-run -j $(nproc) --${{ matrix.debug }} --${{ matrix.mpi }} --${{ matrix.precision }} $TEST_ALL

If any matrix entry has precision: '' (empty string), this expands to a bare -- argument which is typically interpreted as end-of-options by shell parsers and would be passed to mfc.sh. This is a regression from the prior guarded form. Verify that every active matrix row has a non-empty precision value, or restore the conditional guard.


2. parallel_macros.fppcray_noinline emits nothing for non-Cray CPU builds (Low, intentional but underdocumented)

File: src/common/include/parallel_macros.fpp, new cray_noinline branch
When cray_noinline=True is used on a non-Cray CPU build (no _CRAYFTN, no MFC_OpenACC, no MFC_OpenMP), the macro emits nothing — no !acc routine seq, no noinline hint. The inline comment explains this is intentional (!DIR$ NOINLINE is Cray-specific). This is correct, but callers in m_phase_change.fpp and m_variables_conversion.fpp were previously using cray_inline=True (which also emits ! routine seq on non-Cray GPU builds via the #elif MFC_OpenACC / #elif MFC_OpenMP fallbacks). Those same fallbacks are present in the cray_noinline branch, so behavior on non-Cray GPU builds is preserved. No action required; the logic is sound.


3. m_bubbles_EL.fpppart_order / part_ord_mpi remain as VLAs (Low)

File: src/simulation/m_bubbles_EL.fpp, ~line 1532
proc_bubble_counts is converted to allocatable to fix the IPA PHI crash, but part_order and part_ord_mpi remain as integer, dimension(num_procs) VLAs in the same subroutine. Given the extensive stress-testing described in the PR (-Oipa0 applied per-file), these surviving VLAs are presumably fine under -Oipa0. Flagging for awareness in case CCE 19 surfaces the same ICE for these on a future code path.


4. input.py — pyrometheus patch uses fragile string matching (Low, acceptable workaround)

File: toolchain/mfc/run/input.py, lines ~112–130
The old_macro pattern hard-codes exact whitespace/newlines from pyrometheus output. If the upstream generator changes formatting (even trailing whitespace), the replace will silently no-op. The fallback raise MFCException does catch the case where neither old nor new form is present, making this safe in practice. The TODO comment is present. No change required; the guard is adequate for a deliberately temporary workaround.


5. frontier/build.sh + phoenix/test.sh / bench.sh — build cache replaced with rm -rf build (Note)

Files: .github/workflows/frontier/build.sh:23, .github/workflows/phoenix/test.sh:6, .github/workflows/phoenix/bench.sh:17
These now unconditionally nuke the build directory. Eliminating the cache avoids stale-artifact problems (possibly the root of recent CI instability) but increases build time on every run. This appears intentional; no action required.


Overall

The fixes are well-motivated, narrowly scoped, and extensively tested on CCE 19.0.0. The most actionable item is finding #1 (possible bare -- flag in test.yml when matrix.precision is empty). Everything else is low-severity or informational.

- self/case-optimization: clean: true on checkout
- self: build retry max_attempts 3→2, on_retry rm -rf build
- self/case-optimization: add Cancel SLURM Jobs step on cancellation
- case-optimization: drop retry wrapper on Pre-Build (login node)
- CMakeLists.txt: skip -march=native for gcov builds; add
  -mno-avx512fp16 to avoid binutils <2.38 assembler failures on
  Granite Rapids CPUs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: fd06e97b5c1b076b6ceb79b919ab73b9e2af999e
Files changed: 14

Changed files
  • .github/scripts/run_monitored_slurm_job.sh (new)
  • .github/workflows/frontier/build.sh
  • .github/workflows/frontier/submit.sh
  • .github/workflows/phoenix/bench.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/common/include/parallel_macros.fpp
  • src/common/m_phase_change.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_bubbles_EL.fpp
  • toolchain/mfc/run/input.py

Summary

  • Well-described workarounds for 7 distinct CCE 19.0.0 compiler bugs; each fix is targeted and minimally invasive.
  • GPU_ROUTINE macro correctness fix (cray_inline=True now emits ! routine seq on Cray+OpenACC) is a genuine latent correctness bug fix.
  • FT = huge(1.0_wp) initialization and matmul() → explicit scalar arithmetic fixes are clean and correct.
  • Pyrometheus post-processing patch is pragmatic with appropriate error detection.
  • CI monitoring hardening (SLURM recovery wrapper, job cancellation on workflow cancel) is a good operational improvement.

Findings

1. m_bubbles_EL.fpp — Silent removal of part_order / part_ord_mpi declarations (medium)

File: src/simulation/m_bubbles_EL.fpp, diff hunk around line 1535

The diff removes two VLA declarations without any explanation in the PR body:

-    integer, dimension(num_procs) :: part_order, part_ord_mpi
-    integer, dimension(num_procs) :: proc_bubble_counts
+    integer, allocatable :: proc_bubble_counts(:)

Only proc_bubble_counts is discussed as the Bug 4 fix. The removal of part_order and part_ord_mpi is silent — these are also dimension(num_procs) VLAs that could trigger the same ICE. If they are dead code, that should be noted. If they are still referenced (but hidden in unchanged function body lines not visible in the diff), this is a compile error masked only by the fact that Frontier CI passed (which is the definitive check here). Please confirm these are unused dead variables, and if so add a brief comment in the PR.

2. test.ymlgithub CI job (gfortran + Intel ifx) removed (high)

File: .github/workflows/test.yml

The PR body states this is temporary and will be restored before merge. Per CLAUDE.md, MFC must compile with gfortran, nvfortran, Cray ftn, and Intel ifx — all CI-gated. With the github job absent, the two non-HPC compiler gates are not enforced during the PR lifecycle. Please restore this job (or at minimum gate the merge on it) before merging, not just before merge as a last step.

3. CMakeLists.txt-Oipa0 applied to Cray CPU builds as well (low)

File: CMakeLists.txt, line ~655:

if (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray" AND NOT MFC_OpenMP)

The comment says this is intentional for both Cray+OpenACC and Cray CPU, but not Cray+OpenMP. However, applying -Oipa0 to plain Cray CPU builds (no GPU) may silently penalize performance for users doing CPU-only Cray builds. The thermochem INLINEALWAYS concern that blocks this on OpenMP applies equally to CPU builds — unless there is a reason these routines are not called in CPU-only chemistry runs. Worth documenting the reason CPU builds are included here, or restricting to MFC_OpenACC only if that is actually sufficient.

4. input.py — Silent fallthrough in pyrometheus patch (low)

File: toolchain/mfc/run/input.py, around line 111:

if new_macro in thermochem_code:
    pass  # pyrometheus already emits the correct form; no patch needed

This path is completely silent. If pyrometheus upstream fixes the macro and starts emitting the correct form, users will get no indication that the workaround is no longer needed. A cons.print here (matching the yellow warning style used in the patched branch) would help a future maintainer notice when the workaround can be safely removed.


Minor / Nit

  • .github/workflows/test.yml — The Set up Python 3.14 step is new and slightly unusual (3.14 is a pre-release as of early 2026). If this is intentional to test against the latest Python, it is fine, but if the intent was just to pin a stable version, 3.12 or 3.13 might be safer for reliability.
  • CMakeLists.txt — Extra blank line added at line 411 (minor formatting artifact).
  • The mutual-exclusivity assert in parallel_macros.fpp (not (cray_inline and cray_noinline)) is clean and correct — good addition.

Overall this is a careful, well-tested workaround PR. The main pre-merge action item is restoring the gfortran/ifx CI gate.

@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Claude Code Review

Head SHA: 917cdd55c8f437f7e494977a48fbca32d04ac37f
Files changed: 14 — CMakeLists.txt, parallel_macros.fpp, m_phase_change.fpp, m_variables_conversion.fpp, m_bubbles_EL.fpp, toolchain/mfc/run/input.py, .github/workflows/test.yml, .github/workflows/frontier/build.sh, .github/workflows/frontier/submit.sh, .github/workflows/phoenix/bench.sh, .github/workflows/phoenix/submit.sh, .github/workflows/phoenix/test.sh, .github/scripts/run_monitored_slurm_job.sh, .gitignore

Summary

  • Workarounds for 7 CCE 19.0.0 compiler bugs across Cray+OpenACC GPU builds, including ICEs in IPA, InstCombine, and missing OpenACC device registration
  • New cray_noinline knob in GPU_ROUTINE + correctness fix for broken cray_inline path on Cray+OpenACC (Bug 7 was silently masked by IPA inlining)
  • Per-file -Oipa0 in CMake for two simulation files; pyrometheus post-processing patch in input.py to inject correct \! routine seq on Cray+OpenACC
  • Genuine correctness fix: uninitialized FT in s_TSat and matmul() replaced with explicit 2×2 scalar arithmetic
  • Resilient SLURM job monitoring script + CI workflow improvements (clean checkouts, cancelled-job scancel, build-cache simplification)

Findings

1. Medium — test.yml (line ~153): --${{ matrix.precision }} passes bare -- when precision is empty

The previous code safely guarded this:

# Before:
env:
  PRECISION: ${{ matrix.precision \!= '' && format('--{0}', matrix.precision) || '' }}
# ...used as: 

# After:
/bin/bash mfc.sh test -v --dry-run -j $(nproc) --${{ matrix.debug }} --${{ matrix.mpi }} --${{ matrix.precision }} $TEST_ALL

If any matrix entry has precision: '' (or the field is absent/empty), the expanded command gets a bare -- flag which mfc.sh will not recognise. The same issue applies to the Test step's mfc.sh test invocation if matrix.precision is ever empty. Consider restoring a conditional, e.g.:

PRECISION: ${{ matrix.precision \!= '' && format('--{0}', matrix.precision) || '' }}

and using $PRECISION in the run command, or add a matrix-level default (precision: 'double') to guarantee non-empty values.

2. Low — src/simulation/m_bubbles_EL.fpp (line ~1535): part_order / part_ord_mpi remain as VLAs

-integer, dimension(num_procs) :: part_order, part_ord_mpi
-integer, dimension(num_procs) :: proc_bubble_counts
+integer, allocatable :: proc_bubble_counts(:)

proc_bubble_counts is correctly changed to allocatable to fix Bug 4, but part_order and part_ord_mpi retain the same dimension(num_procs) VLA pattern on the same line that was deleted. If the IPA ICE was exclusively triggered by proc_bubble_counts (e.g. because of its interaction with specific GPU loops), these two are safe. If not, they may be latent ICE triggers under future CCE versions. Worth a comment or future-proofing note at minimum.

3. Note — test.yml: GitHub-hosted CI (gfortran + Intel ifx) temporarily disabled

The PR description explicitly calls this out as temporary and to be restored before merge. Flagging here as a merge-gate reminder: the four CI-gated compilers (gfortran, nvfortran, Cray ftn, Intel ifx) must all be green before this lands.


Minor observations

  • CMakeLists.txt (line ~648): -Oipa0 guard is Cray AND NOT MFC_OpenMP, which also applies to Cray CPU (non-GPU) builds for m_bubbles_EL and m_phase_change. The comment documents this correctly, but the performance impact on Cray CPU regression tests is worth monitoring — those files are in src/common/ / src/simulation/, and IPA is normally beneficial for thermochem inlining there.

  • input.py pyrometheus patch: The guard if patched == thermochem_code → check for new_macro in thermochem_code is a reasonable sentinel for detecting if upstream pyrometheus has been fixed. The MFCException on unexpected format change is good defensive programming.

  • run_monitored_slurm_job.sh: The sleep 30 after monitor failure is a pragmatic epilog-finalization wait. The sacct -P / --parsable2 parsing looks correct (cut -d'|' -f1 for State, bare tr -d ' ' for ExitCode which has no trailing delimiter with --parsable2).

  • parallel_macros.fpp: On non-Cray non-GPU builds, cray_noinline=True intentionally emits nothing — documented inline. On non-Cray OpenACC (NVIDIA nvfortran), the #elif MFC_OpenACC branch correctly falls through to $:acc_directive, preserving GPU registration. The mutual-exclusivity @:assert for cray_inline / cray_noinline is a good safeguard.

Overall the approach is well-motivated, the individual workarounds are surgical and clearly documented, and the test evidence is thorough. The precision flag regression in test.yml is the main concern to address before merge.

@sbryngelson sbryngelson merged commit f95b2c4 into MFlowCode:master Mar 9, 2026
51 of 55 checks passed
sbryngelson added a commit to sbryngelson/MFC that referenced this pull request Mar 11, 2026
Resolve conflicts with master PRs MFlowCode#1295 (CI robustness refactor) and
MFlowCode#1286 (CCE 19.0.0 workarounds). Keep coverage pruning features while
adopting master's submit-job.sh split, build cache, Frontier
continue-on-error, and improved SLURM monitoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants