Add gcov-based test pruning with coverage cache#1284
Add gcov-based test pruning with coverage cache#1284sbryngelson wants to merge 2 commits intoMFlowCode:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a gcov-derived, file-level coverage cache to enable pruning the MFC test suite to only those tests that exercise files changed in a PR, with CI support to rebuild and distribute the cache when it becomes stale.
Changes:
- Introduces
toolchain/mfc/test/coverage.py(cache build/load + changed-file detection + coverage-based test filtering) and commits a gzipped JSON cache. - Integrates new CLI flags
--only-changesand--build-coverage-cacheinto the test runner and CLI schema. - Updates CI workflows/scripts to rebuild the cache on dependency graph changes and to run pruned test suites on PRs.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
toolchain/mfc/test/coverage.py |
Implements coverage cache build/load + changed-file diffing + pruning logic (ALWAYS_RUN_ALL safeguards). |
toolchain/mfc/test/test_coverage_unit.py |
Adds offline unit tests covering diff parsing, gcov JSON parsing, cache normalization, and pruning behavior. |
toolchain/mfc/test/test.py |
Wires --only-changes pruning and --build-coverage-cache cache generation into the test command. |
toolchain/mfc/cli/commands.py |
Adds CLI arguments/examples for coverage cache building and pruning. |
toolchain/mfc/test/case.py |
Factors post-process output parameter mods into reusable constants/helpers used by the cache builder. |
toolchain/mfc/test/test_coverage_cache.json.gz |
Adds committed baseline coverage cache artifact. |
.github/workflows/test.yml |
Adds rebuild-cache job + dependency-change detection + artifact flow; enables pruning on PRs. |
.github/workflows/phoenix/rebuild-cache.sh |
SLURM-executed rebuild script: clean, gcov build, build cache in parallel. |
.github/workflows/phoenix/test.sh |
Enables pruning on PRs and scales CPU thread count based on allocation. |
.github/workflows/phoenix/submit.sh |
Adjusts CPU SLURM submission options for cache rebuild workload. |
.github/workflows/frontier/test.sh |
Enables pruning on PRs. |
.github/workflows/frontier_amd/test.sh |
Enables pruning on PRs. |
.github/file-filter.yml |
Adds cases_py filter to trigger cache rebuilds when test definitions change. |
CMakeLists.txt |
Improves gcov build reliability (disable -march=native/LTO for gcov; add Fypp line-marker flag). |
.gitignore |
Ignores legacy raw (non-gz) cache file. |
src/simulation/m_bubbles.fpp |
TEMP change: duplicated use for pipeline exercise. |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
src/simulation/m_bubbles.fpp (1)
19-19: Remove duplicate module import.Line 19 repeats
use m_helper_basicalready declared on Line 18. This is redundant and can cause avoidable compiler warnings/noise across toolchains.Proposed fix
- use m_helper_basic.github/workflows/phoenix/test.sh (1)
54-56: Potential issue with unset variable in arithmetic comparison.When
SLURM_CPUS_ON_NODEis unset, the comparisonSLURM_CPUS_ON_NODE > 64evaluates the unset variable as 0, but the default:-8is only applied in the else branch. This works but is fragile. Consider applying the default first:Suggested improvement
-# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores). -# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches. -n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} )) +# Use up to 64 parallel test threads on CPU (GNR nodes have 192 cores). +# Cap at 64 to avoid overwhelming MPI's ORTE daemons with concurrent launches. +_cpus=${SLURM_CPUS_ON_NODE:-8} +n_test_threads=$(( _cpus > 64 ? 64 : _cpus ))toolchain/mfc/test/test_coverage_unit.py (1)
74-75: Use underscore-prefixed stub args to keep lint clean.Optional cleanup to silence ARG005 without changing behavior.
♻️ Suggested tweak
-_case_stub.input_bubbles_lagrange = lambda case: None -_case_stub.get_post_process_mods = lambda params: {} +_case_stub.input_bubbles_lagrange = lambda _case: None +_case_stub.get_post_process_mods = lambda _params: {}
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
toolchain/mfc/test/test_coverage_cache.json.gzis excluded by!**/*.gz
📒 Files selected for processing (15)
.github/file-filter.yml.github/workflows/frontier/test.sh.github/workflows/frontier_amd/test.sh.github/workflows/phoenix/rebuild-cache.sh.github/workflows/phoenix/submit.sh.github/workflows/phoenix/test.sh.github/workflows/test.yml.gitignoreCMakeLists.txtsrc/simulation/m_bubbles.fpptoolchain/mfc/cli/commands.pytoolchain/mfc/test/case.pytoolchain/mfc/test/coverage.pytoolchain/mfc/test/test.pytoolchain/mfc/test/test_coverage_unit.py
fcfa309 to
d6e922c
Compare
Claude Code ReviewHead SHA: Files changed: 26 — key paths:
Summary
Findings🔴 Critical1. TEMP changes left in the PR — must be reverted before merge The PR description states:
🟠 Issues2. Direct push from CI to git push origin HEAD:refs/heads/masterThe 3. grep -qP '^[+-]\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'The pattern 4. set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG" CACHE STRING "" FORCE)Using 5. New The unit test # Brand new .fpp file has no coverage in cache -> 0 tests run
assert len(to_run) == 0If a developer adds a new 🟡 Minor / Nits6. needs.rebuild-cache.result != 'cancelled' &&When 7. ${FYPP_GCOV_OPTS}
8. The test job YAML fetches Overall this is a well-engineered feature with good test coverage (53 unit tests, comprehensive edge cases). The conservative fallbacks (missing cache → full suite, missing sim coverage → include) are appropriate for a safety-critical CFD code. The main blockers before merge are: resolving the explicit TEMP items (#1), and reviewing the direct push to master (#2). |
📝 WalkthroughWalkthroughThis pull request introduces file-level coverage-based test filtering for pull requests, complemented by CI/CD workflow optimizations and build system changes. A new coverage module analyzes which source files each test exercises, enabling selective test execution on PR changes. Benchmark case configurations reduce default simulation step counts. GitHub workflow files are updated with parallelism caps, partition adjustments, pruning flags, and coverage cache management. CMakeLists.txt adds gcov instrumentation support with line-marker mapping. New CLI test command options enable coverage cache building and change-based filtering. The test runner integrates coverage loading and filtering logic to skip unaffected tests during pull requests while maintaining full test runs for other events. 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
toolchain/mfc/test/cases.py (1)
1074-1074: Consider adding a comment explaining why1D_qbmmis skipped.The rationale for skipping this example isn't immediately clear. If there's a specific issue (e.g., coverage-related failures, excessive runtime, or known instability), a brief inline comment would help future maintainers understand the exclusion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cc9039fb-c9a7-4cbb-b465-9d02f54a165b
⛔ Files ignored due to path filters (1)
toolchain/mfc/test/test_coverage_cache.json.gzis excluded by!**/*.gz
📒 Files selected for processing (25)
.github/file-filter.yml.github/workflows/bench.yml.github/workflows/frontier/bench.sh.github/workflows/frontier/submit.sh.github/workflows/frontier/test.sh.github/workflows/lint-toolchain.yml.github/workflows/phoenix/bench.sh.github/workflows/phoenix/rebuild-cache.sh.github/workflows/phoenix/submit.sh.github/workflows/phoenix/test.sh.github/workflows/test.yml.gitignoreCMakeLists.txtbenchmarks/5eq_rk3_weno3_hllc/case.pybenchmarks/hypo_hll/case.pybenchmarks/ibm/case.pybenchmarks/igr/case.pybenchmarks/viscous_weno5_sgb_acoustic/case.pytoolchain/mfc/build.pytoolchain/mfc/cli/commands.pytoolchain/mfc/test/case.pytoolchain/mfc/test/cases.pytoolchain/mfc/test/coverage.pytoolchain/mfc/test/test.pytoolchain/mfc/test/test_coverage_unit.py
💤 Files with no reviewable changes (1)
- .github/workflows/lint-toolchain.yml
toolchain/mfc/test/coverage.py
Outdated
| for candidate in candidates: | ||
| path = shutil.which(candidate) | ||
| if path is None: | ||
| continue | ||
| try: | ||
| result = subprocess.run( | ||
| [path, "--version"], | ||
| capture_output=True, text=True, timeout=10, check=False | ||
| ) | ||
| version_out = result.stdout | ||
| if "Apple LLVM" in version_out or "Apple clang" in version_out: | ||
| continue # Apple's gcov cannot parse GCC-generated .gcda files | ||
| if "GCC" in version_out or "GNU" in version_out: | ||
| return path | ||
| except Exception: |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the coverage.py file
cd toolchain && find . -name "coverage.py" -type fRepository: MFlowCode/MFC
Length of output: 110
🏁 Script executed:
# Read the full function to understand context
head -c 50000 toolchain/mfc/test/coverage.py | tail -c 50000Repository: MFlowCode/MFC
Length of output: 28642
🏁 Script executed:
# Get line count to understand file size
wc -l toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 91
🏁 Script executed:
# Read the function around lines 109-123
sed -n '65,135p' toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 2356
🏁 Script executed:
# Search for usage of find_gcov_binary to see if there are downstream assumptions
rg "find_gcov_binary" toolchain/Repository: MFlowCode/MFC
Length of output: 183
🏁 Script executed:
# Check if there are any comments or git history explaining the "GCC"/"GNU" check intent
git -C . log --all -S '"GCC" in version_out or "GNU" in version_out' --oneline --max-count=10Repository: MFlowCode/MFC
Length of output: 143
🏁 Script executed:
# Check for any version-specific feature usage of gcov in the code
rg "gcov_bin|gcov --" toolchain/mfc/test/coverage.py | head -30Repository: MFlowCode/MFC
Length of output: 555
🏁 Script executed:
# Look for gcov feature usage (e.g., --json-format which was added in gcov 13)
rg "\-\-json-format" toolchain/Repository: MFlowCode/MFC
Length of output: 151
🏁 Script executed:
# Search for gcov version outputs and documentation
web_search "gcov --version output GCC GNU format different distributions"Repository: MFlowCode/MFC
Length of output: 107
🏁 Script executed:
# Check if --json-format requires a specific gcov version
web_search "gcov --json-format minimum version requirement gcov 13"Repository: MFlowCode/MFC
Length of output: 107
🏁 Script executed:
# Look for any error handling or fallback for old gcov versions
rg "json-format|gcov.*version" toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 350
🏁 Script executed:
# Check if the code handles gcov version validation anywhere
rg "gcov.*13|json.?format|version" toolchain/mfc/test/coverage.py -A 2 -B 2Repository: MFlowCode/MFC
Length of output: 2127
🏁 Script executed:
# Check if there are any tests or documentation about gcov version requirements
fd -type f -name "*.md" | xargs grep -l "gcov\|coverage" 2>/dev/null | head -5Repository: MFlowCode/MFC
Length of output: 227
🏁 Script executed:
# Search for any version checks of gcov elsewhere in the codebase
rg "version|gcov" toolchain/mfc/build/ | grep -i "gcov\|version"Repository: MFlowCode/MFC
Length of output: 117
🏁 Script executed:
# Check the build system for gcov requirements
fd -type f \( -name "*.cmake" -o -name "*.py" \) | xargs grep -l "gcov\|coverage" 2>/dev/nullRepository: MFlowCode/MFC
Length of output: 229
🏁 Script executed:
# Look at the git diff to understand what changed in this PR
git diff HEAD~1..HEAD -- toolchain/mfc/test/coverage.py 2>/dev/null | head -100Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Check if there's any CI configuration that specifies gcov version requirements
fd -type f \( -name ".github" -o -name ".gitlab-ci" -o -name "*.yml" -o -name "*.yaml" \) | head -20Repository: MFlowCode/MFC
Length of output: 227
🏁 Script executed:
# Search for any documentation or README about gcov requirements
find . -name "README*" -o -name "CONTRIBUTING*" -o -name "BUILD*" | head -10Repository: MFlowCode/MFC
Length of output: 413
🏁 Script executed:
# Check CMake files for gcov configuration
find . -name "*.cmake" 2>/dev/null | xargs grep -l "gcov" 2>/dev/nullRepository: MFlowCode/MFC
Length of output: 41
🏁 Script executed:
# Look for any version-related checks in Python code that mention gcov
rg "major|minor|version" toolchain/mfc/test/coverage.py | head -20Repository: MFlowCode/MFC
Length of output: 825
🏁 Script executed:
# Check if the code actually validates gcov version before using --json-format
sed -n '280,320p' toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 1686
🏁 Script executed:
# Check the _parse_gcov_json_output function to understand what it does
sed -n '150,220p' toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 2626
🏁 Script executed:
# Check if there are real-world examples of distro gcov versions without GCC/GNU
python3 << 'EOF'
import subprocess
import re
# Check what a typical gcov --version output looks like when parsed
test_outputs = [
"gcov (GCC 11.2.0) 11.2.0", # Standard GCC format
"gcov (GCC) 9.3.0", # Some distro format
"gcov version 11.0 (GCC 11)", # Alternative format
"gcov 11.2", # Minimal format
]
for output in test_outputs:
has_gcc = "GCC" in output
has_gnu = "GNU" in output
accepted = has_gcc or has_gnu
print(f"Output: {output!r}")
print(f" Has GCC/GNU: {accepted}\n")
EOFRepository: MFlowCode/MFC
Length of output: 260
🏁 Script executed:
# Verify the code does NOT require specific gcov version features before using --json-format
grep -B 5 -A 5 "json-format" toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 504
🏁 Script executed:
# Check if there's any version validation in the codebase
rg "check.*version|version.*check|json.?format.*version" toolchain/mfc/test/Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Check if lines 109-123 are actually changed in this PR
git diff HEAD -- toolchain/mfc/test/coverage.py | grep -A 5 -B 5 "GCC\|GNU" | head -50Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Get the actual diff of the coverage.py file
git diff HEAD -- toolchain/mfc/test/coverage.py | head -200Repository: MFlowCode/MFC
Length of output: 39
🏁 Script executed:
# Check git status to see what's been modified
git status toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 122
🏁 Script executed:
# Get exact line numbers for the function
sed -n '67,133p' toolchain/mfc/test/coverage.py | cat -n | grep -E "^[ ]*[109]|^[ ]*11[0-9]|^[ ]*12[0-3]"Repository: MFlowCode/MFC
Length of output: 415
🏁 Script executed:
# Show the specific lines 109-123
sed -n '109,123p' toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 658
🏁 Script executed:
# Check what the actual line numbers are in the function
awk 'NR>=67 && NR<=133 { print NR": "$0 }' toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 2611
🏁 Script executed:
# Check if there's a PR or if we're looking at a specific commit
git log --oneline -1Repository: MFlowCode/MFC
Length of output: 124
🏁 Script executed:
# Get the diff to see what lines actually changed
git show --statRepository: MFlowCode/MFC
Length of output: 50370
🏁 Script executed:
# Get full diff of coverage.py to see what actually changed
git show -- toolchain/mfc/test/coverage.py | head -300Repository: MFlowCode/MFC
Length of output: 11061
🏁 Script executed:
# Verify gcov is used without version validation before calling --json-format
rg "json.format" toolchain/mfc/test/coverage.py -B 5 -A 5Repository: MFlowCode/MFC
Length of output: 504
🏁 Script executed:
# Check if there's any error handling for failed --json-format calls
sed -n '280,310p' toolchain/mfc/test/coverage.pyRepository: MFlowCode/MFC
Length of output: 1361
Remove strict gcov version string matching in find_gcov_binary().
The current logic only accepts gcov binaries whose --version output contains "GCC" or "GNU", but some legitimate distro-provided GNU gcov binaries lack these strings (e.g., gcov 11.2). This causes false negatives and raises a spurious "GNU gcov not found" error, breaking cache builds. Since the code already gracefully handles gcov failures downstream and rejects Apple LLVM/clang separately, the string check is unnecessarily strict.
🔧 Proposed fix
if "Apple LLVM" in version_out or "Apple clang" in version_out:
continue # Apple's gcov cannot parse GCC-generated .gcda files
- if "GCC" in version_out or "GNU" in version_out:
- return path
+ return path📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for candidate in candidates: | |
| path = shutil.which(candidate) | |
| if path is None: | |
| continue | |
| try: | |
| result = subprocess.run( | |
| [path, "--version"], | |
| capture_output=True, text=True, timeout=10, check=False | |
| ) | |
| version_out = result.stdout | |
| if "Apple LLVM" in version_out or "Apple clang" in version_out: | |
| continue # Apple's gcov cannot parse GCC-generated .gcda files | |
| if "GCC" in version_out or "GNU" in version_out: | |
| return path | |
| except Exception: | |
| for candidate in candidates: | |
| path = shutil.which(candidate) | |
| if path is None: | |
| continue | |
| try: | |
| result = subprocess.run( | |
| [path, "--version"], | |
| capture_output=True, text=True, timeout=10, check=False | |
| ) | |
| version_out = result.stdout | |
| if "Apple LLVM" in version_out or "Apple clang" in version_out: | |
| continue # Apple's gcov cannot parse GCC-generated .gcda files | |
| return path | |
| except Exception: |
🧰 Tools
🪛 Ruff (0.15.2)
[error] 114-114: subprocess call: check for execution of untrusted input
(S603)
[warning] 123-123: Do not catch blind exception: Exception
(BLE001)
| changed_fpp = {f for f in changed_files if f.endswith(".fpp")} | ||
| if not changed_fpp: | ||
| return [], list(cases) |
There was a problem hiding this comment.
Non-.fpp Fortran changes are currently treated as “no source changes,” so tests get skipped.
This branch skips all tests when no changed .fpp files are present. A PR that changes .f90/.f source can therefore be incorrectly treated as unaffected under --only-changes.
🔧 Proposed fix
- changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
- if not changed_fpp:
+ changed_fortran = {
+ f for f in changed_files if f.endswith((".fpp", ".f90", ".f"))
+ }
+ if not changed_fortran:
return [], list(cases)
+
+ # Cache is file-level .fpp today; non-.fpp Fortran edits cannot be pruned safely.
+ if any(f.endswith((".f90", ".f")) for f in changed_fortran):
+ return list(cases), []
- changed_sim = any(f.startswith("src/simulation/") for f in changed_fpp)
+ changed_sim = any(f.startswith("src/simulation/") for f in changed_fortran)
@@
- if test_file_set & changed_fpp:
+ if test_file_set & changed_fortran:
to_run.append(case)| # Example-based tests cover no unique files beyond non-example tests, | ||
| # so they add no value to coverage-based pruning. Skip them entirely. | ||
| example_skipped = [c for c in cases if "Example" in c.trace] | ||
| cases = [c for c in cases if "Example" not in c.trace] | ||
| skipped_cases += example_skipped | ||
| if example_skipped: | ||
| cons.print(f"[dim]Skipped {len(example_skipped)} example tests " | ||
| f"(redundant coverage)[/dim]") | ||
|
|
||
| cache = load_coverage_cache(common.MFC_ROOT_DIR) | ||
| if cache is None: | ||
| cons.print("[yellow]Coverage cache missing or stale.[/yellow]") | ||
| cons.print("[yellow]Run: ./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache[/yellow]") | ||
| cons.print("[yellow]Falling back to full test suite.[/yellow]") | ||
| else: |
There was a problem hiding this comment.
“Fallback to full suite” path still excludes Example tests.
Examples are removed before cache/diff checks, so the cache-miss and run-all fallbacks do not actually run the full suite.
🔧 Proposed fix
- # Example-based tests cover no unique files beyond non-example tests,
- # so they add no value to coverage-based pruning. Skip them entirely.
- example_skipped = [c for c in cases if "Example" in c.trace]
- cases = [c for c in cases if "Example" not in c.trace]
- skipped_cases += example_skipped
- if example_skipped:
- cons.print(f"[dim]Skipped {len(example_skipped)} example tests "
- f"(redundant coverage)[/dim]")
+ example_cases = [c for c in cases if "Example" in c.trace]
+ non_example_cases = [c for c in cases if "Example" not in c.trace]
cache = load_coverage_cache(common.MFC_ROOT_DIR)
if cache is None:
cons.print("[yellow]Coverage cache missing or stale.[/yellow]")
cons.print("[yellow]Run: ./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache[/yellow]")
cons.print("[yellow]Falling back to full test suite.[/yellow]")
@@
else:
+ # Exclude example tests only when pruning is actually applied.
+ cases = non_example_cases
+ skipped_cases += example_cases
+ if example_cases:
+ cons.print(f"[dim]Skipped {len(example_cases)} example tests "
+ f"(redundant coverage)[/dim]")
changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:Also applies to: 135-143
b0e6e8a to
66c792a
Compare
Claude Code ReviewHead SHA: Changed files
Summary
Findings🔴 Security: Fork code runs with
|
Claude Code ReviewHead SHA: Changed filesSummary
Findings1. TEMP code must be reverted before merge (blocker)The PR description explicitly flags:
The test plan marks this as a required step ("Revert TEMP changes before merge"). The 2. Push-event dep-change detection silently fails on shallow clones
DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null || echo "")On the standard For PR events the fallback is safe ( DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null) || {
echo "git diff failed on push — defaulting to dep_changed=true for safety."
echo "dep_changed=true" >> "$GITHUB_OUTPUT"
exit 0
}3.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1284 +/- ##
==========================================
- Coverage 44.95% 44.84% -0.12%
==========================================
Files 70 70
Lines 20503 20511 +8
Branches 1946 1946
==========================================
- Hits 9217 9198 -19
- Misses 10164 10191 +27
Partials 1122 1122 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Claude Code ReviewHead SHA: df97a8a Files
Summary
Findings[Must fix before merge] TEMP changes need revert (
Neither the [Security / CI] permissions:
contents: write # Required for Commit Cache to Master on push events
steps:
- name: Clone
uses: actions/checkout@v4
with:
ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}The job checks out If this repo ever receives PRs from external contributors (not just org members), this is a supply-chain risk. Consider scoping [Behavior change] Previously the [Minor] phase2_jobs = min(n_jobs, 16)The 16-worker limit is reasonable for OOM protection but is not configurable. On nodes with large RAM (Phoenix GNR nodes have significant memory), the cap may be unnecessarily conservative. A comment explaining the memory estimate (e.g. "each MPI+gcov test uses ~2–5 GB") is present and helpful — consider making it an environment variable or CLI parameter for future tuning without a code change. [Minor] Thread-safety note on console redirect in orig_file = cons.raw.file
cons.raw.file = io.StringIO()
try:
...
finally:
cons.raw.file = orig_fileThe comment says "NOTE: not thread-safe — Phase 1 must remain single-threaded." This is correct and documented. Just flagging it as a fragility: if Phase 1 is ever parallelized (e.g., to speed up the 555-case prepare loop), this will produce garbled/lost console output. A thread-local or lock-based approach would be more robust long-term. [Nit] Double space in YAML runner group ( runs-on:
group: phoenix # two spaces before "phoenix"Cosmetic, but inconsistent with the rest of the file. |
Claude Code ReviewHead SHA: Files changed: 28 Changed files
Summary
Findings1. TEMP changes must be reverted before merge (acknowledged in PR description) The PR description explicitly lists two temporary items needed only to exercise the cache rebuild pipeline in CI:
These must not be merged as-is. Consider using a separate test branch or a dummy file change that does not touch production Fortran source. 2. The PR fallback is correctly dep_changed=true when DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null || echo "")
DIFF=$(git diff "$BEFORE".."$AFTER" 2>/dev/null) || {
echo "dep_changed=true" >> "$GITHUB_OUTPUT"; exit 0
}3. ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}This runs code from the PR author's fork on the Phoenix self-hosted runner, which has 4.
5. Missing The conditions for needs.rebuild-cache.result != 'cancelled'This correctly handles the skipped case ( Improvement Opportunities
|
Claude Code ReviewHead SHA: Files (30)Summary
FindingsMust fix before merge1. TEMP changes are committed — must be reverted (acknowledged in PR checklist) The PR body explicitly marks two items as TEMP:
Both alter production behavior and must be reverted before merge. 2. Committed binary artifact built under TEMP conditions
Notable issues3. # current (inconsistent references):
n_test_threads=$(( SLURM_CPUS_ON_NODE > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))If n_test_threads=$(( ${SLURM_CPUS_ON_NODE:-8} > 64 ? 64 : ${SLURM_CPUS_ON_NODE:-8} ))4. retry_build ./mfc.sh build --gcov -j 8On an exclusive 192-core GNR node the build phase uses only 8 of 192 cores. Using 5. Global state mutation to suppress console output in orig_file = cons.raw.file
cons.raw.file = io.StringIO()
try:
...
finally:
cons.raw.file = orig_fileThis mutates a module-level singleton. Phase 1 is intentionally single-threaded so there is no race today, but redirecting 6. export GITHUB_EVENT_NAME="$GITHUB_EVENT_NAME"Exporting the variable so SLURM propagates it into the batch job is correct (SLURM strips most CI env vars). If the variable is unset on a manual submit, an empty string is exported and 7. Post-revert After the TEMP revert, Minor observations
|
Claude Code ReviewHead SHA: a84cabc Changed files
Summary
Findings🔴 Must fix before merge1. TEMP changes not reverted — test plan checkbox unchecked The PR body explicitly flags two TEMP changes that must be reverted before merge, and marks the step unchecked:
Both are still present:
🟡 Design concerns2. 3. Benchmark step counts reduced 3–3.5× without explanation 4. 🟢 Minor / informational5. 6. Before merging
|
Claude Code ReviewHead SHA: Files changed: 30 Summary
Findings1. TEMP changes must be reverted before merge (blocker)
Neither revert has been done. Do not merge until both are addressed. 2. 3. Direct push to master from CI with 4. 5. Phase 2 test parallelism cap at 16 may be fragile (, 6. No staleness check on the committed cache's gcov version (, 7. Minor / Non-blocking
|
Claude Code ReviewHead SHA: Changed files
Summary:
Findings[CRITICAL] Untrusted fork PR code executes on self-hosted runner with elevated permissionsFile: The ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}It then runs Recommended fix: Switch this job to use [HIGH] TEMP changes must be reverted before mergeFile: The PR body explicitly lists two TEMP changes:
In In The PR checklist says "Revert TEMP changes before merge" but they are not yet reverted. [HIGH] Benchmark timeout halved without justificationFile:
[MEDIUM]
|
1032dbc to
2210805
Compare
Claude Code ReviewHead SHA: Files changed: 17
Summary
Findings1. TEMP changes must be reverted before merge (critical, self-noted)The PR description explicitly flags these as temporary. They will make the cache stale or skip tests incorrectly if merged as-is:
2.
|
- File-level gcov coverage cache maps test UUIDs to exercised .fpp source files (gzip JSON, committed to repo) - --only-changes flag prunes tests by intersecting PR-changed files against coverage cache; conservative fallbacks for missing cache/coverage - --build-coverage-cache flag + 3-phase parallel cache builder (prepare, run, gcov collect) - New rebuild-cache CI job on Phoenix via SLURM when cases.py or Fortran dependency graph changes - Dep-change detection greps PR/push diffs for added use/include statements - 53 unit tests cover core coverage logic - Rebased onto PR MFlowCode#1299 unified CI architecture (submit-slurm-job.sh, common/test.sh) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2210805 to
c7a7725
Compare
Claude Code ReviewHead SHA: c7a7725 Changed files
Summary
Findings1. TEMP changes must be reverted before merge (blocker)
2. # Work on a copy so we don't permanently mutate the case object.
params = dict(case.params)
# ...
case.params = params # ← BUG: this mutates the case objectThe shallow copy on line 884 is made correctly, but line 915 reassigns 3. 4. Dep-change detection on push may be unreliable for force-pushes or initial branch pushes 5. Minor: double blank line in |
- Phase 3: omit failed tests from cache instead of storing [] (which permanently skips them); absent entries are conservatively included - _prepare_test: save/restore case.params in finally block so callers can safely reuse the case list after build_coverage_cache returns - _parse_gcov_json_output: return None on decode failure so caller omits the test from cache rather than storing empty coverage - Missing build dir: log warning and return None instead of [] - Lagrange bubble input failure: re-raise so test is skipped from cache - CI: replace always() with !cancelled() to respect workflow cancellation - CI: grep -P -> grep -E for POSIX portability (PCRE not universal) - Narrow except Exception to (OSError, SubprocessError) in find_gcov_binary Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: a7afc06 Files changed: 14
Summary:
Findings1. TEMP changes must be reverted before merge (blocker)The PR body explicitly calls these out, but they're still present:
Both must be reverted before merge. 2.
|
Summary
.fppsource files (gzip JSON, 11KB, committed to repo)--only-changesflag prunes tests by intersecting PR-changed files against coverage cache--build-coverage-cacheflag + 3-phase parallel cache builder (prepare, run, gcov collect)rebuild-cacheCI job runs on Phoenix via SLURM whencases.pyor Fortran dependency graph changesuse/includestatementsALWAYS_RUN_ALLfiles trigger full suitecontinue-on-errorfrom github CI job (fixes auto-cancellation)useinm_bubbles.fpp+ removeCMakeLists.txtfromALWAYS_RUN_ALLto test the full cache rebuild pipeline in CIReplaces #1283.
Test plan
rebuild-cachejob triggers (dep_changed detection)--only-changes🤖 Generated with Claude Code