Add test sharding, proactive clean, and retry logic for self-hosted CI #1171

Merged
sbryngelson merged 17 commits into MFlowCode:master from sbryngelson:ci-test on Feb 28, 2026

@sbryngelson sbryngelson commented Feb 19, 2026

Summary

Hardens self-hosted CI with test sharding, retry logic, and script deduplication.

Test sharding & retry

  • Add --shard i/n flag to ./mfc.sh test — splits tests via modular arithmetic for even distribution
  • Frontier GPU matrix now runs 2 shards per interface (acc/omp), halving wall-clock time
  • Zero-test guard on both --only and --shard — empty results raise an error instead of silent green CI
  • GitHub runner tests rerun sporadic failures (when at most 5 tests fail), using the UUIDs recorded in tests/failed_uuids.txt
  • Abort path cleans failed_uuids.txt to prevent stale retries
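
A minimal sketch of how modular-arithmetic sharding with a zero-test guard could look. It is not the code from this PR: the function names, the 1-based reading of `i/n`, and the use of `RuntimeError` in place of the toolchain's own exception are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse an "i/n" shard spec such as "1/2" (assumed to be 1-based)."""
    index_text, _, count_text = spec.partition("/")
    index, count = int(index_text), int(count_text)
    if count < 1 or not 1 <= index <= count:
        raise ValueError(f"invalid shard spec: {spec!r}")
    return index, count


def select_shard(cases: Sequence[str], spec: str) -> List[str]:
    """Keep every case whose position modulo n equals i - 1."""
    index, count = parse_shard(spec)
    selected = [case for pos, case in enumerate(cases) if pos % count == index - 1]
    if not selected:
        # Zero-test guard: fail loudly instead of reporting a green run.
        raise RuntimeError(f"--shard {spec} selected no tests")
    return selected
```

Taking every n-th case interleaves neighbouring tests across shards, so if slow tests cluster together in the list they tend to be spread out rather than landing in one shard.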

--only filter improvements

  • UUIDs use OR logic (match any), labels use AND logic (match all)
  • --only matching zero tests now raises an error instead of silently passing
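
The matching rule can be sketched as follows. This is an illustration, not the PR's code: it assumes a term counts as a UUID when it is exactly 8 hexadecimal characters, that a case is represented as a `(uuid, labels)` pair, and that `RuntimeError` stands in for the toolchain's own exception. For mixed input it keeps a case if all labels match or any UUID matches.

```python
import re
from typing import Iterable, List, Set, Tuple

# Assumption: a UUID term is exactly 8 hexadecimal characters.
UUID_PATTERN = re.compile(r"[0-9A-Fa-f]{8}")


def is_uuid(term: str) -> bool:
    return UUID_PATTERN.fullmatch(term) is not None


def keep_case(uuid: str, labels: Set[str], terms: Iterable[str]) -> bool:
    """Keep a case if any UUID term matches OR all label terms match."""
    terms = list(terms)
    uuid_terms = {t.upper() for t in terms if is_uuid(t)}
    label_terms = [t for t in terms if not is_uuid(t)]
    uuid_hit = uuid.upper() in uuid_terms
    labels_hit = bool(label_terms) and all(t in labels for t in label_terms)
    return uuid_hit or labels_hit


def filter_cases(cases: List[Tuple[str, Set[str]]], terms: List[str]) -> List[Tuple[str, Set[str]]]:
    """Apply the filter and refuse to return an empty selection."""
    if not terms:
        return cases  # no --only given: nothing to filter
    kept = [case for case in cases if keep_case(case[0], case[1], terms)]
    if not kept:
        raise RuntimeError("--only matched zero tests")
    return kept
```

OR logic for UUIDs matters for the retry path, which passes several failed UUIDs at once; with AND logic no single test could match more than one of them.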

CI script consolidation

  • Merge submit-bench.sh into submit.sh for all 3 clusters (frontier, frontier_amd, phoenix) — submit.sh auto-detects bench vs test mode from the submitted script's basename
  • Unify frontier/ and frontier_amd/ scripts via directory-name detection — build.sh, bench.sh, submit.sh, and test.sh are now byte-identical across both directories
  • Net deletion of 3 files and ~120 lines of duplicated shell code
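
The two detection steps are implemented in shell in this PR; the Python sketch below only illustrates the idea. The rule that a basename starting with `bench` selects bench mode, the function names, and the example paths are assumptions, not taken from the scripts.

```python
from pathlib import PurePosixPath

KNOWN_CLUSTERS = ("frontier", "frontier_amd", "phoenix")


def detect_mode(script_path: str) -> str:
    """Return "bench" if the submitted script's basename starts with "bench" (assumed rule)."""
    stem = PurePosixPath(script_path).stem
    return "bench" if stem.startswith("bench") else "test"


def detect_cluster(script_dir: str) -> str:
    """Derive the cluster from the name of the directory holding the scripts."""
    name = PurePosixPath(script_dir).name
    if name not in KNOWN_CLUSTERS:
        raise ValueError(f"unknown cluster directory: {name!r}")
    return name


# Hypothetical paths, for illustration only.
print(detect_mode("ci/frontier/bench.sh"), detect_cluster("ci/frontier_amd"))
```

Because both values come from where a script lives and what it is called, the same file can be copied unchanged into each cluster directory, which is what makes byte-identical scripts possible.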

Other

  • Frontier test jobs use --qos=normal on batch partition (1h59m, CFD154 account)
  • --requeue on Phoenix SLURM jobs for preemption recovery
  • Build retry wrapper (3 attempts with clean between)
  • Pin nick-fields/retry to commit SHA for security on self-hosted runners
  • Lint-gate must pass before self-hosted tests run
  • Skip benchmark workflow for bot review events
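
The build retry wrapper (3 attempts with a clean between them) follows a common pattern, sketched here. The `build` and `clean` callables are placeholders for whatever commands the CI scripts actually run; the sketch does not reproduce them.

```python
from typing import Callable


def build_with_retry(
    build: Callable[[], bool],
    clean: Callable[[], None],
    attempts: int = 3,
) -> bool:
    """Run build() up to `attempts` times, calling clean() between failed attempts."""
    for attempt in range(1, attempts + 1):
        if build():
            return True
        if attempt < attempts:
            # Clean only when another attempt follows, so the final
            # failure leaves its build tree in place for inspection.
            clean()
    return False
```

Cleaning between attempts guards against a half-written build directory causing the next attempt to fail for the same reason as the first.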

Depends on: #1170

Test plan

  • Frontier GPU tests run in 2 shards per interface and complete within 2h
  • Phoenix tests pass with --requeue and preemption recovery
  • Lint-gate blocks self-hosted tests on lint failure
  • GitHub runner retry logic fires on ≤5 test failures
  • Benchmark jobs submit correctly via merged submit.sh (bench mode auto-detected)
  • frontier/ and frontier_amd/ scripts are identical and detect cluster correctly
  • --shard with zero resulting tests raises an error (not silent pass)
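
The retry decision for GitHub runners can be sketched as below. It assumes tests/failed_uuids.txt holds whitespace-separated UUIDs and that more than 5 failures means no retry is attempted; the function names are hypothetical and the workflow itself implements this in shell.

```python
from pathlib import Path
from typing import List, Optional

MAX_RETRIED_FAILURES = 5  # threshold stated in the PR description


def plan_retry(failed_text: str, limit: int = MAX_RETRIED_FAILURES) -> Optional[List[str]]:
    """Return UUIDs to rerun, [] if nothing failed, or None if too many failed."""
    uuids = failed_text.split()
    if len(uuids) > limit:
        return None  # too many failures to be sporadic: fail without retrying
    return uuids


def read_failed(path: Path) -> str:
    """Read the failure list, treating a missing file as "no failures"."""
    return path.read_text() if path.is_file() else ""
```

The returned UUIDs would then be passed to `--only`, which is why that filter has to match any of several UUIDs.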

Copilot AI review requested due to automatic review settings February 19, 2026 20:01
@sbryngelson sbryngelson marked this pull request as draft February 25, 2026 01:04
sbryngelson and others added 3 commits February 25, 2026 21:23
The -s check already guarantees the file is non-empty, so
NUM_FAILED > 0 is always true in that branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…zero-match guard

- Include shard in SLURM job_slug to prevent output file collisions
  between parallel shards (e.g., test-gpu-acc-1-of-2.out)
- Consolidate frontier/ and frontier_amd/ submit.sh and test.sh into
  identical scripts that derive compiler flag and config from directory
- Add $shard_opts to CPU test branch for future-proofing
- Add zero-match guard for --only filter to fail loudly instead of
  silently exiting 0 when no tests match
- Hoist failed_uuids_path to single definition at top of test()
- Compute log slug dynamically in test.yml for shard-aware filenames
- Remove unnecessary shard: '' from non-sharded matrix entries
- Replace useless cat|tr pipeline with tr < file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
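
A shard-aware job slug, as described in the commit above, could be built like this. The function and parameter names are hypothetical; only the example output name `test-gpu-acc-1-of-2` comes from the commit message.

```python
from typing import Optional


def job_slug(job: str, device: str, interface: str, shard: Optional[str] = None) -> str:
    """Build a job slug, appending "i-of-n" when a shard such as "1/2" is given."""
    parts = [job, device, interface]
    if shard:
        index, _, count = shard.partition("/")
        parts.append(f"{index}-of-{count}")
    return "-".join(parts)


print(job_slug("test", "gpu", "acc", "1/2") + ".out")
```

Without the shard suffix, two shards of the same job would write to the same output file and overwrite each other's logs.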
The --only filter now detects whether each term is a UUID (8-char hex)
or a trace label and applies appropriate matching:
  - Labels: AND logic (--only 2D Bubbles matches tests with both)
  - UUIDs: OR logic (--only UUID1 UUID2 matches tests with either)
  - Mixed: keep case if all labels match OR any UUID matches

This preserves the documented behavior for label filtering while
correctly supporting the CI retry path that passes multiple UUIDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sbryngelson and others added 4 commits February 25, 2026 22:47
submit.sh now auto-detects job type (bench vs test) from the submitted
script's basename, selecting the appropriate SBATCH account, time limit,
and partition. This eliminates three submit-bench.sh files and makes
frontier/ and frontier_amd/ scripts byte-identical via directory-name
detection for compiler flags and cluster-specific options.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise MFCException when --shard produces zero cases (prevents
  silent green CI with nothing executed)
- Pin nick-fields/retry to commit SHA for security on self-hosted
  runners with cluster credentials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sbryngelson and others added 3 commits February 26, 2026 18:22
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace per-case case-optimized builds with one generic build, reducing
build time from ~34 min to ~5-10 min. Halve benchmark timesteps to
compensate for slower non-optimized runtime. Reduce GPU --mem from 12
to 4 GB. Lower test build retry timeout from 480 to 60 minutes.

Closes MFlowCode#1275

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov bot commented Feb 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.04%. Comparing base (1412eb2) to head (73fd804).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
- Coverage   44.05%   44.04%   -0.02%     
==========================================
  Files          70       70              
  Lines       20496    20499       +3     
  Branches     1991     1993       +2     
==========================================
- Hits         9029     9028       -1     
- Misses      10328    10330       +2     
- Partials     1139     1141       +2     

Labels

size:L This PR changes 100-499 lines, ignoring generated files