Add test sharding, proactive clean, and retry logic for self-hosted CI by sbryngelson · Pull Request #1171 · MFlowCode/MFC

sbryngelson · 2026-02-19T20:01:54Z

Summary

Hardens self-hosted CI with test sharding, retry logic, and script deduplication.

Test sharding & retry

Add --shard i/n flag to ./mfc.sh test — splits tests via modular arithmetic for even distribution
Frontier GPU matrix now runs 2 shards per interface (acc/omp), halving wall-clock time
Zero-test guard on both --only and --shard — empty results raise an error instead of silent green CI
GitHub runner tests rerun up to 5 sporadically failing tests, identified by the UUIDs recorded in tests/failed_uuids.txt
Abort path cleans failed_uuids.txt to prevent stale retries
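
A minimal sketch of how modular-arithmetic sharding with a zero-test guard could work. The function names, the 1-based reading of `i/n`, and the use of `RuntimeError` are illustrative assumptions, not the toolchain's actual code (a later commit message mentions `MFCException` for the real guard).

```python
from typing import List, Sequence, Tuple


def parse_shard(spec: str) -> Tuple[int, int]:
    """Parse an 'i/n' shard spec into (index, count); index is assumed 1-based."""
    index_text, count_text = spec.split("/")
    index, count = int(index_text), int(count_text)
    if count < 1 or not 1 <= index <= count:
        raise ValueError(f"invalid shard spec: {spec!r}")
    return index, count


def select_shard(cases: Sequence[str], spec: str) -> List[str]:
    """Keep every case whose position modulo n falls in shard i."""
    index, count = parse_shard(spec)
    selected = [
        case for position, case in enumerate(cases)
        if position % count == index - 1
    ]
    # Zero-test guard: an empty shard is an error, not a silent pass.
    if not selected:
        raise RuntimeError(f"--shard {spec} selected zero tests")
    return selected
```

Interleaving by position (rather than cutting the list into contiguous blocks) tends to spread neighbouring, similarly expensive cases across shards, which is one plausible reading of "even distribution" above.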

`--only` filter improvements

UUIDs use OR logic (match any), labels use AND logic (match all)
--only matching zero tests now raises an error instead of silently passing
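
The matching rule can be sketched as follows, using the mixed-term rule from the commit message (keep a case if all labels match or any UUID matches) and the 8-character hex definition of a UUID. The `Case` type, the function names, and the exact-case UUID comparison are assumptions made for illustration; the real filter lives in the test toolchain.

```python
import re
from typing import FrozenSet, List, NamedTuple, Sequence

# A term is treated as a UUID when it is exactly 8 hex characters.
UUID_PATTERN = re.compile(r"[0-9A-Fa-f]{8}")


class Case(NamedTuple):
    """Hypothetical stand-in for a test case: a UUID plus its trace labels."""
    uuid: str
    labels: FrozenSet[str]


def keep_case(case: Case, terms: Sequence[str]) -> bool:
    uuids = [term for term in terms if UUID_PATTERN.fullmatch(term)]
    labels = [term for term in terms if not UUID_PATTERN.fullmatch(term)]
    uuid_hit = case.uuid in uuids                       # OR: any UUID matches
    label_hit = bool(labels) and all(                   # AND: every label matches
        label in case.labels for label in labels
    )
    return uuid_hit or label_hit


def filter_only(cases: Sequence[Case], terms: Sequence[str]) -> List[Case]:
    kept = [case for case in cases if keep_case(case, terms)]
    # Zero-match guard: fail loudly rather than exit 0 with nothing run.
    if not kept:
        raise RuntimeError(f"--only {' '.join(terms)} matched zero tests")
    return kept
```

One caveat of this classification: a label that happens to be 8 hex characters long would be treated as a UUID.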

CI script consolidation

Merge submit-bench.sh into submit.sh for all 3 clusters (frontier, frontier_amd, phoenix) — submit.sh auto-detects bench vs test mode from the submitted script's basename
Unify frontier/ and frontier_amd/ scripts via directory-name detection — build.sh, bench.sh, submit.sh, and test.sh are now byte-identical across both directories
Net deletion of 3 files and ~120 lines of duplicated shell code
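
The two detection steps might look like the sketch below, written in Python for illustration even though the real scripts are shell. The function names, the example paths, and the rule "a script named `bench` means bench mode, anything else means test mode" are assumptions; the PR only states that the mode comes from the submitted script's basename and the cluster from the directory name.

```python
from pathlib import PurePosixPath


def detect_job_type(script_path: str) -> str:
    """Infer bench vs test mode from the submitted script's basename."""
    stem = PurePosixPath(script_path).stem
    return "bench" if stem == "bench" else "test"


def detect_cluster(script_path: str) -> str:
    """Infer the cluster from the directory that holds the script."""
    return PurePosixPath(script_path).parent.name


# Hypothetical paths, for illustration only.
print(detect_job_type("ci/frontier/bench.sh"), detect_cluster("ci/frontier/bench.sh"))
print(detect_job_type("ci/frontier_amd/test.sh"), detect_cluster("ci/frontier_amd/test.sh"))
```

Because both values are derived from where the script sits and what it is called, the same file can be copied unchanged into each cluster directory, which is what makes the byte-identical scripts possible.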

Other

Frontier test jobs use --qos=normal on batch partition (1h59m, CFD154 account)
--requeue on Phoenix SLURM jobs for preemption recovery
Build retry wrapper (3 attempts with clean between)
Pin nick-fields/retry to commit SHA for security on self-hosted runners
Lint-gate must pass before self-hosted tests run
Skip benchmark workflow for bot review events
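
The build retry wrapper (3 attempts with a clean between them) follows a common pattern, sketched here under assumptions: `build` and `clean` are hypothetical callables standing in for the real build and clean commands, and failure is signalled by `build` returning `False`.

```python
from typing import Callable


def build_with_retry(
    build: Callable[[], bool],
    clean: Callable[[], None],
    attempts: int = 3,
) -> int:
    """Run build() up to `attempts` times, cleaning between failed attempts.

    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        if build():
            return attempt
        # Clean only when another attempt will follow.
        if attempt < attempts:
            clean()
    raise RuntimeError(f"build failed after {attempts} attempts")
```

Cleaning between attempts means a retry starts from a known state instead of reusing possibly corrupted artifacts from the failed attempt.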

Depends on: #1170

Test plan

Frontier GPU tests run in 2 shards per interface and complete within 2h
Phoenix tests pass with --requeue and preemption recovery
Lint-gate blocks self-hosted tests on lint failure
GitHub runner retry logic fires on ≤5 test failures
Benchmark jobs submit correctly via merged submit.sh (bench mode auto-detected)
frontier/ and frontier_amd/ scripts are identical and detect cluster correctly
--shard with zero resulting tests raises an error (not silent pass)
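
The retry decision on GitHub runners could be sketched as below. This is an illustration under assumptions: the function names are hypothetical, the file is assumed to hold whitespace-separated UUIDs, and more than 5 failures is assumed to mean "do not retry". The real logic is shell in the workflow file.

```python
from pathlib import Path
from typing import List, Optional

MAX_RETRYABLE_FAILURES = 5  # threshold taken from the PR description


def read_failed_uuids(path: Path) -> List[str]:
    """Return the UUIDs recorded by a failed run, or [] if there are none."""
    if not path.is_file():
        return []
    return path.read_text().split()


def retry_command(path: Path) -> Optional[List[str]]:
    """Build the rerun command, or return None when no retry should happen."""
    failed = read_failed_uuids(path)
    # No failures recorded, or too many to be plausibly sporadic.
    if not failed or len(failed) > MAX_RETRYABLE_FAILURES:
        return None
    # Multiple UUIDs are matched with OR logic by --only.
    return ["./mfc.sh", "test", "--only", *failed]
```

Capping the retry at a small number of failures keeps it aimed at flaky tests: a run with many failures most likely reflects a real regression and should fail outright.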

The -s check already guarantees the file is non-empty, so NUM_FAILED > 0 is always true in that branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…zero-match guard

- Include shard in SLURM job_slug to prevent output file collisions between parallel shards (e.g., test-gpu-acc-1-of-2.out)
- Consolidate frontier/ and frontier_amd/ submit.sh and test.sh into identical scripts that derive compiler flag and config from directory
- Add $shard_opts to CPU test branch for future-proofing
- Add zero-match guard for --only filter to fail loudly instead of silently exiting 0 when no tests match
- Hoist failed_uuids_path to single definition at top of test()
- Compute log slug dynamically in test.yml for shard-aware filenames
- Remove unnecessary shard: '' from non-sharded matrix entries
- Replace useless cat|tr pipeline with tr < file

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
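
To illustrate the shard-aware slug, here is a small sketch that reproduces the example filename from this commit message. The function name and parameters are hypothetical; only the resulting `test-gpu-acc-1-of-2` shape comes from the source.

```python
from typing import Optional


def job_slug(job_type: str, device: str, interface: Optional[str] = None,
             shard: Optional[str] = None) -> str:
    """Compose a slug that stays unique across parallel shards."""
    parts = [job_type, device]
    if interface:
        parts.append(interface)
    if shard:
        # Turn 'i/n' into 'i-of-n' so the slug is filename-safe.
        index, count = shard.split("/")
        parts.append(f"{index}-of-{count}")
    return "-".join(parts)


print(job_slug("test", "gpu", "acc", "1/2") + ".out")
```

Without the shard suffix, two shards of the same job would write to the same output file and overwrite each other.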

The --only filter now detects whether each term is a UUID (8-char hex) or a trace label and applies appropriate matching:

- Labels: AND logic (--only 2D Bubbles matches tests with both)
- UUIDs: OR logic (--only UUID1 UUID2 matches tests with either)
- Mixed: keep case if all labels match OR any UUID matches

This preserves the documented behavior for label filtering while correctly supporting the CI retry path that passes multiple UUIDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

submit.sh now auto-detects job type (bench vs test) from the submitted script's basename, selecting the appropriate SBATCH account, time limit, and partition. This eliminates three submit-bench.sh files and makes frontier/ and frontier_amd/ scripts byte-identical via directory-name detection for compiler flags and cluster-specific options.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Raise MFCException when --shard produces zero cases (prevents silent green CI with nothing executed)
- Pin nick-fields/retry to commit SHA for security on self-hosted runners with cluster credentials

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace per-case case-optimized builds with one generic build, reducing build time from ~34 min to ~5-10 min. Halve benchmark timesteps to compensate for slower non-optimized runtime. Reduce GPU --mem from 12 to 4 GB. Lower test build retry timeout from 480 to 60 minutes.

Closes MFlowCode#1275

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov · 2026-02-28T15:19:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.04%. Comparing base (1412eb2) to head (73fd804).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1171      +/-   ##
==========================================
- Coverage   44.05%   44.04%   -0.02%     
==========================================
  Files          70       70              
  Lines       20496    20499       +3     
  Branches     1991     1993       +2     
==========================================
- Hits         9029     9028       -1     
- Misses      10328    10330       +2     
- Partials     1139     1141       +2


Copilot AI review requested due to automatic review settings February 19, 2026 20:01

Copilot started reviewing on behalf of sbryngelson February 19, 2026 20:02 View session

codeant-ai bot added the size:M This PR changes 30-99 lines, ignoring generated files label Feb 19, 2026

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:M This PR changes 30-99 lines, ignoring generated files labels Feb 20, 2026

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 21, 2026

sbryngelson force-pushed the ci-test branch from 55b68e5 to 491b27b Compare February 23, 2026 14:50

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 23, 2026

MFlowCode deleted a comment from github-actions bot Feb 23, 2026

sbryngelson force-pushed the ci-test branch from 3ce4f39 to f3bab46 Compare February 24, 2026 16:03

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 24, 2026

sbryngelson force-pushed the ci-test branch from 749eb67 to a9b1e40 Compare February 24, 2026 16:49

codeant-ai bot added size:L This PR changes 100-499 lines, ignoring generated files and removed size:L This PR changes 100-499 lines, ignoring generated files labels Feb 24, 2026

sbryngelson marked this pull request as draft February 25, 2026 01:04

MFlowCode deleted a comment from codeant-ai bot Feb 26, 2026

MFlowCode deleted a comment from coderabbitai bot Feb 26, 2026

MFlowCode deleted a comment from codeant-ai bot Feb 26, 2026

sbryngelson and others added 3 commits February 25, 2026 21:23

Remove redundant NUM_FAILED > 0 guard in test retry logic

b5c095f

The -s check already guarantees the file is non-empty, so NUM_FAILED > 0 is always true in that branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

MFlowCode deleted a comment from github-actions bot Feb 26, 2026

sbryngelson and others added 4 commits February 25, 2026 22:47

Use normal QOS instead of hackathon for Frontier test jobs

a1c55ed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'master' into ci-test

8b48e30

sbryngelson and others added 3 commits February 26, 2026 18:22

Trigger CI

46dcd73

Rename ambiguous single-letter variable l to label in _filter_only

a2431bf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

sbryngelson mentioned this pull request Feb 28, 2026

Separate case-optimization correctness testing from performance benchmarks #1275

Closed

sbryngelson merged 17 commits into MFlowCode:master from sbryngelson:ci-test


2 participants
