
Fix flaky test 02844_max_backup_bandwidth_s3 #95127

Merged
alexey-milovidov merged 1 commit into master from fix-flaky-test-02844-max-backup-bandwidth-s3 on Jan 26, 2026

Conversation

alexey-milovidov (Member) commented Jan 26, 2026

The test was flaky because it used absolute timing thresholds (>= 7 seconds), which are unreliable due to variable S3 latency and system performance.

The fix changes to a relative timing comparison:

  • Run both backup operations (with and without native copy)
  • Compare their durations instead of checking absolute times
  • Verify that non-native copy takes at least 10 seconds longer than native copy

This ensures the test validates that bandwidth limiting works (applied only to non-native copy) without being affected by S3 latency variations.

Also increased data size from 8MB to 16MB for more pronounced timing differences.
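The relative check described above can be sketched as follows. This is a minimal Python sketch in which sleeps stand in for the two BACKUP runs; `run_backup`, `measure`, and the durations are illustrative stand-ins, not the actual test code (which is a shell script):

```python
import time

def run_backup(simulated_duration):
    # Stand-in for a `BACKUP ... TO S3(...)` query; in the real test this
    # is a clickhouse-client invocation. Here we simply sleep.
    time.sleep(simulated_duration)

def measure(fn, *args):
    # Wall-clock duration of a single backup run.
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start

# Native copy is performed server-side by S3, so max_backup_bandwidth
# does not throttle it; it should finish quickly.
native = measure(run_backup, 0.01)

# Non-native copy streams the data through the throttler, so the
# bandwidth limit slows it down.
throttled = measure(run_backup, 0.2)

# Relative check: the throttled run must be noticeably slower than the
# native one (the real test requires a gap of at least 10 seconds).
assert throttled - native >= 0.1
print("OK")
```

Because both runs execute under the same S3 latency and machine load, the difference in their durations isolates the effect of the bandwidth limit, which is what the test actually wants to verify.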

Closes #85084

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

See https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=95118&sha=ab1adaa5d88f38a4934fac9fc8ffb91e97ef5a6f&name_0=PR&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29

Note: if it continues, we will remove the timing check from the test entirely.

clickhouse-gh bot (Contributor) commented Jan 26, 2026

Workflow [PR], commit [5ac2797]


@clickhouse-gh clickhouse-gh bot added the pr-ci label Jan 26, 2026
@alexey-milovidov alexey-milovidov self-assigned this Jan 26, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jan 26, 2026
Merged via the queue into master with commit 5364a37 Jan 26, 2026
134 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-flaky-test-02844-max-backup-bandwidth-s3 branch January 26, 2026 13:23
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jan 26, 2026
serxa (Member) commented Mar 18, 2026

This change doesn't make any sense to me. Should we just revert it? @vitlibar WDYT?

serxa (Member) commented Mar 18, 2026

The timing check just verifies the TBF algorithm and that the cumulative sleep time reflects the number of bytes passed through the throttler; there should not be any dependency on S3 reply times or anything similar. I think for some reason the 1e8 records are not always passed through the throttler (this is the only explanation I have), but I don't know why that happens. We could try adding logging to the throttler to check how many bytes are actually passing through it...
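The relationship serxa describes, that the cumulative sleep time is a direct function of the bytes pushed through the throttler, can be illustrated with a toy throttler. This is an illustrative sketch, not ClickHouse's actual `Throttler` implementation:

```python
import time

class Throttler:
    """Toy bandwidth throttler: if bytes arrive faster than max_speed
    allows, add() sleeps just long enough to keep the average rate at
    max_speed. The cumulative sleep therefore depends only on the total
    byte count and the limit, not on network latency."""

    def __init__(self, max_speed):  # max_speed in bytes per second
        self.max_speed = max_speed
        self.count = 0
        self.start = time.monotonic()
        self.total_sleep = 0.0

    def add(self, amount):
        self.count += amount
        elapsed = time.monotonic() - self.start
        # Earliest time at which `count` bytes are allowed to have passed.
        desired = self.count / self.max_speed
        if desired > elapsed:
            sleep_for = desired - elapsed
            self.total_sleep += sleep_for
            time.sleep(sleep_for)

# Push 1 MB through a 10 MB/s limit in 100 KB chunks: the throttler
# should accumulate roughly 1e6 / 1e7 = 0.1 s of sleep in total.
t = Throttler(max_speed=10_000_000)
for _ in range(10):
    t.add(100_000)
print(f"slept for {t.total_sleep:.2f} s in total")
```

If fewer bytes than expected pass through `add()`, the accumulated sleep shrinks proportionally, which would explain the test occasionally finishing faster than the threshold.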

alexey-milovidov (Member, Author):

@serxa, but if we revert this change, the test will become flaky again. I can remove the test.

serxa (Member) commented Mar 18, 2026

Is it okay if we add logging and revert? When it fails, we could better analyze what is going on; otherwise I don't know how to reproduce the issue, it is rare...

alexey-milovidov (Member, Author):

Sounds ok.


Labels

pr-ci, pr-synced-to-cloud (The PR is synced to the cloud repo)


Development

Successfully merging this pull request may close these issues.

02844_max_backup_bandwidth_s3 is flaky

3 participants