Fix flaky test 02844_max_backup_bandwidth_s3#95127
Conversation
The test was flaky because it used absolute timing thresholds (>= 7 seconds) which are unreliable due to variable S3 latency and system performance. The fix changes to a relative timing comparison: - Run both backup operations (with and without native copy) - Compare their durations instead of checking absolute times - Verify that non-native copy takes at least 10 seconds longer than native copy This ensures the test validates that bandwidth limiting works (applied only to non-native copy) without being affected by S3 latency variations. Also increased data size from 8MB to 16MB for more pronounced timing differences. Closes #85084 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
This change doesn't make any sense to me. Should we just revert it? @vitlibar WDYT? |
|
The timing check just verifies the TBF algorithm and that the cumulative sleep time reflects the number of bytes passed through the throttler, there should not be any dependency on s3 reply times or whatever. I think for some reason 1e8 records is not always passed through the throttler (this is the only explanation I have). But I dont know why this is the case. We could try to add logging into the throttler to check how many bytes is actually passing it... |
|
@serxa, but if we revert this change, the test will become flaky again. I can remove the test. |
|
Is it okay if we add logging and revert? When if fails, we could analyze better what is going on? otherwise I dont know how to reproduce the issue, it is rare... |
|
Sounds ok. |
The test was flaky because it used absolute timing thresholds (>= 7 seconds) which are unreliable due to variable S3 latency and system performance.
The fix changes to a relative timing comparison:
This ensures the test validates that bandwidth limiting works (applied only to non-native copy) without being affected by S3 latency variations.
Also increased data size from 8MB to 16MB for more pronounced timing differences.
Closes #85084
Changelog category (leave one):
See https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=95118&sha=ab1adaa5d88f38a4934fac9fc8ffb91e97ef5a6f&name_0=PR&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29
Note: if it continues, we will remove the timing check from the test entirely.