
Fix retries when uploading databases #3564

Merged
henrymercer merged 6 commits into main from henrymercer/fix-database-upload-retries
Mar 10, 2026

Conversation

@henrymercer (Contributor)

The current retry mechanism would pass a read stream that might have already been partially consumed. Instead, use a custom retry mechanism and pass a fresh read stream on each attempt.

An alternative that wouldn't need the custom retry mechanism would have been to read the database into a buffer, but databases can be multiple gigabytes, so I wanted to keep the streaming approach.
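The shape of this fix can be sketched as a small helper (hypothetical names; the actual implementation lives in src/database-upload.ts): a factory produces a fresh body on every attempt, so a failed upload never hands a partially consumed stream to the next try.

```typescript
// Hypothetical sketch of the retry shape described above. The factory is
// invoked on every attempt, e.g. makeBody = () => fs.createReadStream(bundledDb),
// so no attempt ever reuses a partially consumed stream.
async function uploadWithRetry<T>(
  makeBody: () => T,
  upload: (body: T) => Promise<void>,
  isRetryable: (e: unknown) => boolean,
  maxAttempts = 4, // 1 initial attempt + 3 retries, like Octokit's default
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const body = makeBody(); // fresh body each attempt
    try {
      await upload(body);
      return attempt; // success: report how many attempts were needed
    } catch (e) {
      if (!isRetryable(e) || attempt === maxAttempts) {
        throw e; // non-retryable error, or retries exhausted
      }
    }
  }
  throw new Error("unreachable");
}
```

This is only a sketch of the streaming-friendly approach under the assumptions above, not the PR's exact code; the real loop also records upload duration and closes the stream after each attempt.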

Risk assessment

For internal use only. Please select the risk level of this change:

  • High risk: Changes are not fully under feature flags, have limited visibility and/or cannot be tested outside of production.

Which use cases does this change impact?

Workflow types:

  • Advanced setup - Impacts users who have custom CodeQL workflows.
  • Managed - Impacts users with dynamic workflows (Default Setup, Code Quality, ...).

Products:

  • Code Scanning - The changes impact analyses when analysis-kinds: code-scanning.

Environments:

  • Dotcom - Impacts CodeQL workflows on github.com and/or GitHub Enterprise Cloud with Data Residency.
  • GHES - Impacts CodeQL workflows on GitHub Enterprise Server.

How did/will you validate this change?

  • Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

  • Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.

How will you know if something goes wrong after this change is released?

  • Telemetry - I rely on existing telemetry or have made changes to the telemetry.
    • Alerts - New or existing monitors will trip if something goes wrong with this change.

Are there any special considerations for merging or releasing this change?

  • No special considerations - This change can be merged at any time.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

@henrymercer requested a review from a team as a code owner March 10, 2026 12:53
Copilot AI review requested due to automatic review settings March 10, 2026 12:53
@github-actions bot added the size/S Should be easy to review label Mar 10, 2026
Copilot AI left a comment (Contributor)


Pull request overview

This PR fixes database upload retries by avoiding reuse of a partially-consumed ReadStream across retry attempts, keeping uploads streamed (rather than buffering multi-GB artifacts) while ensuring each attempt uses a fresh stream.

Changes:

  • Implement manual retry logic for database uploads with a fresh fs.createReadStream per attempt and disable Octokit per-request retries for that call.
  • Centralize “do not retry” HTTP statuses in src/api-client.ts for reuse across the codebase.
  • Update/add unit tests and document the fix in CHANGELOG.md.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

File summary:

  • src/database-upload.ts: Replaces Octokit retry behavior for database uploads with a manual retry loop that recreates the upload stream per attempt.
  • src/database-upload.test.ts: Updates tests to distinguish retryable vs non-retryable failures and avoids real sleep delays during backoff.
  • src/api-client.ts: Exports DO_NOT_RETRY_STATUSES and reuses it in Octokit retry configuration.
  • src/api-client.test.ts: Updates expected retry configuration to reference DO_NOT_RETRY_STATUSES.
  • CHANGELOG.md: Adds an UNRELEASED entry describing the database upload retry fix.
  • lib/*: Generated JavaScript output updated to match TypeScript changes (not reviewed).
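The centralization of non-retryable statuses could look roughly like the following sketch. The constant name DO_NOT_RETRY_STATUSES comes from the PR, but the specific status codes and the helper function here are illustrative assumptions, not the repository's actual list.

```typescript
// Hypothetical sketch of the src/api-client.ts centralization. The status
// codes below are illustrative assumptions; the real list lives in the repo.
export const DO_NOT_RETRY_STATUSES = [400, 401, 403, 404, 422];

// A shared predicate lets the manual retry loop and the Octokit retry
// configuration agree on which responses are worth retrying.
export function isRetryableStatus(status: number): boolean {
  return !DO_NOT_RETRY_STATUSES.includes(status);
}
```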

@esbena self-requested a review March 10, 2026 13:04
@esbena left a comment (Contributor)


LGTM, with some minor suggestions for improving debugging insight.

Comment on lines +150 to +151:

    if (!isRetryable || attempt === maxAttempts) {
      throw e;
Contributor


Minor debugging improvement.

Suggested change:

    - if (!isRetryable || attempt === maxAttempts) {
    -   throw e;
    + if (!isRetryable) {
    +   throw e;
    + } else if (attempt === maxAttempts) {
    +   logger.error(`Maximum retry attempts exhausted (${attempt}), re-throwing otherwise retryable error...`);
    +   throw e;

    language,
    zipped_upload_size_bytes: bundledDbSize,
    is_overlay_base: shouldUploadOverlayBase,
    upload_duration_ms: uploadDurationMs,
Contributor


I wonder: Do we separately and in general track how many failures we have for API requests that succeed on retry?

Contributor Author


We don't, but that's a good idea, although for another PR :)

esbena previously approved these changes Mar 10, 2026
@mbg left a comment (Member)


This broadly looks good and makes sense. I have a few suggestions, particularly around the "high risk" assessment. I'd suggest we add a FF just in case, which shouldn't be too hard? Otherwise, we probably do want to make sure that we have reasonable test coverage here.

Comment on lines +118 to +121:

    const maxAttempts = 4; // 1 initial attempt + 3 retries, identical to the default retry behavior of Octokit
    let uploadDurationMs: number | undefined;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      const bundledDbReadStream = fs.createReadStream(bundledDb);
Member


I see that you marked this PR as high risk. It doesn't seem like it would be too hard to refactor this slightly so that the old code / loop body goes into a new function and then depending on a new FF we either just call that directly or in the loop?

Contributor Author


I replied to this as part of your overall review comment

      bundledDbReadStream.close();
    );
    uploadDurationMs = performance.now() - attemptStartTime;
    break;
Member


Minor: maybe a mild preference for using some boolean to indicate when the upload was successful in the loop condition over using break, since that can be brittle if we refactor this in the future. That might not matter / be necessary if you move the upload code into a separate function as suggested in my other comment, though.
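The reviewer's suggestion could be sketched like this (hypothetical names; per the discussion below, the PR itself kept the break-based loop): a boolean in the loop condition replaces break, so future edits to the loop body can't accidentally skip the exit path.

```typescript
// Sketch of the suggested refactor: a success flag drives the loop
// condition instead of `break`. Names are illustrative, not the PR's code.
async function uploadUntilSuccess(
  attemptUpload: () => Promise<void>,
  maxAttempts = 4,
): Promise<number> {
  let succeeded = false;
  let attempt = 0;
  while (!succeeded && attempt < maxAttempts) {
    attempt++;
    try {
      await attemptUpload();
      succeeded = true; // the loop condition handles the exit; no break needed
    } catch (e) {
      if (attempt === maxAttempts) {
        throw e; // retries exhausted
      }
    }
  }
  return attempt;
}
```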

@robertbrignull left a comment (Contributor)


Makes sense to me. Thanks for doing this so quickly! I'll leave the actual approval to others.

@henrymercer (Contributor Author)

> This broadly looks good and makes sense. I have a few suggestions, particularly around the "high risk" assessment. I'd suggest we add a FF just in case, which shouldn't be too hard? Otherwise, we probably do want to make sure that we have reasonable test coverage here.

I disagree — we hit this error very rarely, and the problem is remedied by a retry. I don't think we'd get much signal with a feature flag. The changes require careful review, but I think a feature flag is overkill. I'd suggest instead we check the telemetry in a month's time and make sure we're no longer seeing "Response body object should not be disturbed or locked".

@github-actions bot added size/M Should be of average difficulty to review and removed size/S Should be easy to review labels Mar 10, 2026
@henrymercer requested a review from mbg March 10, 2026 15:53
mbg previously approved these changes Mar 10, 2026
@mbg left a comment (Member)


The changes since my last review look good, thanks!

I see that you reacted with 👍🏻 to this comment, but I don't see it addressed yet -- did you mean to?

> I disagree — we hit this error very rarely, and the problem is remedied by a retry. I don't think we'd get much signal with a feature flag.

I don't necessarily agree with this line of reasoning, because the upload code itself gets hit by most analyses and any changes to it will affect them all. Whether or not the upload needs to be retried is not relevant to the reasoning for/against a FF. That said, I don't feel strongly enough about this to be blocking in this instance since we have test coverage which shows that the changes to the code behave as expected in the "no retry" case. Adding a FF also introduces additional changes and so there's an argument that may outweigh any benefit here.

@henrymercer (Contributor Author)

> I see that you reacted with 👍🏻 to this comment, but I don't see it addressed yet -- did you mean to?

Thanks, pushed.

@henrymercer enabled auto-merge March 10, 2026 16:44
@henrymercer added this pull request to the merge queue Mar 10, 2026
Merged via the queue into main with commit 5cb13d6 Mar 10, 2026
244 checks passed
@henrymercer deleted the henrymercer/fix-database-upload-retries branch March 10, 2026 17:17

Labels

size/M Should be of average difficulty to review


5 participants