Fix retries when uploading databases #3564
Conversation
Pull request overview
This PR fixes database upload retries by avoiding reuse of a partially-consumed ReadStream across retry attempts, keeping uploads streamed (rather than buffering multi-GB artifacts) while ensuring each attempt uses a fresh stream.
Changes:
- Implement manual retry logic for database uploads with a fresh `fs.createReadStream` per attempt, and disable Octokit per-request retries for that call.
- Centralize "do not retry" HTTP statuses in `src/api-client.ts` for reuse across the codebase.
- Update/add unit tests and document the fix in `CHANGELOG.md`.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/database-upload.ts | Replaces Octokit retry behavior for database uploads with a manual retry loop that recreates the upload stream per attempt. |
| src/database-upload.test.ts | Updates tests to distinguish retryable vs non-retryable failures and avoids real sleep delays during backoff. |
| src/api-client.ts | Exports DO_NOT_RETRY_STATUSES and reuses it in Octokit retry configuration. |
| src/api-client.test.ts | Updates expected retry configuration to reference DO_NOT_RETRY_STATUSES. |
| CHANGELOG.md | Adds an UNRELEASED entry describing the database upload retry fix. |
| lib/* | Generated JavaScript output updated to match TypeScript changes (not reviewed). |
esbena
left a comment
LGTM, with some minor insight improvement suggestions.
src/database-upload.ts
Outdated
```ts
if (!isRetryable || attempt === maxAttempts) {
  throw e;
```

Minor debugging improvement. Suggested change:

```ts
if (!isRetryable) {
  throw e;
} else if (attempt === maxAttempts) {
  logger.error(`Maximum retry attempts exhausted (${attempt}), re-throwing otherwise retryable error...`)
  throw e;
```
```ts
  language,
  zipped_upload_size_bytes: bundledDbSize,
  is_overlay_base: shouldUploadOverlayBase,
  upload_duration_ms: uploadDurationMs,
```

I wonder: do we separately and in general track how many failures we have for API requests that succeed on retry?

We don't, but that's a good idea, although for another PR :)
mbg
left a comment
This broadly looks good and makes sense. I have a few suggestions, particularly around the "high risk" assessment. I'd suggest we add a FF just in case, which shouldn't be too hard? Otherwise, we probably do want to make sure that we have reasonable test coverage here.
src/database-upload.ts
Outdated
```ts
const maxAttempts = 4; // 1 initial attempt + 3 retries, identical to the default retry behavior of Octokit
let uploadDurationMs: number | undefined;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
  const bundledDbReadStream = fs.createReadStream(bundledDb);
```

I see that you marked this PR as high risk. It doesn't seem like it would be too hard to refactor this slightly so that the old code / loop body goes into a new function, and then, depending on a new FF, we either just call that directly or call it in the loop?

I replied to this as part of your overall review comment.
```ts
  bundledDbReadStream.close();
);
uploadDurationMs = performance.now() - attemptStartTime;
break;
```

Minor: maybe a mild preference for using a boolean in the loop condition to indicate when the upload was successful, rather than using `break`, since that can be brittle if we refactor this in the future. That might not matter / be necessary if you move the upload code into a separate function as suggested in my other comment, though.
I disagree — we hit this error very rarely, and the problem is remedied by a retry. I don't think we'd get much signal with a feature flag. The changes require careful review, but I think a feature flag is overkill. I'd suggest instead we check the telemetry in a month's time and make sure we're no longer seeing "Response body object should not be disturbed or locked".
mbg
left a comment
The changes since my last review look good, thanks!
I see that you reacted with 👍🏻 to this comment, but I don't see it addressed yet -- did you mean to?
I disagree — we hit this error very rarely, and the problem is remedied by a retry. I don't think we'd get much signal with a feature flag.
I don't necessarily agree with this line of reasoning, because the upload code itself gets hit by most analyses and any changes to it will affect them all. Whether or not the upload needs to be retried is not relevant to the reasoning for/against a FF. That said, I don't feel strongly enough about this to be blocking in this instance since we have test coverage which shows that the changes to the code behave as expected in the "no retry" case. Adding a FF also introduces additional changes and so there's an argument that may outweigh any benefit here.
Thanks, pushed.
The current retry mechanism would pass a read stream that might have already been partially consumed. Instead, use a custom retry mechanism and pass a fresh read stream on each attempt.
An alternative that wouldn't need the custom retry mechanism would have been to read the database into a buffer, but databases might be multiple GBs so I wanted to keep a streaming approach.
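The custom retry mechanism described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual `src/database-upload.ts` code: `uploadWithRetry`, `makeStream`, `upload`, and the status values in `DO_NOT_RETRY_STATUSES` are all assumed stand-ins.

```typescript
// Sketch of the retry strategy from this PR: each attempt constructs a fresh
// payload source, so a partially consumed stream is never reused. All names
// and status values here are illustrative assumptions.
const DO_NOT_RETRY_STATUSES = [400, 401, 403, 404, 422]; // assumed values

async function uploadWithRetry<T>(
  makeStream: () => T,                  // fresh stream per attempt
  upload: (stream: T) => Promise<void>,
  maxAttempts = 4,                      // 1 initial attempt + 3 retries
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const stream = makeStream();        // never reuse a consumed stream
    try {
      await upload(stream);
      return attempt;                   // number of attempts used
    } catch (e: unknown) {
      const status = (e as { status?: number }).status;
      const isRetryable =
        status === undefined || !DO_NOT_RETRY_STATUSES.includes(status);
      if (!isRetryable || attempt === maxAttempts) {
        throw e;
      }
      // Exponential backoff between attempts (kept tiny for the sketch).
      await new Promise((resolve) => setTimeout(resolve, 2 ** attempt));
    }
  }
  throw new Error("unreachable");
}

// Demo: an upload that fails twice with a retryable error, then succeeds.
async function demo(): Promise<void> {
  let streamsCreated = 0;
  let calls = 0;
  const attempts = await uploadWithRetry(
    () => ({ id: ++streamsCreated }),   // each attempt gets a new "stream"
    async () => {
      calls++;
      if (calls < 3) throw { status: 500 }; // retryable server error
    },
  );
  console.log(`attempts=${attempts} streams=${streamsCreated}`);
}

demo();
```

The key point is that `makeStream()` runs inside the loop: in the real fix this is where `fs.createReadStream(bundledDb)` is called per attempt, rather than once before Octokit's built-in retries.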
Risk assessment
For internal use only. Please select the risk level of this change:

Which use cases does this change impact?

- Workflow types: dynamic workflows (Default Setup, Code Quality, ...).
- Products: analysis-kinds: code-scanning.
- Environments: github.com and/or GitHub Enterprise Cloud with Data Residency.

How did/will you validate this change?

Unit tests (`.test.ts` files).

If something goes wrong after this change is released, what are the mitigation and rollback strategies?
How will you know if something goes wrong after this change is released?
Are there any special considerations for merging or releasing this change?
Merge / deployment checklist