feat(client): add RetryTransport for automatic retry with exponential backoff by cchinchilla-dev · Pull Request #901 · a2aproject/a2a-python

cchinchilla-dev · 2026-03-25T21:49:32Z

Description

Thank you for opening a Pull Request!
Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

- Follow the CONTRIBUTING Guide.
- Make your Pull Request title follow the https://www.conventionalcommits.org/ specification.
  - Important Prefixes for release-please:
    - fix: which represents bug fixes, and correlates to a SemVer patch.
    - feat: represents a new feature, and correlates to a SemVer minor.
    - feat!:, or fix!:, refactor!:, etc., which represent a breaking change (indicated by the !) and will result in a SemVer major.
- Ensure the tests and linter pass (Run bash scripts/format.sh from the repository root to format)
- Appropriate docs were updated (if necessary)

Closes #871 🦕

Problem

The SDK's transports raise exceptions immediately on transient failures — network errors, timeouts, and server-side errors — with no built-in retry mechanism. Callers must implement their own retry logic at every call site, independently deciding which errors are retriable, implementing correct backoff, and inspecting cause chains for HTTP status codes.

Implementation

New RetryTransport decorator class in src/a2a/client/transports/retry.py that wraps any ClientTransport using the decorator pattern (following TenantTransportDecorator):

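from a2a.client.transports.jsonrpc import JsonRpcTransport
from a2a.client.transports.retry import RetryTransport

# `client` (an httpx.AsyncClient), `card` (the agent card), and `params`
# are assumed to be defined by the caller.
inner = JsonRpcTransport(httpx_client=client, agent_card=card)
transport = RetryTransport(
    base=inner,
    max_retries=3,
    base_delay=1.0,
    max_delay=30.0,
)

async with transport:
    result = await transport.send_message(params)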

Key design decisions:

- Decorator over ClientTransport, not via interceptors: ClientCallInterceptor.after() never sees exceptions, so interceptors cannot implement retry.
- default_retry_predicate classifies errors by inspecting __cause__ chains: A2AClientTimeoutError (always), httpx.RequestError (network), httpx.HTTPStatusError with status 429/502/503/504, and grpc.aio.AioRpcError with code UNAVAILABLE/RESOURCE_EXHAUSTED. Domain errors (TaskNotFoundError, etc.) are never retried.
- Streaming retries cover only pre-stream failures; once the first event is yielded, errors propagate as-is.
- Exponential backoff with full jitter: delay = random.uniform(0, min(base_delay * 2**attempt, max_delay)).
- close() bypasses retry, since it is a lifecycle operation, not data exchange.
- Constructor validation for max_retries, base_delay, max_delay.
- Configurable retry_predicate and on_retry callback for custom logic/logging.
- default_retry_predicate is exported for users who want to extend it.
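
To illustrate the cause-chain classification listed above, here is a minimal sketch. It is not the code from this PR: `TransientError`, `DomainError`, `iter_causes`, `is_retryable`, and `wrap` are hypothetical stand-ins chosen so the block runs without httpx or grpc installed.

```python
from __future__ import annotations

from collections.abc import Iterator


class TransientError(Exception):
    """Hypothetical stand-in for a low-level transient failure."""


class DomainError(Exception):
    """Hypothetical stand-in for a domain error that is never retried."""


def iter_causes(exc: BaseException) -> Iterator[BaseException]:
    """Yield exc and everything reachable via __cause__, guarding against cycles."""
    seen: set[int] = set()
    current: BaseException | None = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        yield current
        current = current.__cause__


def is_retryable(exc: BaseException) -> bool:
    """Return True if any exception in the cause chain is transient."""
    if isinstance(exc, DomainError):
        return False
    return any(isinstance(e, TransientError) for e in iter_causes(exc))


def wrap(cause: BaseException) -> RuntimeError:
    """Build a wrapper whose __cause__ is `cause`, as `raise ... from` would."""
    try:
        raise RuntimeError('request failed') from cause
    except RuntimeError as err:
        return err
```

With these stand-ins, `is_retryable(wrap(TransientError('down')))` is true because the transient error sits one level down the chain, while a wrapped `ValueError` or a top-level `DomainError` is not retried.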

Non-breaking, purely additive. No changes to existing code. No new dependencies.
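
For readers unfamiliar with full jitter, the backoff formula from the design decisions can be sketched as below, together with a bounded retry loop. This is an illustration under assumptions, not the implementation in retry.py: `full_jitter_delay`, `retry_call`, and the injectable `rng` and `sleep` parameters are hypothetical names introduced here to keep the example testable.

```python
from __future__ import annotations

import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar('T')


def full_jitter_delay(
    attempt: int,
    base_delay: float,
    max_delay: float,
    rng: random.Random | None = None,
) -> float:
    """Pick a delay uniformly from [0, min(base_delay * 2**attempt, max_delay)]."""
    ceiling = min(base_delay * (2**attempt), max_delay)
    source = rng if rng is not None else random
    return source.uniform(0, ceiling)


async def retry_call(
    func: Callable[[], Awaitable[T]],
    *,
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    should_retry: Callable[[BaseException], bool] = lambda exc: True,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call func, retrying up to max_retries times when should_retry allows it."""
    attempt = 0
    while True:
        try:
            return await func()
        except Exception as exc:
            # Give up when retries are exhausted or the error is not retryable.
            if attempt >= max_retries or not should_retry(exc):
                raise
            await sleep(full_jitter_delay(attempt, base_delay, max_delay))
            attempt += 1
```

Drawing the delay from the whole interval, rather than adding a small random offset to a fixed exponential delay, spreads concurrent clients out so they are less likely to retry in lockstep.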

Tests

54 unit tests in tests/client/transports/test_retry.py covering predicate classification, retry/no-retry behavior, backoff timing, streaming edge cases, custom predicates, callbacks, and constructor validation. Includes transport-level integration tests against Starlette servers simulating transient 503s.

3 additional end-to-end tests in tests/integration/test_retry_integration.py exercising the full stack (ClientFactory → BaseClient → RetryTransport → transport → server) for both REST and JSON-RPC transports.

Happy to iterate on any of this based on maintainer feedback.

gemini-code-assist · 2026-03-25T21:49:51Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a RetryTransport decorator to the client SDK, providing a robust and configurable mechanism for automatically retrying operations that encounter transient failures. This enhancement significantly improves the client's resilience against network issues, timeouts, and temporary server-side errors, reducing the need for callers to implement custom retry logic and making the SDK more reliable out-of-the-box.

Highlights

- New RetryTransport Decorator: Introduced a new RetryTransport decorator class that wraps any ClientTransport to provide automatic retry logic for transient failures.
- Configurable Retry Predicate and Backoff: Implemented a default_retry_predicate to classify retriable errors (e.g., network errors, timeouts, specific HTTP status codes like 429, 502-504, gRPC UNAVAILABLE/RESOURCE_EXHAUSTED) and an exponential backoff strategy with full jitter for delay calculation. Users can provide custom predicates and on_retry callbacks.
- Streaming Operation Handling: Designed streaming methods (send_message_streaming, subscribe) to only retry failures that occur before the stream starts, ensuring that errors during an active stream propagate immediately.
- Comprehensive Testing: Added 54 unit tests covering predicate classification, retry/no-retry behavior, backoff timing, streaming edge cases, custom logic, and constructor validation. Additionally, 3 end-to-end integration tests were included to validate the full client stack with transient server failures.
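
The streaming behavior summarized above (retry only until the first event arrives) can be sketched as follows. This is a simplified illustration, not the PR's code: `stream_with_retry` and `open_stream` are hypothetical names, events are plain strings, and the backoff sleep between attempts is omitted for brevity.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator, Callable


async def stream_with_retry(
    open_stream: Callable[[], AsyncIterator[str]],
    *,
    max_retries: int = 3,
) -> AsyncIterator[str]:
    """Retry only until the first event arrives; later errors propagate unchanged."""
    attempt = 0
    while True:
        stream = open_stream()
        try:
            first = await stream.__anext__()
        except StopAsyncIteration:
            return  # Empty stream: nothing to yield, nothing to retry.
        except Exception:
            if attempt >= max_retries:
                raise
            attempt += 1
            continue  # A real implementation would back off here.
        break
    yield first
    # The stream has started, so failures from here on are not retried.
    async for event in stream:
        yield event
```

Retrying after events have already been delivered could replay them to the caller, which is why the sketch stops retrying once `first` has been obtained.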


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new RetryTransport for the A2A client, enabling configurable retry logic with exponential backoff and jitter for transient errors such as timeouts, network issues, and specific HTTP status codes (429 and 502-504). It includes a default_retry_predicate to determine retriable exceptions and handles both regular and streaming operations, with streaming retries limited to pre-stream failures. Comprehensive unit and integration tests have been added to validate the retry mechanism. A suggestion was made to refactor repeated ASGI middleware logic in test_retry.py into a reusable helper function for improved maintainability.

gemini-code-assist · 2026-03-25T21:51:35Z

tests/client/transports/test_retry.py

+        async def transient_failure_app(scope, receive, send):
+            nonlocal failure_count
+            if scope['type'] == 'http' and failure_count < fail_limit:
+                failure_count += 1
+                await send(
+                    {
+                        'type': 'http.response.start',
+                        'status': 503,
+                        'headers': [
+                            [b'content-type', b'text/plain'],
+                        ],
+                    }
+                )
+                await send(
+                    {
+                        'type': 'http.response.body',
+                        'body': b'Service Unavailable',
+                    }
+                )
+                return
+            await app(scope, receive, send)


The transient_failure_app ASGI middleware is defined here and also in test_retry_with_jsonrpc_transport_recovers_from_503. Additionally, a similar middleware always_fail_app is defined in test_retry_exhaustion_with_persistent_503. To improve code reuse and maintainability, you could extract this logic into a helper function within this test module. A similar pattern is used in tests/integration/test_retry_integration.py with the _wrap_with_transient_503 helper, which could serve as a good example.

Thanks for the suggestion. The current tests in this repo use inline definitions, so I’ve kept that pattern for consistency. I’m happy to refactor if the maintainers prefer extracting them.
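
If the maintainers do prefer extraction, one possible shape for a shared helper is sketched below. The name `make_transient_503_app` and its signature are hypothetical; they are not taken from this PR and are not claimed to match the existing _wrap_with_transient_503 helper in the integration tests.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

Scope = dict[str, Any]
Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


def make_transient_503_app(app: ASGIApp, fail_limit: int) -> ASGIApp:
    """Return an ASGI app that answers the first fail_limit HTTP requests with 503."""
    failure_count = 0

    async def transient_failure_app(
        scope: Scope, receive: Receive, send: Send
    ) -> None:
        nonlocal failure_count
        if scope['type'] == 'http' and failure_count < fail_limit:
            failure_count += 1
            await send(
                {
                    'type': 'http.response.start',
                    'status': 503,
                    'headers': [[b'content-type', b'text/plain']],
                }
            )
            await send(
                {'type': 'http.response.body', 'body': b'Service Unavailable'}
            )
            return
        # Failure budget used up: delegate to the wrapped application.
        await app(scope, receive, send)

    return transient_failure_app
```

Each test would then call the helper with its own `fail_limit` instead of redefining the middleware inline; a very large `fail_limit` would approximate the always-failing variant.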

github-actions · 2026-03-25T21:52:17Z

🧪 Code Coverage (vs `1.0-dev`)

| File | Base | PR | Delta |
|---|---|---|---|
| src/a2a/client/transports/retry.py (new) | n/a | 92.76% | n/a |
| Total | 91.47% | 91.49% | 🟢 +0.03% |

Generated by coverage-comment.yml

feat(client): add RetryTransport

eb2016b

cchinchilla-dev requested a review from a team as a code owner March 25, 2026 21:49

gemini-code-assist bot reviewed Mar 25, 2026

fix: rename retriable to retryable for spell checker compliance

4102f20

cchinchilla-dev wants to merge 2 commits into a2aproject:1.0-dev from cchinchilla-dev:feat/retry-transport-871
