[python] Add FollowUpScanner, IncrementalDiffScanner, sharding #7348
Merged
JingsongLi merged 5 commits into apache:master on Mar 11, 2026
This was referenced Mar 5, 2026
tub added a commit to tub/paimon that referenced this pull request on Mar 6, 2026:
…rim docs, remove ChangelogProducer
- Upgrade cachetools to >=7,<8 for cachedmethod(info=True) support
- Remove ChangelogProducer enum (belongs in apache#7348 scanners branch)
- Replace manual cache hit/miss counters with @cachedmethod(info=True) decorator on ManifestFileManager, ManifestListManager, SnapshotManager
- Trim verbose docstrings across identifier, file_io, pyarrow_file_io, manifest_list_manager, and snapshot_manager
- Update cache tests to use cache_info() instead of manual counters
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
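The commit above replaces hand-rolled hit/miss counters with a decorator that keeps its own statistics. As a minimal sketch of that pattern, the block below uses the standard library's functools.lru_cache, whose cache_info() reports hits and misses. It is a stand-in only, not the cachetools @cachedmethod(info=True) API the commit adopts, and read_manifest is a hypothetical function standing in for a manager method.

```python
import functools


# Hypothetical stand-in for a manager method that reads a manifest from storage.
@functools.lru_cache(maxsize=128)
def read_manifest(path: str) -> tuple:
    return (path, "entries")


read_manifest("manifest-0")  # miss: computed and stored
read_manifest("manifest-0")  # hit: served from the cache
read_manifest("manifest-1")  # miss

# The decorator tracks statistics itself, so tests need no manual counters.
info = read_manifest.cache_info()
print(info.hits, info.misses)  # 1 2
```

A test can then assert on cache_info() directly, and cache_clear() resets both the cached entries and the statistics between test cases.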
tub added a commit to tub/paimon that referenced this pull request on Mar 6, 2026:
…olidate tests, fix parallelism
- Collapse repetitive module/class/method docstrings to one-liners in all scanner files (follow_up_scanner, delta, changelog, incremental_diff)
- Remove TDD process commentary from test docstrings
- Consolidate DeltaFollowUpScanner false-case tests into one parameterized test
- Remove misleading commit_kind from ChangelogFollowUpScanner test mocks
- Extract duplicated mock helpers to module-level functions
- Fix max(8, ...) parallelism bug: respect user-configured parallelism
- Remove obvious/redundant inline comments
- Standardize license headers to comment style, merge double docstrings
- Add clarifying docstring to ManifestListManager.read_all
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
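One item above fixes a max(8, ...) parallelism bug. Calling max() with a constant turns that constant into a floor, so a user who configures 2 workers still gets 8. A minimal sketch of the bug and one possible fix follows; DEFAULT_PARALLELISM, buggy_parallelism, resolve_parallelism, and read_all are hypothetical names, and capping workers at the number of tasks is an extra assumption, not something the commit states.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

DEFAULT_PARALLELISM = 8  # hypothetical default, used only when nothing is configured


def buggy_parallelism(configured: int) -> int:
    # Bug: max() makes 8 a floor, silently overriding smaller user settings.
    return max(8, configured)


def resolve_parallelism(configured: Optional[int], num_tasks: int) -> int:
    # Respect the user's value; fall back to the default only when unset.
    base = configured if configured is not None else DEFAULT_PARALLELISM
    # Assumption: no more workers than tasks, but always at least one worker,
    # because ThreadPoolExecutor rejects max_workers <= 0.
    return max(1, min(base, num_tasks))


def read_all(paths: List[str], configured: Optional[int] = None) -> List[str]:
    workers = resolve_parallelism(configured, len(paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # str.upper stands in for reading one manifest file; map keeps input order.
        return list(pool.map(str.upper, paths))
```

With this shape, resolve_parallelism(2, 100) returns 2 where buggy_parallelism(2) returns 8.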
Force-pushed from a3f6a2b to fe599ca
- Add FollowUpScanner hierarchy (base, delta, changelog)
- Add IncrementalDiffScanner for diff-based streaming reads
- Add sharding support to FileScanner
- Add row kind support to TableRead for changelog streams
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
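The FollowUpScanner hierarchy decides, snapshot by snapshot, whether a streaming read should emit anything. The sketch below is modeled on the Java Paimon scanners of the same names, which is an assumption about how the Python port behaves: the delta scanner consumes only APPEND commits, while the changelog scanner consumes any snapshot that has a changelog manifest list, regardless of commit kind. The Snapshot and CommitKind classes, their fields, and the scan_ids helper are hypothetical, not the pypaimon API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CommitKind(Enum):
    APPEND = "APPEND"
    COMPACT = "COMPACT"
    OVERWRITE = "OVERWRITE"


@dataclass
class Snapshot:  # hypothetical, reduced to the fields the scanners inspect
    id: int
    commit_kind: CommitKind
    changelog_manifest_list: Optional[str] = None


class FollowUpScanner(ABC):
    @abstractmethod
    def should_scan_snapshot(self, snapshot: Snapshot) -> bool:
        """Return True if this snapshot has data the stream should emit."""


class DeltaFollowUpScanner(FollowUpScanner):
    def should_scan_snapshot(self, snapshot: Snapshot) -> bool:
        # Only APPEND commits carry new user data; COMPACT rewrites existing
        # files and OVERWRITE is skipped by this scanner.
        return snapshot.commit_kind is CommitKind.APPEND


class ChangelogFollowUpScanner(FollowUpScanner):
    def should_scan_snapshot(self, snapshot: Snapshot) -> bool:
        # Commit kind is irrelevant here: a COMPACT commit can also produce changelog.
        return snapshot.changelog_manifest_list is not None


def scan_ids(scanner: FollowUpScanner, snapshots: List[Snapshot]) -> List[int]:
    return [s.id for s in snapshots if scanner.should_scan_snapshot(s)]


snapshots = [
    Snapshot(1, CommitKind.APPEND),
    Snapshot(2, CommitKind.COMPACT, "changelog-manifest-list-2"),
    Snapshot(3, CommitKind.APPEND, "changelog-manifest-list-3"),
    Snapshot(4, CommitKind.OVERWRITE),
]
print(scan_ids(DeltaFollowUpScanner(), snapshots))      # [1, 3]
print(scan_ids(ChangelogFollowUpScanner(), snapshots))  # [2, 3]
```

This split also explains why a commit_kind field in ChangelogFollowUpScanner test mocks would mislead: that scanner never reads it.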
Force-pushed from 2da01d8 to fdaed61
Move the include_row_kind feature out of this PR into a separate branch (python-streaming-1b2-row-kind) to keep the scanners PR focused on scanners and sharding only.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
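Sharding, which stays in this PR, lets several readers split one scan plan so that each file is read exactly once. A minimal sketch, assuming files are assigned by a stable hash of (partition, bucket); the rule FileScanner actually uses may differ, and FileEntry, shard_of, and shard_entries are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one manifest entry."""
    partition: str
    bucket: int
    file_name: str


def shard_of(entry: FileEntry, num_shards: int) -> int:
    # crc32 is stable across processes, unlike built-in hash() on str, which is
    # salted per interpreter; every reader must agree on which shard owns a file.
    key = f"{entry.partition}/{entry.bucket}".encode("utf-8")
    return zlib.crc32(key) % num_shards


def shard_entries(entries: Sequence[FileEntry], shard_index: int, num_shards: int) -> List[FileEntry]:
    if num_shards < 1 or not 0 <= shard_index < num_shards:
        raise ValueError("need num_shards >= 1 and 0 <= shard_index < num_shards")
    # Keying on (partition, bucket) keeps all files of one bucket on one reader.
    return [e for e in entries if shard_of(e, num_shards) == shard_index]


entries = [
    FileEntry(p, b, f"{p}-{b}-{k}.parquet")
    for p in ("dt=1", "dt=2")
    for b in range(4)
    for k in range(2)
]
shards = [shard_entries(entries, i, 3) for i in range(3)]
```

Keeping a bucket's files together matters for primary-key tables, where files in the same bucket may need to be merged on read.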
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- FollowUpScanner hierarchy (base, delta, changelog) for streaming scan planning
- IncrementalDiffScanner for diff-based streaming reads
- Sharding support in FileScanner
Stacked PR series
This is PR 1b/5 in the Python streaming read series:
- … AsyncStreamingTableScan)
- … paimon tail)
Incremental diff (vs 1a): tub/paimon@python-streaming-1a-caching...tub:paimon:python-streaming-1b-scanners (or wait until 1a is merged and compare)
Test plan
- flake8 passes on all changed files
- python -m pytest passes
- follow_up_scanner_test.py, changelog_follow_up_scanner_test.py, incremental_diff_scanner_test.py
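The incremental_diff_scanner_test.py suite covers diff-based streaming reads. As a rough sketch of the idea, assuming each snapshot's live files can be listed as (partition, bucket, file_name) tuples, a diff scan groups the files that disappeared and the files that appeared between two snapshots by partition and bucket. A reader would then compare the two sides of each group to derive row-level changes, which this sketch does not attempt. FileKey, Group, and diff_files are hypothetical names, not pypaimon's IncrementalDiffScanner API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

FileKey = Tuple[str, int, str]  # hypothetical (partition, bucket, file_name)
Group = Tuple[str, int]         # (partition, bucket)


def diff_files(start: Iterable[FileKey], end: Iterable[FileKey]) -> Dict[Group, Tuple[List[str], List[str]]]:
    start_set, end_set = set(start), set(end)
    groups: Dict[Group, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    # "before" files: live at the start snapshot, gone by the end snapshot.
    for partition, bucket, name in sorted(start_set - end_set):
        groups[(partition, bucket)][0].append(name)
    # "after" files: not live at the start snapshot, live at the end snapshot.
    for partition, bucket, name in sorted(end_set - start_set):
        groups[(partition, bucket)][1].append(name)
    # Groups whose files did not change never appear in the result.
    return dict(groups)


start = {("dt=1", 0, "a.parquet"), ("dt=1", 0, "b.parquet"), ("dt=1", 1, "c.parquet")}
end = {("dt=1", 0, "b.parquet"), ("dt=1", 0, "d.parquet"), ("dt=1", 1, "c.parquet")}
print(diff_files(start, end))  # {('dt=1', 0): (['a.parquet'], ['d.parquet'])}
```

Working at file granularity first keeps the expensive row comparison limited to buckets that actually changed between the two snapshots.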