Skip to content

add setting to define filename pattern for part exports#1512

Merged
zvonand merged 8 commits intoantalya-25.8from
port_258_export_filename_pattern
Mar 17, 2026
Merged

add setting to define filename pattern for part exports#1512
zvonand merged 8 commits intoantalya-25.8from
port_258_export_filename_pattern

Conversation

@arthurpassos
Copy link
Collaborator

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add setting to define filename pattern for part exports - helps with sharding

Documentation entry for user-facing changes

...

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions
Copy link

github-actions bot commented Mar 10, 2026

Workflow [PR], commit [ed45b16]

@arthurpassos arthurpassos changed the title port add setting to define filename pattern for part exports Mar 10, 2026
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8163fb0b93

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


try
{
const auto filename = buildDestinationFilename(manifest, destination_storage_id, local_context);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use source table macros when building export filename

The new filename expansion passes destination_storage_id into buildDestinationFilename, so the {shard}/{replica} lookup checks the destination database instead of the source table’s database. When exporting from a DatabaseReplicated source to a regular S3 table (the common path), macro_info.shard and macro_info.replica stay unset and expansion falls back to server-config macros, which can raise NO_ELEMENTS_IN_CONFIG or inject wrong values. This means the newly added pattern feature does not work for the intended sharded/replicated source scenario.

Useful? React with 👍 / 👎.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, port + ai completion suggestion glitch, fixed

manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");
manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");
manifest.filename_pattern = json->getValue<String>("filename_pattern");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve backward compatibility for manifest deserialization

fromJsonString now unconditionally reads filename_pattern, but older metadata.json export manifests do not contain this key. On mixed-version or restarted clusters with pre-existing export tasks, json->getValue<String>("filename_pattern") will throw during task loading and prevent those exports from being processed. This field needs an optional read with a default (matching {part_name}_{checksum}) like other backward-compatible manifest fields.

Useful? React with 👍 / 👎.

ianton-ru
ianton-ru previously approved these changes Mar 11, 2026
ianton-ru
ianton-ru previously approved these changes Mar 16, 2026
Copy link

@ianton-ru ianton-ru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 2

@zvonand zvonand merged commit 120dd78 into antalya-25.8 Mar 17, 2026
96 of 97 checks passed
@Selfeer
Copy link
Collaborator

Selfeer commented Mar 18, 2026

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1512 (export part filename pattern setting):

Confirmed defects

  • Medium: Backward-incompatible export manifest deserialization
    • Impact: Nodes running this PR can fail to parse pre-existing export metadata created before this field existed, causing export-task polling/introspection/cancel flows to throw and skip processing.
    • Anchor: src/Storages/ExportReplicatedMergeTreePartitionManifest.h / ExportReplicatedMergeTreePartitionManifest::fromJsonString
    • Trigger: A metadata.json without filename_pattern is loaded after upgrade (e.g., pending/stale export entries from older builds).
    • Why defect: Deserialization unconditionally reads a newly introduced field, while multiple runtime paths still parse historical metadata; this violates backward compatibility for persisted task state.
    • Affected subsystem and blast radius: Replicated MergeTree export-partition control plane; impacts task reload/cleanup/status and some user-facing export management operations.
    • Smallest logical reproduction steps:
      1. Create an export entry with metadata that does not contain filename_pattern.
      2. Run server code from PR add setting to define filename pattern for part exports #1512.
      3. Trigger manifest loading path (background poll, export info query, or cancel path).
      4. Observe parse exception from missing key.
    • Fix direction (short): Make filename_pattern optional in fromJsonString and default it to {part_name}_{checksum} when absent.
    • Regression test direction (short): Add a unit/integration compatibility test that loads legacy metadata.json without filename_pattern and verifies task loading succeeds with default pattern.
    • Code evidence:
// src/Storages/ExportReplicatedMergeTreePartitionManifest.h
manifest.filename_pattern = json->getValue<String>("filename_pattern");
// src/Storages/MergeTree/ExportPartitionManifestUpdatingTask.cpp
// Background task parses metadata for every entry.
const auto metadata = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json);
// src/Storages/StorageReplicatedMergeTree.cpp
// Additional user-facing/control paths parse the same metadata.
const auto metadata = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json); // getPartitionExportsInfo
const auto manifest = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json); // exportPartitionToTable existing-key path
const auto manifest = ExportReplicatedMergeTreePartitionManifest::fromJsonString(metadata_json); // killExportPartition fallback scan

Coverage summary

  • Scope reviewed: PR range 366fb38a9ed..ed45b160e19 for filename-pattern feature paths in settings, manifest persistence, scheduler/context propagation, export execution, and related tests/docs.
  • Categories failed: Backward-compatibility of persisted metadata schema; error-contract consistency for old-vs-new manifest parsing.
  • Categories passed: Setting plumbing/query-to-manifest propagation, source-table macro expansion path ({shard}/{replica}) in filename building, concurrency/locking changes (none introduced in feature path), exception-safety for export execution path.
  • Assumptions/limits: Static audit only (no runtime execution); conclusions are for merged PR code paths and realistic upgrade scenarios involving pre-existing export metadata.

@arthurpassos
Copy link
Collaborator Author

Not an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants