
Setting object_storage_remote_initiator#756

Merged
Enmk merged 12 commits into antalya from feature/object_storage_remote_initiator
May 14, 2025

ianton-ru commented Apr 28, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

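Allow an object storage cluster function (such as s3Cluster) to be executed through a remote call, with a node of that cluster acting as the query initiator.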

Documentation entry for user-facing changes

The query

SELECT * FROM s3Cluster('swarm', ....) SETTINGS object_storage_remote_initiator=true

is executed as

SELECT * FROM remote('swarm_node', s3Cluster('swarm', ....))

where swarm_node is a random node from the swarm cluster.

Requirement: the nodes of the swarm cluster must themselves know about a cluster named swarm. In the classic mode, only the local initiator needs to know about swarm.
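To make the described equivalence concrete, here is a minimal Python sketch of the rewrite. It is not the ClickHouse implementation (which is C++); rewrite_with_remote_initiator, the topology mapping and the node names are hypothetical, chosen only to show the random choice of the remote initiator and why that node must know the swarm cluster.

```python
import random


def rewrite_with_remote_initiator(cluster, cluster_call, topology, rng=random):
    """Return the query that
    SELECT * FROM <cluster_call> SETTINGS object_storage_remote_initiator=true
    is described as being executed as: the same call wrapped in remote()
    on a randomly chosen node of the cluster.

    topology is a hypothetical stand-in for server configuration: it maps
    each node of `cluster` to the set of cluster names that node knows.
    """
    if not topology:
        raise ValueError(f"cluster {cluster!r} has no nodes")
    # Pick the remote initiator at random, as described for swarm_node.
    node = rng.choice(sorted(topology))
    # The chosen node runs cluster_call itself, so it must know a cluster
    # with the same name; in the classic mode only the local server must.
    if cluster not in topology[node]:
        raise ValueError(f"node {node!r} does not know cluster {cluster!r}")
    return f"SELECT * FROM remote('{node}', {cluster_call})"


if __name__ == "__main__":
    # Hypothetical two-node swarm cluster where every node knows "swarm".
    swarm = {"swarm-node-1": {"swarm"}, "swarm-node-2": {"swarm"}}
    print(rewrite_with_remote_initiator("swarm", "s3Cluster('swarm', ....)", swarm))
```

Because any node may be picked, in practice every node of the cluster needs the cluster definition, not just the one that happens to be chosen.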

Also, the getDataFiles method is restored (it was removed as unused in ClickHouse#78775).

A small optimization was also planned: reusing sample_path in StorageObjectStorage (obtained once in StorageObjectStorageCluster) and taking sample_path from metadata in resolveSchemaAndFormat.
This optimization was removed because of unexpected side effects: inconsistent column type detection (LowCardinality instead of Nullable in some cases).

svb-alt added the antalya-25.2.2 (Planned for 25.2.2 release) label May 6, 2025
}
else
{
LOG_TEST(
Member


Is this the case where we request the whole object?

Author: ianton-ru, May 7, 2025

Iceberg metadata, for example:

:) CREATE DATABASE datalake ENGINE = Iceberg('http://rest:8181/v1', 'minio', 'minio123') SETTINGS catalog_type = 'rest', storage_endpoint = 'http://minio:9000/warehouse', warehouse = 'iceberg'
:) SELECT * FROM datalake.`iceberg.bids`
Query id: d1bb9862-c077-403f-9843-94fd28173760

   ┌───────────────────datetime─┬─symbol─┬────bid─┬────ask─┐
1. │ 2019-08-09 08:35:00.000000 │ AAPL   │ 198.23 │ 195.45 │
2. │ 2019-08-09 08:35:00.000000 │ AAPL   │ 198.25 │  198.5 │
3. │ 2019-08-07 08:35:00.000000 │ AAPL   │ 195.23 │ 195.28 │
4. │ 2019-08-07 08:35:00.000000 │ AAPL   │ 195.22 │ 195.28 │
5. │ 2019-08-09 08:35:00.000000 │ AAPL   │ 198.23 │ 195.45 │
6. │ 2019-08-09 08:35:00.000000 │ AAPL   │ 198.25 │  198.5 │
   └────────────────────────────┴────────┴────────┴────────┘

:) select ProfileEvents['S3GetObject'] from system.query_log where type='QueryFinish' and query_id='d1bb9862-c077-403f-9843-94fd28173760'

   ┌─arrayElement⋯GetObject')─┐
1. │                        8 │
   └──────────────────────────┘
...

grep "Read S3 object" /var/log/clickhouse-server/clickhouse-server.log

2025.05.07 22:38:10.414791 [ 80 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/metadata/00003-ad725ef4-c28e-4ed4-aa4b-2e2aae0716d4.metadata.json, Version: Latest
2025.05.07 22:38:10.416600 [ 80 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/metadata/snap-182060351258856937-0-ff436521-29e9-4437-be5b-eb60f209baa9.avro, Version: Latest
2025.05.07 22:38:10.418360 [ 80 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/metadata/ff436521-29e9-4437-be5b-eb60f209baa9-m0.avro, Version: Latest
2025.05.07 22:38:10.420138 [ 80 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/metadata/6f3e6993-47c9-4556-b70c-c6c48d2ced6f-m0.avro, Version: Latest
2025.05.07 22:38:10.421658 [ 80 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/metadata/f0de1c43-e367-4e3d-8c9d-4076d8fb0cbd-m0.avro, Version: Latest
2025.05.07 22:38:10.426911 [ 767 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/data/datetime_day=2019-08-09/00000-0-ff436521-29e9-4437-be5b-eb60f209baa9.parquet, Version: Latest, Range: 0-1643
2025.05.07 22:38:10.427003 [ 762 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/data/datetime_day=2019-08-09/00000-0-6f3e6993-47c9-4556-b70c-c6c48d2ced6f.parquet, Version: Latest, Range: 0-1643
2025.05.07 22:38:10.427050 [ 770 ] {d1bb9862-c077-403f-9843-94fd28173760} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, Key: data/data/datetime_day=2019-08-07/00000-0-f0de1c43-e367-4e3d-8c9d-4076d8fb0cbd.parquet, Version: Latest, Range: 0-1635

I added this for consistency: all requests are counted in ProfileEvents['S3GetObject'], but without this log line only some of them appeared in the log.
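The log excerpt can be cross-checked against the counter with a small script. This is a minimal Python sketch, assuming the grep hits are available as strings (copied from the output above, with the timestamp and thread prefix dropped); count_s3_reads is a hypothetical helper, not part of ClickHouse. It assumes a ranged read always carries a Range field and a whole-object read never does, which holds for these eight lines.

```python
QUERY_ID = "d1bb9862-c077-403f-9843-94fd28173760"

# ProfileEvents['S3GetObject'] value returned by the query_log query above.
PROFILE_EVENTS_S3_GET_OBJECT = 8

# The eight grep hits above, without the timestamp/thread prefix.
PREFIX = "{" + QUERY_ID + "} <Test> ReadBufferFromS3: Read S3 object. Bucket: warehouse, "
LOG_LINES = [
    PREFIX + "Key: data/metadata/00003-ad725ef4-c28e-4ed4-aa4b-2e2aae0716d4.metadata.json, Version: Latest",
    PREFIX + "Key: data/metadata/snap-182060351258856937-0-ff436521-29e9-4437-be5b-eb60f209baa9.avro, Version: Latest",
    PREFIX + "Key: data/metadata/ff436521-29e9-4437-be5b-eb60f209baa9-m0.avro, Version: Latest",
    PREFIX + "Key: data/metadata/6f3e6993-47c9-4556-b70c-c6c48d2ced6f-m0.avro, Version: Latest",
    PREFIX + "Key: data/metadata/f0de1c43-e367-4e3d-8c9d-4076d8fb0cbd-m0.avro, Version: Latest",
    PREFIX + "Key: data/data/datetime_day=2019-08-09/00000-0-ff436521-29e9-4437-be5b-eb60f209baa9.parquet, Version: Latest, Range: 0-1643",
    PREFIX + "Key: data/data/datetime_day=2019-08-09/00000-0-6f3e6993-47c9-4556-b70c-c6c48d2ced6f.parquet, Version: Latest, Range: 0-1643",
    PREFIX + "Key: data/data/datetime_day=2019-08-07/00000-0-f0de1c43-e367-4e3d-8c9d-4076d8fb0cbd.parquet, Version: Latest, Range: 0-1635",
]


def count_s3_reads(log_lines, query_id):
    """Count 'Read S3 object' entries for one query, split into
    whole-object reads (no Range field) and ranged reads."""
    tag = "{" + query_id + "}"
    whole = ranged = 0
    for line in log_lines:
        # Skip lines from other queries or other log messages.
        if tag not in line or "Read S3 object." not in line:
            continue
        if ", Range: " in line:
            ranged += 1
        else:
            whole += 1
    return whole, ranged


if __name__ == "__main__":
    whole, ranged = count_s3_reads(LOG_LINES, QUERY_ID)
    print(f"whole-object reads: {whole}, ranged reads: {ranged}")
    print("matches ProfileEvents:", whole + ranged == PROFILE_EVENTS_S3_GET_OBJECT)
```

On this excerpt the script reports 5 whole-object reads (the Iceberg metadata files) and 3 ranged reads (the Parquet data files), 8 in total, which equals the S3GetObject counter.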

svb-alt removed the antalya-25.2.2 (Planned for 25.2.2 release) label May 12, 2025
Enmk merged commit 1b8b4a9 into antalya May 14, 2025
327 of 346 checks passed
ianton-ru pushed a commit that referenced this pull request Jun 3, 2025
…nitiator

Setting object_storage_remote_initiator
ianton-ru pushed a commit that referenced this pull request Jun 3, 2025
…nitiator

Setting object_storage_remote_initiator
ianton-ru pushed a commit that referenced this pull request Jun 4, 2025
…nitiator

Setting object_storage_remote_initiator
Enmk added a commit that referenced this pull request Jun 4, 2025
…rage_remote_initiator

25.3 Antalya port of #756 - object storage cluster function
svb-alt added the antalya-25.6 and port-antalya (PRs to be ported to all new Antalya releases) labels and removed the antalya-25.6 label Jul 14, 2025
ianton-ru pushed a commit that referenced this pull request Sep 9, 2025
…nitiator

Setting object_storage_remote_initiator
Enmk added a commit that referenced this pull request Sep 9, 2025
ianton-ru pushed a commit that referenced this pull request Oct 13, 2025
ianton-ru pushed a commit that referenced this pull request Oct 13, 2025
Enmk added a commit that referenced this pull request Oct 14, 2025
svb-alt removed the port-antalya (PRs to be ported to all new Antalya releases) label Feb 6, 2026
ianton-ru pushed a commit that referenced this pull request Feb 20, 2026