Consolidate ClickBench setup documentation for DataFusion #20016

Open
kosiew wants to merge 1 commit into apache:main from kosiew:clickbench-documentation-20007
@kosiew kosiew commented Jan 27, 2026

Which issue does this PR close?

Rationale for this change

ClickBench setup requirements for DataFusion were scattered across multiple places (benchmark code constants, sqllogictest files, and brief README notes). This made it easy for users to miss critical configuration steps, especially binary_as_string for binary columns and the EventDate UInt16 → DATE transformation, which led to confusing failures or incorrect results.

This PR, a follow-up to #19881, consolidates the setup knowledge into a single, copy‑pasteable section in benchmarks/README.md, and adds cross-references from the code/test locations back to that canonical documentation.

What changes are included in this PR?

  • Added a new “Running ClickBench on DataFusion” section to benchmarks/README.md that documents:

    • Why and when to enable binary_as_string when registering the ClickBench Parquet file.
    • Why EventDate must be transformed from UInt16 (days since epoch) to a SQL DATE, including a clear explanation of the failure mode when not transformed.
    • A canonical, end-to-end setup example (external table + view + sample query).
    • How to run the benchmark via ./bench.sh.
  • Added a pointer comment in benchmarks/src/clickbench.rs (near the HITS_VIEW_DDL / view DDL) directing readers to the README section as the source of truth.

  • Added a pointer comment in datafusion/sqllogictest/test_files/clickbench.slt directing readers to the README section for full setup details.
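The EventDate transformation mentioned above can be illustrated outside of SQL. The following is a minimal Python sketch, not the DDL used by this PR: it assumes the UInt16 value counts days from 1970-01-01 (the Unix epoch), and the helper name `event_date_from_days` and the sample value 15887 are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

# Assumption: the raw UInt16 counts days since the Unix epoch.
UNIX_EPOCH = date(1970, 1, 1)
UINT16_MAX = 0xFFFF


def event_date_from_days(days_since_epoch: int) -> date:
    """Convert a UInt16 day count into a calendar date (hypothetical helper)."""
    if not 0 <= days_since_epoch <= UINT16_MAX:
        raise ValueError(f"not a UInt16 value: {days_since_epoch}")
    return UNIX_EPOCH + timedelta(days=days_since_epoch)


# Illustrative raw value, not taken from the PR or the dataset.
raw_event_date = 15887
converted = event_date_from_days(raw_event_date)

# A date-range filter is only meaningful on the converted value;
# the raw integer is just a day count and carries no calendar semantics.
in_july_2013 = date(2013, 7, 1) <= converted <= date(2013, 7, 31)

print(raw_event_date, "->", converted.isoformat(), "| in July 2013:", in_july_2013)
```

The point of the sketch is that the raw column is only a day count: any date comparison has to happen after the conversion, which is the role the documented view plays for the benchmark queries.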

Are these changes tested?

  • Yes, through existing workflows that exercise the documented setup:

    • The clickbench.slt file continues to create the hits view using the documented EventDate casting pattern.

    • The ClickBench benchmark runner in benchmarks/src/clickbench.rs continues to apply the same view DDL and can be exercised via:

      • ./bench.sh data clickbench
      • ./bench.sh run clickbench

No new automated tests were added because the change is primarily documentation plus comments, and behavior is already validated through existing benchmark and sqllogictest workflows.

Are there any user-facing changes?

  • Yes: improved user-facing documentation.

    • benchmarks/README.md now includes a consolidated, canonical ClickBench-on-DataFusion setup guide with rationale and a complete example.
    • No API changes.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 27, 2026
@kosiew kosiew marked this pull request as ready for review January 27, 2026 02:08
Development

Successfully merging this pull request may close these issues.

Consolidate ClickBench Setup Documentation in benchmarks/README.md