
feat: estimate cardinality for semi and anti-joins using distinct counts#20904

Open
buraksenn wants to merge 5 commits into apache:main from
buraksenn:use-ndv-for-semi-and-anti-join


@buraksenn
Contributor

Which issue does this PR close?

This PR does not close an issue, but it is part of #20766.

Rationale for this change

Details are in #20766. The main idea is to use existing distinct-count information to optimize joins, similar to how Spark and Trino do.

What changes are included in this PR?

This PR extends cardinality estimation for semi and anti joins to use distinct counts.
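As a rough illustration of the idea (not the code in this PR), here is a minimal sketch of a distinct-count based estimate for a single equi-join key. The function name `estimate_semi_anti_rows` and its parameters are hypothetical, and the formula assumes key containment: every distinct key on the smaller side is assumed to also appear on the other side.

```rust
/// Hypothetical helper, not part of DataFusion's API.
/// Returns (semi_rows, anti_rows) for a left semi / left anti join on one key,
/// given the outer row count and the number of distinct key values (NDV) per side.
fn estimate_semi_anti_rows(outer_rows: u64, outer_ndv: u64, inner_ndv: u64) -> (u64, u64) {
    // With no outer rows or no distinct outer keys, nothing can match.
    if outer_rows == 0 || outer_ndv == 0 {
        return (0, outer_rows);
    }
    // Assumed fraction of distinct outer keys that find a match on the inner side.
    let selectivity = inner_ndv.min(outer_ndv) as f64 / outer_ndv as f64;
    // Scale the outer row count and clamp so the estimate never exceeds the input.
    let semi = ((outer_rows as f64 * selectivity).ceil() as u64).min(outer_rows);
    // Anti join keeps exactly the outer rows that the semi join would drop.
    (semi, outer_rows - semi)
}
```

Under these assumptions, 1000 outer rows with 100 distinct keys joined against an inner side with 25 distinct keys gives an estimated 250 rows for the semi join and 750 for the anti join.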

Are these changes tested?

I've added test cases, but I'm not sure whether I should also have added benchmarks for this.

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Mar 12, 2026
Member

@asolimando asolimando left a comment


LGTM, a couple of minor points and a few tests to be added. The only change I'd like to see is bailing out when either side has no stats for a column pair.
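One possible reading of the bail-out suggestion, sketched under stated assumptions: each join key pair carries an optional distinct count per side, and the whole estimate is abandoned as soon as either side of any pair is missing one. The function `semi_join_selectivity` is hypothetical and is not taken from the PR; multiplying per-pair selectivities additionally assumes the key columns are independent.

```rust
/// Hypothetical helper, not part of DataFusion's API.
/// Each pair is (outer_ndv, inner_ndv); `None` means the statistic is absent.
/// Returns `None` when no distinct-count based estimate can be made.
fn semi_join_selectivity(pairs: &[(Option<u64>, Option<u64>)]) -> Option<f64> {
    let mut selectivity = 1.0_f64;
    for &(outer_ndv, inner_ndv) in pairs {
        // `?` bails out of the whole estimate if either side lacks an NDV.
        let outer = outer_ndv?;
        let inner = inner_ndv?;
        // No distinct outer keys means no outer row can match.
        if outer == 0 {
            return Some(0.0);
        }
        // Independence assumption: combine per-pair selectivities by multiplying.
        selectivity *= inner.min(outer) as f64 / outer as f64;
    }
    Some(selectivity)
}
```

A caller receiving `None` would fall back to whatever estimate it used before distinct counts were considered, rather than guessing a value for the missing statistic.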

@buraksenn buraksenn force-pushed the use-ndv-for-semi-and-anti-join branch from 79dcc2b to ee530c3 Compare March 12, 2026 20:44
buraksenn and others added 2 commits March 13, 2026 14:32
Co-authored-by: Alessandro Solimando <alessandro.solimando@gmail.com>
Member

@asolimando asolimando left a comment


LGTM, thanks for addressing all my comments fully!



3 participants