fix: prevent duplicate alias collision with user-provided __datafusion_extracted names#20432
fix: prevent duplicate alias collision with user-provided __datafusion_extracted names#20432adriangb merged 6 commits intoapache:mainfrom
Conversation
…n_extracted names (apache#20432) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n_extracted names (apache#20432) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| && let Some(id_str) = alias | ||
| .name | ||
| .strip_prefix(EXTRACTED_EXPR_PREFIX) | ||
| .and_then(|s| s.strip_prefix('_')) |
There was a problem hiding this comment.
Couldn't this be .and_then(|id| id_str.parse().ok())
…n_extracted names (apache#20430) When a user query contains an explicit alias using the reserved `__datafusion_extracted` prefix, the optimizer's AliasGenerator could generate the same alias name, causing a "Schema contains duplicate unqualified field name" error. Fix by scanning each plan node's expressions for pre-existing `__datafusion_extracted_N` aliases during the TopDown traversal in ExtractLeafExpressions, advancing the generator counter past them before any extraction occurs. Closes apache#20430 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traverse into subqueries when extracting leaf expressions and when advancing the alias generator past existing extracted aliases. Also collapse nested if-let to satisfy clippy::collapsible_if. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2b26a30 to
c6774b6
Compare
| FROM t | ||
| WHERE COALESCE(get_field(s, 'f1'), get_field(s, 'f2')) = 1; | ||
| ---- | ||
| 1 |
There was a problem hiding this comment.
Would it make sense to also add an EXPLAIN assertion here? It could guard the alias allocation behavior directly (e.g. user-provided __datafusion_extracted_2 remains stable while optimizer-generated aliases move to the next IDs), so future optimizer refactors don’t regress silently.
assisted by codex
Adds an EXPLAIN query to verify the user-provided __datafusion_extracted_2 alias is preserved while optimizer-generated aliases skip to _3 and _4, guarding against silent regressions in alias allocation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
run benchmark sql_planner |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Looks like no major regressions, will go ahead and merge |
…n_extracted names (apache#20432) - Fixes a bug where the optimizer's `AliasGenerator` could produce alias names that collide with`__datafusion_extracted_N` aliases, causing a "Schema contains duplicate unqualified field name" error - I don't expect users themselves to create these aliases, but if you run the optimizers twice (with different `AliasGenerator` instances) you'll hit this. - Adds `AliasGenerator::update_min_id()` to advance the counter past existing aliases - Scans each plan node's expressions during `ExtractLeafExpressions` traversal to seed the generator before any extraction occurs - Switches to controlling the traversal which also means the config-based short circuit more clearly skips the entire rule. Closes apache#20430 - [x] Unit test: `test_user_provided_extracted_alias_no_collision` in `extract_leaf_expressions` - [x] SLT regression test in `projection_pushdown.slt` with explicit `__datafusion_extracted_2` alias 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
AliasGeneratorcould produce alias names that collide with__datafusion_extracted_Naliases, causing a "Schema contains duplicate unqualified field name" errorAliasGeneratorinstances) you'll hit this.AliasGenerator::update_min_id()to advance the counter past existing aliasesExtractLeafExpressionstraversal to seed the generator before any extraction occursCloses #20430
Test plan
test_user_provided_extracted_alias_no_collisioninextract_leaf_expressionsprojection_pushdown.sltwith explicit__datafusion_extracted_2alias🤖 Generated with Claude Code