Skip to content

Rewrite physical expressions in execution plans#20009

Open
LLDay wants to merge 1 commit intoapache:mainfrom
LLDay:feat/physical-placeholders
Open

Rewrite physical expressions in execution plans#20009
LLDay wants to merge 1 commit intoapache:mainfrom
LLDay:feat/physical-placeholders

Conversation

@LLDay
Copy link

@LLDay LLDay commented Jan 26, 2026

Rationale for this change

This PR covers a part of #14342.

This PR introduces methods for replacing physical expressions in execution plan nodes. This is an intermediate step in the development of physical-level placeholders. Later, these methods will be used to resolve placeholders in physical plans.

What changes are included in this PR?

  • Expression Rewriting Interface: Added physical_expressions and with_physical_expressions to the ExecutionPlan trait to allow inspection and replacement of expressions within plan nodes.
  • FileSource Update: Added with_filter_and_projection to the FileSource trait. This method is required to implement with_physical_expressions for DataSourceExec. Since file-based scans store their physical expressions (pushed-down filters and projections) within the FileSource implementation, we need a way to reconstruct the FileSource with updated expressions during plan rewriting.
  • Window Function Rewriting: Added inner_expressions_iter and with_inner_expressions to the WindowExpr trait and its implementations to enable expression replacement within window functions. WindowAggExec and BoundedWindowAggExec now use these methods to support the ExecutionPlan rewriting interface.

Are these changes tested?

Are there any user-facing changes?

Yes, a user can rewrite physical expressions in some execution plans.

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) execution Related to the execution crate proto Related to proto crate datasource Changes to the datasource crate physical-plan Changes to the physical-plan crate labels Jan 26, 2026
@LLDay LLDay marked this pull request as ready for review January 26, 2026 13:18
@LLDay
Copy link
Author

LLDay commented Jan 26, 2026

@askalt, could you review the PR as well?

@askalt
Copy link
Contributor

askalt commented Jan 26, 2026

cc @Omega359 (I remember you were also interested in placeholders within physical plans).

@Omega359
Copy link
Contributor

cc @Omega359 (I remember you were also interested in placeholders within physical plans).

Yes, I would like to be able to reuse plans - I'll try and find time in the next day or so to review this though I'm not particularly familiar with this area of code.

@LLDay LLDay force-pushed the feat/physical-placeholders branch from 7d86e73 to eb09e71 Compare January 27, 2026 07:05
@LLDay LLDay force-pushed the feat/physical-placeholders branch from eb09e71 to 3046ed0 Compare January 27, 2026 14:03
@github-actions github-actions bot removed optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) execution Related to the execution crate proto Related to proto crate labels Jan 27, 2026
@LLDay LLDay changed the title Physical-level placeholders Rewrite physical expressions in execution plans Jan 27, 2026
@LLDay LLDay force-pushed the feat/physical-placeholders branch from 3046ed0 to 4420b7f Compare January 27, 2026 14:45
@LLDay
Copy link
Author

LLDay commented Jan 27, 2026

We decided to leave only one commit in this PR regarding the rewriting of physical expressions. The second commit with ResolvePlaceholdersExec will be in a separate PR. You can find what the physical expressions resolution looked like here, but this code will be changed.

@askalt
Copy link
Contributor

askalt commented Jan 28, 2026

By some reasons, I cannot resolve threads in this PR -- if you can, let's resolve all threads from the previous review iteration.
image

@LLDay LLDay force-pushed the feat/physical-placeholders branch 2 times, most recently from e4dd613 to 2186826 Compare January 29, 2026 05:50
@askalt
Copy link
Contributor

askalt commented Feb 2, 2026

Let's resolve threads with a thumb emoji. Look almost good to me, just several details are remained.

@askalt
Copy link
Contributor

askalt commented Feb 2, 2026

@Jefffrey could you please approve CI workflow?

@LLDay LLDay force-pushed the feat/physical-placeholders branch from 2186826 to dcc0262 Compare February 3, 2026 10:07
Copy link
Contributor

@askalt askalt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! It looks good to me, will try to find maintainer to review it.

})
}

fn with_filter_and_projection(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not mentioned at all in the PR description. Can you please explain why this is needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review! I have updated the PR description. This method allows to override the physical expressions within a file source. For example, we will be able to replace placeholder physical expressions. You can see it here (I removed ResolvePlaceholdersExec from the PR, but it worked by finding the placeholders and then replacing them with literals).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the call chain? Is it an optimizer rule that calls this?

Copy link
Author

@LLDay LLDay Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataSourceExec::with_physical_expressions -> FileScanConfig::with_physical_expressions -> FileSource::with_filter_and_projection.

This is just one of the implementations of ExecutionPlan::with_physical_expressions, we just need additional methods that rewrite expressions in internal entities.

fn physical_expressions<'a>(
&'a self,
) -> Option<Box<dyn Iterator<Item = Arc<dyn PhysicalExpr>> + 'a>> {
None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to in some future version require every ExecutionPlan to implement this even if it's a no op. Otherwise it's easy to forget. I think enough ExecutionPlans would have expressions that it's worth forcing implementers to make a decision.

Copy link
Author

@LLDay LLDay Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted these changes to be non-breaking. At least, until we implemented a reliable mechanism using these methods (such as ResolvePlaceholdersExec or something else).

fn physical_expressions<'a>(
&'a self,
) -> Option<Box<dyn Iterator<Item = Arc<dyn PhysicalExpr>> + 'a>> {
None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same note about removing default impl in the future.

///
/// The default implementation returns `None`, indicating that the node either has no physical
/// expressions or does not support exposing them.
fn physical_expressions<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @gabotechs we were discussing adding just this the other day

Comment on lines 81 to 88
/// Returns new file source with given filter and projection.
fn with_filter_and_projection(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do add this I think it's worth making it very clear how this is intended to be used.

For example, users should not call this from their custom TableProvider implementation to pass in projections / filters. That pushdown is handled by optimizer rules. This is only to be used by ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the documentation of the method.

@LLDay LLDay force-pushed the feat/physical-placeholders branch from 966c698 to f92e2cd Compare February 5, 2026 13:50
Comment on lines +764 to +769
fn with_physical_expressions(
&self,
_params: ReplacePhysicalExpr,
) -> Result<Option<Arc<dyn ExecutionPlan>>> {
Ok(None)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this API is enough to be used in all the current ExecutionPlan implementations.

Take AggregateExec for example:

It has:

  • a Vec of AggregateFunctionExpr where each entry contains args: Vec<Arc<dyn PhysicalExpr>>
  • the same Vec of AggregateFunctionExpr also contains Vec<PhysicalSortExpr> (with one expression each)
  • a filter_expr: Vec<Option<Arc<dyn PhysicalExpr>>>
  • a dynamic_filter

If AggregateExecb::physical_expressions returned something like vec![expr1, expr2, expr3, expr4], how would you know to which of all those components above each expression belongs? are the 4 expressions AggregateFunctionExprs? are the first two AggregateFunctionExprs, the third one a filter_expr and the last one a dynamic filter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we need to be sure this is the right API before moving forward with it

how would you know to which of all those components above each expression belongs?

What would the use cases be that need to differentiate this? Would they be aggregate specific -> could they downcast the plan and call functions on AggregateExec directly?

If something needs to (1) operate on all expressions and (2) have specific logic depending on where that expression is that seems like a hard API to get right.

Copy link
Contributor

@askalt askalt Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If AggregateExecb::physical_expressions returned something like vec![expr1, expr2, expr3, expr4], how would you know to which of all those components above each expression belongs? are the 4 expressions AggregateFunctionExprs

A layout of the vector returned from ExecutionPlan::physical_expr(...) is plan specific. The API does not provide an ability to modify the layout itself: it could not be done generically, as each plan has different sorts of expressions, so if specific one should be modified -- the right way is to downcast the plan (caller knows which expression exactly is desired to be modified).

On other hand, this API provides an ability to switch some parts of the expressions prior the plan will be executed. For example, we are going to implement placeholders in the plans based on this API, as we can write the code like:

fn resolve_placeholders(expr: &Arc<dyn PhysicalExpr) -> Arc<dyn PhysicalExpr> {
   ...
}

let resolved_expr = plan.physical_expr().iter().map(|p| resolve_placeholders(p)); 
let plan = plan.with_physical_expr(resolved_expr);
plan.execute(...)

Copy link
Author

@LLDay LLDay Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API does not provide an ability to modify the layout itself:

Yes, we require that the number of expressions returned by ExecutionPlan::physical_expressions match the number of expressions that we specify in ExecutionPlan::with_physical_expressions. If we specified the wrong number of expressions, the call returns an error. This invariant is enforced here (1, 2).

When we have exactly the same number of expressions, we simply replace the corresponding expressions with new ones.

Introduces `physical_expressions` and `with_physical_expressions`
methods to the `ExecutionPlan` trait. These methods provide an interface
to inspect and replace physical expressions within execution plan nodes.

The commit also implements these methods for various ExecutionPlans and
introduces a new physical expression invariant check to ensure
consistency between expression retrieval and replacement.
@LLDay LLDay force-pushed the feat/physical-placeholders branch from f92e2cd to f7d9a07 Compare February 5, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate datasource Changes to the datasource crate physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants