Skip to content

InList: set/list value must be evaluated to get the values#2834

Merged
alamb merged 1 commit intoapache:masterfrom
liukun4515:inlist_set_need_evel_#2820
Jul 7, 2022
Merged

InList: set/list value must be evaluated to get the values#2834
alamb merged 1 commit intoapache:masterfrom
liukun4515:inlist_set_need_evel_#2820

Conversation

@liukun4515
Copy link
Contributor

Which issue does this PR close?

Closes #2820

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Jul 4, 2022
@liukun4515 liukun4515 force-pushed the inlist_set_need_evel_#2820 branch from 96c29b2 to 2fe8f01 Compare July 5, 2022 05:52
@github-actions github-actions bot added the core Core DataFusion crate label Jul 5, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jul 5, 2022

Codecov Report

Merging #2834 (9c403dd) into master (bf7564f) will increase coverage by 0.01%.
The diff coverage is 97.61%.

❗ Current head 9c403dd differs from pull request most recent head 31ed154. Consider uploading reports for the commit 31ed154 to get more accurate results

@@            Coverage Diff             @@
##           master    #2834      +/-   ##
==========================================
+ Coverage   85.19%   85.20%   +0.01%     
==========================================
  Files         275      275              
  Lines       48761    48787      +26     
==========================================
+ Hits        41541    41571      +30     
+ Misses       7220     7216       -4     
Impacted Files Coverage Δ
...atafusion/physical-expr/src/expressions/in_list.rs 82.86% <97.29%> (+1.03%) ⬆️
datafusion/core/src/physical_plan/planner.rs 80.98% <100.00%> (ø)
...atafusion/physical-expr/src/expressions/literal.rs 100.00% <100.00%> (ø)
datafusion/physical-expr/src/planner.rs 92.36% <100.00%> (ø)
datafusion/expr/src/logical_plan/plan.rs 74.50% <0.00%> (+0.19%) ⬆️
datafusion/physical-expr/src/expressions/cast.rs 98.46% <0.00%> (+0.38%) ⬆️
datafusion/core/src/physical_plan/metrics/value.rs 87.43% <0.00%> (+0.50%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bf7564f...31ed154. Read the comment docs.

@liukun4515 liukun4515 requested review from alamb and viirya July 5, 2022 07:03
//include `CastExpr + Literal` or `Literal`
//include `TryCastExpr/CastExpr + Literal` or `Literal`
fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>]) -> bool {
// TODO optimize the checker for constant in datafusion
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for datafusion, we need to add interface which is used to identify the expr is constant or literal or not.
If we have this interface, we can make more optimization.
But the interface should be better in the logical expr not in the physical expr like sprak fold

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to make this code more general might be to change the check for all static expressions to after evaluate() is called.

Specifically, maybe the

        if list.len() > OPTIMIZER_INSET_THRESHOLD && check_all_static_filter_expr(&list) {

check could be refactored to call evaluate on all the elements of the list and then if all elements were ColumnarValue::Scalar and the list was sufficiently larger the set optimization could be invoked 🤔

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is an improvement. Thank you @liukun4515

I left a suggestion on how to maybe make the check more general (aka not have to special case different expression types).

//include `CastExpr + Literal` or `Literal`
//include `TryCastExpr/CastExpr + Literal` or `Literal`
fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>]) -> bool {
// TODO optimize the checker for constant in datafusion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

//include `CastExpr + Literal` or `Literal`
//include `TryCastExpr/CastExpr + Literal` or `Literal`
fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>]) -> bool {
// TODO optimize the checker for constant in datafusion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to make this code more general might be to change the check for all static expressions to after evaluate() is called.

Specifically, maybe the

        if list.len() > OPTIMIZER_INSET_THRESHOLD && check_all_static_filter_expr(&list) {

check could be refactored to call evaluate on all the elements of the list and then if all elements were ColumnarValue::Scalar and the list was sufficiently larger the set optimization could be invoked 🤔

@liukun4515 liukun4515 force-pushed the inlist_set_need_evel_#2820 branch 2 times, most recently from 3238861 to 31ed154 Compare July 6, 2022 03:11
@liukun4515
Copy link
Contributor Author

I think this PR is an improvement. Thank you @liukun4515

I left a suggestion on how to maybe make the check more general (aka not have to special case different expression types).

Thanks for you suggestion about evel const value.
I have changed the code according to your suggestion.

let batch = RecordBatch::new_empty(Arc::new(schema.to_owned()));
    list.iter().all(|v| match v.evaluate(&batch) {
        Err(_) | Ok(ColumnarValue::Array(_)) => false,
        Ok(ColumnarValue::Scalar(_)) => true,
    })

PTAL @alamb

}
//include `TryCastExpr/CastExpr + Literal` or `Literal`
fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>], schema: &Schema) -> bool {
let batch = RecordBatch::new_empty(Arc::new(schema.to_owned()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better -- but now evaluate() will be called twice on the inputs; Both here as well as in cast_static_filter_to_set . Since calling evaluate may be quite expensive, what do you think about calling it once and then passing the resulting Vec<ColumnarValue> to both check_all_static_filter_expr and cast_static_filter_to_set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a new function try_cast_static_filter_to_set to replace these two function in the InLIst.

@liukun4515 liukun4515 force-pushed the inlist_set_need_evel_#2820 branch from 31ed154 to 991ca98 Compare July 7, 2022 02:20
@liukun4515
Copy link
Contributor Author

Thanks for your review. @alamb
I think it's ready to be merged.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @liukun4515 -- thank you for sticking with it. I think it is a good sign when there is less code but more functionality 👍

set: None,
if list.len() > OPTIMIZER_INSET_THRESHOLD {
if let Ok(set) = try_cast_static_filter_to_set(&list, schema) {
return Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 0ce6f1b into apache:master Jul 7, 2022
@liukun4515 liukun4515 deleted the inlist_set_need_evel_#2820 branch July 7, 2022 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Not evaluate the set expr in the InList for the optimization

3 participants