InList: set/list value must be evaluated to get the values#2834
InList: set/list value must be evaluated to get the values#2834alamb merged 1 commit intoapache:masterfrom
Conversation
96c29b2 to
2fe8f01
Compare
Codecov Report
@@ Coverage Diff @@
## master #2834 +/- ##
==========================================
+ Coverage 85.19% 85.20% +0.01%
==========================================
Files 275 275
Lines 48761 48787 +26
==========================================
+ Hits 41541 41571 +30
+ Misses 7220 7216 -4
Continue to review full report at Codecov.
|
| //include `CastExpr + Literal` or `Literal` | ||
| //include `TryCastExpr/CastExpr + Literal` or `Literal` | ||
| fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>]) -> bool { | ||
| // TODO optimize the checker for constant in datafusion |
There was a problem hiding this comment.
for datafusion, we need to add interface which is used to identify the expr is constant or literal or not.
If we have this interface, we can make more optimization.
But the interface should be better in the logical expr not in the physical expr like sprak fold
There was a problem hiding this comment.
I think ColumnarValue is one such interface
So if you call evaluate() on a Literal PhysicalExpr the returned value is ColumnarValue::Scalar:
There was a problem hiding this comment.
One way to make this code more general might be to change the check for all static expressions to after evaluate() is called.
Specifically, maybe the
if list.len() > OPTIMIZER_INSET_THRESHOLD && check_all_static_filter_expr(&list) {check could be refactored to call evaluate on all the elements of the list and then if all elements were ColumnarValue::Scalar and the list was sufficiently larger the set optimization could be invoked 🤔
alamb
left a comment
There was a problem hiding this comment.
I think this PR is an improvement. Thank you @liukun4515
I left a suggestion on how to maybe make the check more general (aka not have to special case different expression types).
| //include `CastExpr + Literal` or `Literal` | ||
| //include `TryCastExpr/CastExpr + Literal` or `Literal` | ||
| fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>]) -> bool { | ||
| // TODO optimize the checker for constant in datafusion |
There was a problem hiding this comment.
I think ColumnarValue is one such interface
So if you call evaluate() on a Literal PhysicalExpr the returned value is ColumnarValue::Scalar:
| //include `CastExpr + Literal` or `Literal` | ||
| //include `TryCastExpr/CastExpr + Literal` or `Literal` | ||
| fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>]) -> bool { | ||
| // TODO optimize the checker for constant in datafusion |
There was a problem hiding this comment.
One way to make this code more general might be to change the check for all static expressions to after evaluate() is called.
Specifically, maybe the
if list.len() > OPTIMIZER_INSET_THRESHOLD && check_all_static_filter_expr(&list) {check could be refactored to call evaluate on all the elements of the list and then if all elements were ColumnarValue::Scalar and the list was sufficiently larger the set optimization could be invoked 🤔
3238861 to
31ed154
Compare
Thanks for you suggestion about PTAL @alamb |
| } | ||
| //include `TryCastExpr/CastExpr + Literal` or `Literal` | ||
| fn check_all_static_filter_expr(list: &[Arc<dyn PhysicalExpr>], schema: &Schema) -> bool { | ||
| let batch = RecordBatch::new_empty(Arc::new(schema.to_owned())); |
There was a problem hiding this comment.
This is looking better -- but now evaluate() will be called twice on the inputs; Both here as well as in cast_static_filter_to_set . Since calling evaluate may be quite expensive, what do you think about calling it once and then passing the resulting Vec<ColumnarValue> to both check_all_static_filter_expr and cast_static_filter_to_set?
There was a problem hiding this comment.
I add a new function try_cast_static_filter_to_set to replace these two function in the InLIst.
31ed154 to
991ca98
Compare
|
Thanks for your review. @alamb |
alamb
left a comment
There was a problem hiding this comment.
Looks great to me @liukun4515 -- thank you for sticking with it. I think it is a good sign when there is less code but more functionality 👍
| set: None, | ||
| if list.len() > OPTIMIZER_INSET_THRESHOLD { | ||
| if let Ok(set) = try_cast_static_filter_to_set(&list, schema) { | ||
| return Self { |
Which issue does this PR close?
Closes #2820
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?