refactor: separate get_result_type from coerce_type#6221
refactor: separate get_result_type from coerce_type#6221jackwener merged 4 commits intoapache:mainfrom
coerce_type#6221Conversation
| let message = format!( | ||
| "Expression {case:?} expected to error due to impossible coercion" | ||
| ); | ||
| assert!(logical_plan.is_err(), "{}", message); |
There was a problem hiding this comment.
Type checking is done in the Type coercion analyzer rule rather than directly in the build
coerce_type
|
Next, we should separate |
coerce_typecoerce_type
| | Operator::NotEq | ||
| | Operator::Lt | ||
| | Operator::Gt | ||
| | Operator::GtEq | ||
| | Operator::LtEq | ||
| | Operator::IsDistinctFrom |
There was a problem hiding this comment.
Original, when we call get_data_type, it will return common type instead of result type (Boolean) in coerce_type. So original it's wrong.
alamb
left a comment
There was a problem hiding this comment.
I think this is a great step forward -- thank you @jackwener
I think this also closes #3419
Perhaps we can update the comments here https://github.com/apache/arrow-datafusion/blob/4297547df6dc297d692ca82566cfdf135d4730b5/datafusion/expr/src/type_coercion/binary.rs#L111-L121 and update it to say that 'coerce_types' is returns the type the arguments should be coerced to
I am happy to do so in a follow on PR
Thank you for taking this first step to clean up the coercion logic ❤️
| || is_interval(lhs_type) | ||
| || is_interval(rhs_type) => | ||
| { | ||
| temporal_add_sub_coercion(lhs_type, rhs_type, op) |
There was a problem hiding this comment.
Eventually I think it would make the code easier to understand and maintain if we can disentangle coerce_types from get_result_types. To start in this PR, calling into coerce_types is fine.
Eventually I think it would keep the logic much simpler if get_result_type is only called on expressions that have their inputs properly coerced.
| @@ -77,7 +77,7 @@ impl PhysicalExpr for DateTimeIntervalExpr { | |||
| } | |||
|
|
|||
| fn data_type(&self, input_schema: &Schema) -> Result<DataType> { | |||
There was a problem hiding this comment.
I would also love to remove DateTimeIntervalExpr out of its own thing and into binary.rs with the other operators, but for now 👍
| @@ -42,7 +42,7 @@ pub fn binary_operator_data_type( | |||
| let result_type = if !any_decimal(lhs_type, rhs_type) { | |||
There was a problem hiding this comment.
I als find binary_oprator_data_type to be confusing and redundant with get_result_type but we can fix that in a follow on PR perhaps
There was a problem hiding this comment.
Great suggestion, I will do in following PR
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
|
Thanks @alamb review carefully. |
|
I prepare to merge this PR because there are some following PR. Anyone who wants to review this PR can continue, and they they can also review these content in the following PR |
|
One slightly unfortunate quirk of this separation is it isn't clear if the inputs to get_result_type are before or after type coercion. This has non-trivial implications for decimal arithmetic, in particular for division, as type coercion is not idempotent. I'm still thinking about how to handle this better, but thought I would mention it in case you've already given this some thought |
Agree with it. It may be related with #6261 |
In my opinion, Basically I think once the "Analyzer" pass has run on the expr tree the rest of the code shouldn't have to guess / coerce types anymore -- the types would always line up or the query would be rejected. I think this is the standard approach in other databases (and, eg. the type checking / semantic pass in compilers) |
Which issue does this PR close?
related to #5927
Rationale for this change
coerce_typecurrent mix many motivation in it.I prepare to split it into
get_result_typeandget_common_typeabout check type convertion #6223
What changes are included in this PR?
separate get_result_type from coerce_type.
Are these changes tested?
Are there any user-facing changes?