revert some code for #4726 / remove unnecessary coercion in physical plans#4741
Conversation
| }, | ||
| (Timestamp(_, tz), Utf8) => Some(Timestamp(TimeUnit::Nanosecond, tz.clone())), | ||
| (Utf8, Timestamp(_, tz)) => Some(Timestamp(TimeUnit::Nanosecond, tz.clone())), | ||
| // TODO: need to investigate the result type for the comparison between timestamp and date |
There was a problem hiding this comment.
#4644 (comment)
we need to get the behavior for the comparison in the PG.
There was a problem hiding this comment.
@liukun4515 comparison is usually boolean type, did you mean what is the common datatype for casting between timestamp and date in diff scenarios?
There was a problem hiding this comment.
Yes
the result of comparison is bool type, but if left and right are not the same data type, we should get the common type for left and right through the type coercion.
For example, INT32>INT64, the common type is INT64, and we should cast the left to INT64.
There was a problem hiding this comment.
The type of type coercion is decided by input types and op.
There was a problem hiding this comment.
@liukun4515 comparison is usually boolean type, did you mean what is the common datatype for casting between timestamp and date in diff scenarios?
I think it's not about casting from one type to other type, it is related the type inference for different input types and different operation or functions.
There was a problem hiding this comment.
I try the comparison timestamp with date in the Spark, and find the coerced type is timestamp.
spark-sql> explain extended select now()>cast('2000-01-01' as date);
== Parsed Logical Plan ==
'Project [unresolvedalias(('now() > cast(2000-01-01 as date)), None)]
+- OneRowRelation
== Analyzed Logical Plan ==
(now() > CAST(2000-01-01 AS DATE)): boolean
Project [(now() > cast(cast(2000-01-01 as date) as timestamp)) AS (now() > CAST(2000-01-01 AS DATE))#49]
+- OneRowRelation
== Optimized Logical Plan ==
Project [true AS (now() > CAST(2000-01-01 AS DATE))#49]
+- OneRowRelation
== Physical Plan ==
*(1) Project [true AS (now() > CAST(2000-01-01 AS DATE))#49]
+- *(1) Scan OneRowRelation[]
But current implementation of coerced type is date, we can fix this in the follow up PR.
There was a problem hiding this comment.
Timestamp is probably consistent with other coercions that ensure no data is truncated (so going Date --> Timestamp is good as the the Timestamp could be losslessly converted back to a Date). However, going from Timestamp --> Date would truncate the precision of the original timesteamp,
| Timestamp(TimeUnit::Nanosecond, None) => { | ||
| matches!(type_from, Null | Timestamp(_, None)) | ||
| } | ||
| Date32 => { |
There was a problem hiding this comment.
why this change?
Though I admit that the fact that all tests still pass is concerning 🤔
There was a problem hiding this comment.
This was added from the commit https://github.com/comphead/arrow-datafusion/blob/4761802a4bc46d377406aa20767c4af66b80d65b/datafusion/expr/src/type_coercion/functions.rs#L180 in the #4726
I just revert useless codes adding from the PR #4726
| /// Returns a common coerced datatype between 2 given datatypes | ||
| /// | ||
| /// See the module level documentation for more detail on coercion. | ||
| pub fn get_common_coerced_type( |
There was a problem hiding this comment.
I think this method can be of some interest, if the user wants to check the common coerced type between 2 datatypes?
There was a problem hiding this comment.
It may be worth waiting for a specific use before leaving it in 🤔
There was a problem hiding this comment.
I think this method can be of some interest, if the user wants to check the common coerced type between 2 datatypes?
the common coerced type will be decided by input exprs/input types and op, if the op is different and the same input types/input exprs may get the different coerced type.
There was a problem hiding this comment.
Let's merge this PR then -- we can always recover get_common_coerced_type from this PR or git history if we need it.
|
Benchmark runs are scheduled for baseline = d3ca9b0 and contender = 760f108. 760f108 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
from the #4726 (comment)
If we want to compare the timestamp with date, we just need to add logic in
type_coercion, and don't need to do type coercion in the physical phase.Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?