Introduce user-defined signature#10439
Conversation
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
| data_types_with_scalar_udf(&arg_data_types, func).map_err(|err| { | ||
| plan_datafusion_err!( | ||
| "{}", | ||
| "{} and {}", |
There was a problem hiding this comment.
keep error for debugging
| }) | ||
| .call(vec![lit("Apple")]); | ||
| let plan_err = Projection::try_new(vec![udf], empty) | ||
| Projection::try_new(vec![udf], empty) |
There was a problem hiding this comment.
I think matching err message is unnecessary
There was a problem hiding this comment.
As long as the message is tested elsewhere I think this is fine
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
alamb
left a comment
There was a problem hiding this comment.
Thank you @jayzhan211 -- this looks great to me
Prior to merge, I think it is worth considering (but not required):
- Using a single new variant for
TypeSignature(rather than 2) - Improving the doc strings (I left some suggestions)
- Moving the semantic changes to
coalsceinto their own PR
datafusion/expr/src/signature.rs
Outdated
| /// | ||
| /// `make_array(i32, i64) -> make_array(i64, i64)` | ||
| VariadicEqual, | ||
| /// One or more arguments of an arbitrary type and coerced with user-defined coercion rules. |
There was a problem hiding this comment.
Since both TypeSignature::VariadicCoercion and TypeCoercion::UniformCoercion call into ScalarUDFImpl::coerce_types I wonder if there is a reason to have both.
Maybe instead we could add a single variant for both cases. Perhaps something like:
pub enum TypeSignature {
...
/// The acceptable signature and coercions rules to coerce arguments to this
/// signature are special for this function. If this signature is specified,
/// Datafusion will call [`ScalarUDFImpl::coerce_types`] to prepare argument types.
FunctionDefined,
...
}| } | ||
| TypeSignature::VariadicCoercion | TypeSignature::UniformCoercion(_) => { | ||
| return internal_err!( | ||
| "Coercion signature is handled in function-specific get_valid_types." |
There was a problem hiding this comment.
| "Coercion signature is handled in function-specific get_valid_types." | |
| "Coercion signature should be handled by in function-specific coerce_types." |
| &self.aliases | ||
| } | ||
|
|
||
| fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { |
| } | ||
|
|
||
| fn coerce_types(&self, arg_types: &[DataType]) -> Result<Vec<DataType>> { | ||
| let new_type = arg_types.iter().skip(1).try_fold( |
There was a problem hiding this comment.
this pattern to coerce all arguments to the type of the first one is repeated several times -- maybe we could make some sort of utility function to do it
As a follow on PR perhaps
| }) | ||
| .call(vec![lit("Apple")]); | ||
| let plan_err = Projection::try_new(vec![udf], empty) | ||
| Projection::try_new(vec![udf], empty) |
There was a problem hiding this comment.
As long as the message is tested elsewhere I think this is fine
| .unwrap(); | ||
| assert_eq!( | ||
| "type_coercion\ncaused by\nError during planning: Coercion from [Utf8] to the signature Uniform(1, [Float64]) failed.", | ||
| "type_coercion\ncaused by\nError during planning: [data_types_with_aggregate_udf] Coercion from [Utf8] to the signature Uniform(1, [Float64]) failed.", |
There was a problem hiding this comment.
Maybe we could make this error say "User defined coercsion rule failed" so it was clearer what happened
|
|
||
| # test with first null | ||
| query IT | ||
| query ?T |
There was a problem hiding this comment.
I would still prefer to make the change to the coalsce coercion rules as its own PR, but I am also fine if we include it in this PR
There was a problem hiding this comment.
We need to keep the old signature VariadicEqual for coalesce to split the change for coalesce :(. I prefer to make the change at once.
datafusion/expr/src/udf.rs
Outdated
| false | ||
| } | ||
|
|
||
| /// Coerce the types of the arguments to the types that the function expects |
There was a problem hiding this comment.
Nice -- I think it would be nice to expand this documentation to help people understand the API some more
Perhaps something like
| /// Coerce the types of the arguments to the types that the function expects | |
| /// Coerce arguments of a function call to types that the function can evaluate. | |
| /// | |
| /// This function is only called if [`Self::signature`] returns [`TypeSignature::UserDefined`]. Most | |
| /// UDFs should return one of the other variants of `TypeSignature` which handle common | |
| /// cases | |
| /// | |
| /// See the [type coercion module](datafusion_expr::type_coercion) | |
| /// documentation for more details on type coercion | |
| /// | |
| /// For example, if your function requires a floating point arguments, but the user calls | |
| /// it like `my_func(1::int)` (aka with `1` as an integer), coerce_types could return `[DataType::Float64]` | |
| /// to ensure the argument was cast to `1::double` | |
| /// | |
| /// # Parameters | |
| /// * `arg_types`: The argument types of the arguments this function with | |
| /// | |
| /// # Return value | |
| /// A Vec the same length as `arg_types`. DataFusion will `CAST` the function call | |
| /// arguments to these specific types. |
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
alamb
left a comment
There was a problem hiding this comment.
Looks great to me -- thanks @jayzhan211
I had one question on a comment in the docstrings but we could fix that as a follow on as well
| /// For more details on coercion in general, please see the | ||
| /// [`type_coercion`](crate::type_coercion) module. | ||
| /// | ||
| /// This function will be replaced with [data_types_with_scalar_udf], |
There was a problem hiding this comment.
I am not sure we want to replace this function over time -- I think having the basic simple Signatures that handle most common coercions makes sense to have in DataFusion core (even if it could be done purely in a udf) as it will make creating UDFs easier for users
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
VariadicCoercion and UniformCoercion|
🎉 |
* introduce new sig Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add udfimpl Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * replace fun Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * replace array Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * coalesce Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * nvl2 Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm variadic equal Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix test Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * rm err msg to fix ci Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * user defined sig Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * add err msg Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fmt Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * cleanup Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix ci Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * fix ci Signed-off-by: jayzhan211 <jayzhan211@gmail.com> * upd comment Signed-off-by: jayzhan211 <jayzhan211@gmail.com> --------- Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Which issue does this PR close?
Closes #10423 .
Rationale for this change
What changes are included in this PR?
Add
VariadicCoercion,UniformCoercionRemove
VariadicEqual(coercion rule does not require the result type to be the same for all)Add the datafusion error aside from the signature error message for better debugging usage that is why many test files are updated.
Are these changes tested?
UDAF code is also included but no test is covered since no UDAF requires coercion there yet (We will cover it after moving builtin to UDAF)
Are there any user-facing changes?