Clean up InList code to use functions rather than macros#2826
Clean up InList code to use functions rather than macros#2826alamb wants to merge 1 commit intoapache:masterfrom
Conversation
| /// | ||
| /// # Example | ||
| /// TODO | ||
| fn unary_cmp<I, F>(array: &PrimitiveArray<I>, op: F) -> BooleanArray |
There was a problem hiding this comment.
This is copy/paste from https://docs.rs/arrow/17.0.0/arrow/compute/kernels/arity/index.html
alamb
left a comment
There was a problem hiding this comment.
@Ted-Jiang are there any performance benchmarks that can be run for the InList implementation?
| let null_bit_buffer = $left.data().null_buffer().cloned(); | ||
|
|
||
| let comparison = | ||
| (0..$left.len()).map(|i| unsafe { $op($left.value_unchecked(i), $right) }); |
There was a problem hiding this comment.
I think in general using iterators (rather than value_unchecked()) is faster
| /// Applies an unary and infallible comparison function to a primitive | ||
| /// array. This is the fastest way to perform an operation on a |
There was a problem hiding this comment.
| /// Applies an unary and infallible comparison function to a primitive | |
| /// array. This is the fastest way to perform an operation on a | |
| /// TODO: move to arrow-rs: https://github.com/apache/arrow-rs/issues/1987 | |
| /// | |
| /// Applies an unary and infallible comparison function to a primitive | |
| /// array. This is the fastest way to perform an operation on a |
| fn in_list_primitive<T: ArrowPrimitiveType>( | ||
| array: &PrimitiveArray<T>, | ||
| values: &[<T as ArrowPrimitiveType>::Native], | ||
| ) -> Result<BooleanArray> { |
There was a problem hiding this comment.
made infallible as it can't fail
| ) -> BooleanArray { | ||
| array | ||
| .iter() | ||
| .map(|x| x.map(|x| !values.contains(&x))) |
There was a problem hiding this comment.
This change now does check each element if it is Null before invoking the closure where the compare_op_scalar! macro does not. However, given that that the closure will do some non trivial string checks, I think first checking for null will likely not slow it down
|
Hmm, something is not correct. Will investigate |
|
I don't have time / plans to work on this any more |
Which issue does this PR close?
N/A (at the moment)
Rationale for this change
Inspired by @liukun4515 's PR here: #2809 I saw some macro use that could be reduced and may even be faster
What changes are included in this PR?
Are there any user-facing changes?
No