perf: Optimize scalar fast path for regexp_like and rejects g inside combined flags like ig#20354
Conversation
| args: datafusion_expr::ScalarFunctionArgs, | ||
| ) -> Result<ColumnarValue> { | ||
| let args = &args.args; | ||
| match args.len() { |
There was a problem hiding this comment.
Is this check needed ?
Isn't this provided by the signature ?
| let pattern = pattern.unwrap(); | ||
| let result = match &args[0] { | ||
| ColumnarValue::Scalar(ScalarValue::Utf8(_)) => { | ||
| let array = StringArray::from(vec![value]); |
There was a problem hiding this comment.
Isn't the idea of the optimisation to not construct arrays for scalar values ?
IMO this should directly use regex::Regex
| if flags == Some("g") { | ||
| return plan_err!("regexp_like() does not support the \"global\" option"); | ||
| } |
There was a problem hiding this comment.
| if flags == Some("g") { | |
| return plan_err!("regexp_like() does not support the \"global\" option"); | |
| } | |
| if let Some(flagz) = flags && flagz.contains("g") { | |
| return plan_err!("regexp_like() does not support the \"global\" option"); | |
| } |
the third argument could be "ig", i.e. several flags, not just one.
There was a problem hiding this comment.
We should call this fix out in the PR title/body now, and preferably add a test for it
| if flags == Some("g") { | ||
| return plan_err!( | ||
| "regexp_like() does not support the \"global\" option" | ||
| ); | ||
| } |
There was a problem hiding this comment.
| if flags == Some("g") { | |
| return plan_err!( | |
| "regexp_like() does not support the \"global\" option" | |
| ); | |
| } | |
| if let Some(flagz) = flags && flagz.contains("g") { | |
| return plan_err!( | |
| "regexp_like() does not support the \"global\" option" | |
| ); | |
| } |
Jefffrey
left a comment
There was a problem hiding this comment.
Do we have sufficient test coverage for these new execution branches?
| if flags == Some("g") { | ||
| return plan_err!("regexp_like() does not support the \"global\" option"); | ||
| } |
There was a problem hiding this comment.
We should call this fix out in the PR title/body now, and preferably add a test for it
regexp_likeregexp_like and rejects g inside combined flags like ig
| } | ||
|
|
||
| fn regexp_like_scalar( | ||
| value: &ScalarValue, |
There was a problem hiding this comment.
It's inconsistent how this function accepts ScalarValues but the sibling function regexp_like_array_scalar directly accepts Option<&str>
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
|
🤖 |
|
🤖: Benchmark completed Details
|
|
Thanks @kumarUjjawal, @martin-g & @Omega359 |
Which issue does this PR close?
Rationale for this change
regexp_likewas converting scalar inputs into single‑element arrays, adding avoidable overhead for constant folding and scalar‑only evaluations.What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
NO