perf: Optimize scalar fast path for regexp_like and rejects g inside combined flags like ig #20354

Merged
Jefffrey merged 6 commits into apache:main from kumarUjjawal:perf/rlike_scalar_path
Feb 22, 2026

@kumarUjjawal
Contributor

@kumarUjjawal kumarUjjawal commented Feb 14, 2026

Which issue does this PR close?

Rationale for this change

regexp_like was converting scalar inputs into single‑element arrays, adding avoidable overhead for constant folding and scalar‑only evaluations.

What changes are included in this PR?

  • Add a scalar fast path in RegexpLikeFunc::invoke_with_args that evaluates regexp_like directly for scalar inputs
  • Add a benchmark for the scalar case (regexp_like_scalar_utf8)
  • Fix regexp_like to reject the global flag (g) even when it appears inside combined flags (e.g., ig), in both the scalar and array+scalar execution paths; add tests for both branches.

Type                       Before       After        Speedup
----                       ------       -----        -------
regexp_like_scalar_utf8    12.092 µs    10.943 µs    1.10x
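
As a rough illustration of the scalar fast path described in the first bullet, here is a minimal std-only sketch. `Value`, `Output`, and `like_with` are hypothetical stand-ins, not DataFusion types, and the matcher is passed in as a closure in place of a compiled regex. It only shows the dispatch shape (scalar in, scalar out, no single-element array), not the PR's actual code.

```rust
/// Hypothetical stand-in for a string-only `ColumnarValue` (assumption, not the real type).
#[derive(Debug, PartialEq)]
enum Value {
    Scalar(Option<String>),
    Array(Vec<Option<String>>),
}

/// Hypothetical result type mirroring the input shape.
#[derive(Debug, PartialEq)]
enum Output {
    Scalar(Option<bool>),
    Array(Vec<Option<bool>>),
}

/// Applies `is_match` to the input. `is_match` stands in for a compiled
/// regex's match test; NULL (None) inputs produce NULL outputs.
fn like_with<F>(input: &Value, is_match: F) -> Output
where
    F: Fn(&str) -> bool,
{
    match input {
        // Fast path: evaluate the scalar directly, without building an array.
        Value::Scalar(v) => Output::Scalar(v.as_deref().map(|s| is_match(s))),
        // Array path: evaluate element by element.
        Value::Array(vs) => Output::Array(
            vs.iter().map(|v| v.as_deref().map(|s| is_match(s))).collect(),
        ),
    }
}
```

With this sketch, `like_with(&Value::Scalar(Some("abc".to_string())), |s: &str| s.starts_with('a'))` evaluates to `Output::Scalar(Some(true))`, and a NULL scalar stays NULL.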

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the functions Changes to functions implementation label Feb 14, 2026
    args: datafusion_expr::ScalarFunctionArgs,
) -> Result<ColumnarValue> {
    let args = &args.args;
    match args.len() {
Member

Is this check needed?
Isn't this already guaranteed by the signature?

let pattern = pattern.unwrap();
let result = match &args[0] {
    ColumnarValue::Scalar(ScalarValue::Utf8(_)) => {
        let array = StringArray::from(vec![value]);
Member

Isn't the idea of the optimisation to avoid constructing arrays for scalar values?
IMO this should use regex::Regex directly.

Comment on lines +401 to +403
if flags == Some("g") {
    return plan_err!("regexp_like() does not support the \"global\" option");
}
Member

Suggested change
- if flags == Some("g") {
-     return plan_err!("regexp_like() does not support the \"global\" option");
- }
+ if let Some(flagz) = flags && flagz.contains("g") {
+     return plan_err!("regexp_like() does not support the \"global\" option");
+ }

The third argument could be "ig", i.e. several flags, not just one.
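
The difference between the exact-match check and the suggested substring check can be seen in a small std-only sketch. `reject_global_flag` is a hypothetical helper name, and it returns a plain `String` error instead of using `plan_err!`, so that it compiles without DataFusion. It uses `Option::is_some_and` rather than a let chain, which is just one way to write the same condition.

```rust
/// Hypothetical helper: rejects the unsupported "global" flag wherever it
/// appears in a combined flag string such as "ig" or "gi".
fn reject_global_flag(flags: Option<&str>) -> Result<(), String> {
    // `flags == Some("g")` would only catch the exact string "g";
    // checking for the character catches combined flags as well.
    if flags.is_some_and(|f| f.contains('g')) {
        return Err("regexp_like() does not support the \"global\" option".to_string());
    }
    Ok(())
}
```

Under this sketch, `Some("g")`, `Some("ig")`, and `Some("gi")` are all rejected, while `Some("i")` and `None` pass through.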

Contributor

We should call this fix out in the PR title/body now, and preferably add a test for it

Comment on lines +164 to +168
if flags == Some("g") {
    return plan_err!(
        "regexp_like() does not support the \"global\" option"
    );
}
Member

Suggested change
- if flags == Some("g") {
-     return plan_err!(
-         "regexp_like() does not support the \"global\" option"
-     );
- }
+ if let Some(flagz) = flags && flagz.contains("g") {
+     return plan_err!(
+         "regexp_like() does not support the \"global\" option"
+     );
+ }

Contributor

@Jefffrey Jefffrey left a comment

Do we have sufficient test coverage for these new execution branches?

@kumarUjjawal kumarUjjawal changed the title perf: Optimize scalar fast path for regexp_like perf: Optimize scalar fast path for regexp_like and rejects g inside combined flags like ig Feb 19, 2026
}

fn regexp_like_scalar(
    value: &ScalarValue,
Contributor

It's inconsistent how this function accepts ScalarValues but the sibling function regexp_like_array_scalar directly accepts Option<&str>
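
One way to make the two signatures line up is to unwrap the scalar once at the call site and pass `Option<&str>` to both helpers. The sketch below assumes a hypothetical `Scalar` enum that mimics only a few variants of DataFusion's `ScalarValue`, and `as_opt_str` is an illustrative name, not an existing API.

```rust
/// Hypothetical stand-in for a few `ScalarValue` variants (assumption).
enum Scalar {
    Utf8(Option<String>),
    LargeUtf8(Option<String>),
    Utf8View(Option<String>),
    Int64(Option<i64>),
}

/// Borrows the string payload of a scalar.
/// Outer `None`: the scalar is not a string type.
/// Inner `None`: the scalar is a string type holding SQL NULL.
fn as_opt_str(value: &Scalar) -> Option<Option<&str>> {
    match value {
        Scalar::Utf8(v) | Scalar::LargeUtf8(v) | Scalar::Utf8View(v) => Some(v.as_deref()),
        Scalar::Int64(_) => None,
    }
}
```

With the conversion done up front, a scalar helper could accept `Option<&str>` just like `regexp_like_array_scalar` does, which keeps type dispatch in one place.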

kumarUjjawal and others added 2 commits February 21, 2026 09:44
Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
@Omega359
Contributor

🤖 /home/bruce/gh_compare_branch_bench.sh Benchmark Script Running
Linux fedora 6.18.12-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 16 18:58:26 UTC 2026 x86_64 GNU/Linux
Comparing perf/rlike_scalar_path (071b406) to 9c6a35f diff
BENCH_NAME=regx
BENCH_COMMAND=cargo bench --bench regx
BENCH_FILTER=
BENCH_BRANCH_NAME=perf_rlike_scalar_path
Results will be posted here when complete

@Omega359
Contributor

🤖: Benchmark completed

Details

group                           main                                    perf_rlike_scalar_path
-----                           ----                                    ----------------------
regexp_count_1000 string        1.00  1686.5±13.13µs        ? ?/sec     1.00  1684.1±10.68µs        ? ?/sec
regexp_count_1000 utf8view      1.00  1731.2±52.85µs        ? ?/sec     1.01  1745.3±152.61µs        ? ?/sec
regexp_instr_1000 string        1.01      2.1±0.01ms        ? ?/sec     1.00      2.0±0.11ms        ? ?/sec
regexp_instr_1000 utf8view      1.02      2.0±0.02ms        ? ?/sec     1.00      2.0±0.01ms        ? ?/sec
regexp_like scalar utf8                                                 1.00      8.7±0.52µs        ? ?/sec
regexp_like_1000                1.14  1950.7±283.71µs        ? ?/sec    1.00  1711.8±12.80µs        ? ?/sec
regexp_like_1000 utf8view       1.00  1748.7±30.97µs        ? ?/sec     1.00  1757.2±100.01µs        ? ?/sec
regexp_match_1000               1.03      2.1±0.17ms        ? ?/sec     1.00      2.1±0.01ms        ? ?/sec
regexp_match_1000 utf8view      1.01      2.1±0.02ms        ? ?/sec     1.00      2.1±0.10ms        ? ?/sec
regexp_replace_1000             1.02  1585.7±15.24µs        ? ?/sec     1.00  1557.7±10.10µs        ? ?/sec
regexp_replace_1000 utf8view    1.02  1591.6±29.93µs        ? ?/sec     1.00  1563.6±53.55µs        ? ?/sec

@Jefffrey Jefffrey added this pull request to the merge queue Feb 22, 2026
Merged via the queue into apache:main with commit f488a90 Feb 22, 2026
28 checks passed
@Jefffrey
Contributor

Thanks @kumarUjjawal, @martin-g & @Omega359
