Skip to content

perf: Optimize scalar path for ltrim function#20032

Closed
kumarUjjawal wants to merge 4 commits intoapache:mainfrom
kumarUjjawal:perf/ltrim_scalar_path
Closed

perf: Optimize scalar path for ltrim function#20032
kumarUjjawal wants to merge 4 commits intoapache:mainfrom
kumarUjjawal:perf/ltrim_scalar_path

Conversation

@kumarUjjawal
Copy link
Contributor

@kumarUjjawal kumarUjjawal commented Jan 27, 2026

Which issue does this PR close?

Rationale for this change

ltrim currently routes scalar inputs through make_scalar_function, which converts scalar values into size-1 arrays and then converts the result back. This adds avoidable overhead in constant-folding / scalar evaluation scenarios.

What changes are included in this PR?

  • Add a match-based scalar fast path in LtrimFunc::invoke_with_args
  • Handles Utf8, Utf8View, and LargeUtf8
  • Handles early null returns for scalar null inputs
Type Before After Improvement
ltrim/scalar_utf8 116.12 ns 112.58 ns −3.1%
ltrim/scalar_utf8view 116.37 ns 112.05 ns −3.7%

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the functions Changes to functions implementation label Jan 27, 2026
@kumarUjjawal
Copy link
Contributor Author

Made changes to common.rs as I wanted to work on other trim functions in seperate PRs and it would be handy to have the shared code.

.iter()
.any(|v| matches!(v, ColumnarValue::Scalar(s) if s.is_null()))
{
if args.iter().any(|v| matches!(v, ColumnarValue::Array(_))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't return a columnar array here; it should just be columnar scalar only

match args[0].data_type() {
DataType::Utf8 | DataType::Utf8View => make_scalar_function(
ltrim::<i32>,
vec![Hint::Pad, Hint::AcceptsSingular],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit surprising we get such a speedup when make_scalar_function with hints should already ensure we don't expand the arrays 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do another benchmark to ensure i didn't mess up μs and ns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe benchmark the null fast path changes separately from this scalar fast path change as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did another benchmark, i had messed up the privious one, now it only shows around ~3% of speedup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you amend the PR body results in that case, since they seem to mislead the performance improvements of this PR

}

// Scalar fast path
if args.iter().all(|v| matches!(v, ColumnarValue::Scalar(_))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an iter check on this when we have at most 2 arguments is overkill; just use a match

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would subsequent PRs for other trim fundtions a good effort or should I leave those?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we identify in this PR which changes actually lead to speedup? If its the null handling or the new scalar fast path? I'm hesistant on this PR since though we have small gains, the scalar fast path in particular is confusing since it tacks on a new scalar fast path when the existing fast path still remains (but would remain unused?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did all the possible combination benchmark and the scalar path provides negligible speedup also the common.rs changes are net negative. We are better droping of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I close this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that would be best

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jefffrey for your time. I should have done the benchmark properly since the start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants