Skip to content

[branch-47] Fix stack overflow for large binary operations#25

Merged
fmonjalet merged 1 commit intobranch-47from
monjalet/fix-binop-stack-verflow
May 20, 2025
Merged

[branch-47] Fix stack overflow for large binary operations#25
fmonjalet merged 1 commit intobranch-47from
monjalet/fix-binop-stack-verflow

Conversation

@fmonjalet
Copy link

Cherry pick of apache#16031

… that translate to DataFusion binary operators (apache#16031)

* Add substrait consumer test causing a stack overflow

* Mitigate stack overflow for substrait binary op with large arg list

When transforming a substrait function call to DataFusion logical plan,
if the substrait function maps to a DataFusion binary operator, but has
more than 2 arguments, it is mapped to a tree of BinaryExpr. This
BinaryExpr tree is not balanced, and its depth is the number of
arguments:

       Op
      /  \
    arg1  Op
         /  \
       arg2  ...
             /  \
           argN  Op

Since many functions manipulating the logical plan are recursive, it
means that N arguments result in an O(N) recursion, leading to stack
overflows for large N (1000 for example).

Transforming these function calls into a balanced tree mitigates the
issue:

             .__ Op __.
            /          \
          Op            Op
         /  \          /  \
       ...  ...      ...  ...
      /  \  /  \    /  \  /  \
    arg1        ...          argN

The recursion depth is now O(log2(N)), meaning that 1000 arguments
results in a depth of ~10, and it would take 2^1000 arguments to reach a
depth of 1000, which is a vastly unreasonable amount of data.

Therefore, it's not possible to use this flaw anymore to trigger stack
overflows in processes running DataFusion.

* arg_list_to_binary_op_tree: avoid cloning Expr

* cargo fmt

* from_scalar_function: improve error handling

* Move test_binary_op_large_argument_list test to scalar_function module

* arg_list_to_binary_op_tree: add more unit tests

Courtesy of @gabotechs

* substrait consumer scalar_function tests: more explicit function name

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
(cherry picked from commit 2eaac22)
@fmonjalet fmonjalet changed the base branch from main to branch-47 May 20, 2025 07:26
@gabotechs gabotechs changed the title Fix stack overflow for large binary operations [branch-47] Fix stack overflow for large binary operations May 20, 2025
@fmonjalet fmonjalet merged commit 11f5af5 into branch-47 May 20, 2025
52 of 56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants