[QDTP-791] Sync to upstream to bring the new topK optim changes from Geoffrey#22
Closed
…urrency (apache#15712) * Enable setting default values for target_partitions and planning_concurrency * Fix doc test * Use transform to apply the mapping from 0 to the default parallelism --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
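The last bullet above maps a configured value of 0 to the default parallelism. A minimal sketch of that idea follows, assuming the default is the number of available cores. `resolve_parallelism` is a hypothetical helper for illustration, not DataFusion's actual API.

```rust
use std::thread;

/// Hypothetical helper (not part of DataFusion): treat 0 as
/// "use the default parallelism" and pass any other value through.
fn resolve_parallelism(requested: usize) -> usize {
    if requested == 0 {
        // Assumption: the default is the number of available cores,
        // falling back to 1 if it cannot be determined.
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)
    } else {
        requested
    }
}
```

With this shape, `resolve_parallelism(8)` stays 8, while `resolve_parallelism(0)` resolves to a machine-dependent value of at least 1.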
* minor * fix
Bumps [http-proxy-middleware](https://github.com/chimurai/http-proxy-middleware) from 2.0.6 to 2.0.9. - [Release notes](https://github.com/chimurai/http-proxy-middleware/releases) - [Changelog](https://github.com/chimurai/http-proxy-middleware/blob/v2.0.9/CHANGELOG.md) - [Commits](chimurai/http-proxy-middleware@v2.0.6...v2.0.9) --- updated-dependencies: - dependency-name: http-proxy-middleware dependency-version: 2.0.9 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ParserError->DataFusionError+attach a diagnostic * fix: ci * fix: fmt * fix:clippy * does this fix ci test? * this fixes sqllogictest * fix: cargo test * fix: fmt * add tests * cleanup * suggestions + expect EOF nicely * fix: clippy
…5594) * Set DataFusion runtime configurations through SQL interface * fix clippy warnings * use spill count based tests for checking applied memory limit --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.171 to 0.2.172. - [Release notes](https://github.com/rust-lang/libc/releases) - [Changelog](https://github.com/rust-lang/libc/blob/0.2.172/CHANGELOG.md) - [Commits](rust-lang/libc@0.2.171...0.2.172) --- updated-dependencies: - dependency-name: libc dependency-version: 0.2.172 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Refactor regexp slt tests * handle null test data
… them (apache#15566) * ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them * wip * fix tests * fix * fix * fix doc * fix doc * Improve doc comments of `filter-pushdown-apis` (#22) * Improve doc comments * Apply suggestions from code review --------- Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * simplify according to pr feedback * Add missing file * Add tests * pipe config in * docstrings * Update datafusion/physical-plan/src/filter_pushdown.rs * fix * fix * fmt * fix doc * add example usage of config * fix test * convert exec API and optimizer rule * re-add docs * dbg * dbg 2 * avoid clones * part 3 * fix lint * tests pass * Update filter.rs * update projection tests * update slt files * fix * fix references * improve impls and update tests * apply stop logic * update slt's * update other tests * minor * rename modules to match logical optimizer, tweak docs --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai> Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
* flatten array in a single step instead of recursive * clippy * update flatten type signature to Array * add fixed list to list coercion to flatten signature * support LargeList(List) and LargeList(FixedSizeList) in flatten * add test for LargeList(FixedSizeList) * handle nulls * uncomment flatten(NULL) test - it already works
…ation (apache#15694) * Enhance short-circuit evaluation for binary expressions - Delay evaluation of the right-hand side (RHS) unless necessary. - Optimize short-circuiting for `Operator::And` and `Operator::Or` by checking LHS alone first. - Introduce `get_short_circuit_result` function to determine short-circuit conditions based on LHS and RHS. - Update tests to cover various short-circuit scenarios for both `AND` and `OR` operations. * refactor: rename test_check_short_circuit to test_get_short_circuit_result and update assertions - Renamed the test function for clarity. - Updated assertions to use get_short_circuit_result instead of check_short_circuit. - Added additional test cases for AND and OR operations with expected results. * fix: enhance short-circuit evaluation logic in get_short_circuit_result function for null - Updated AND and OR short-circuit conditions to only trigger when all values are either false or true, respectively, and there are no nulls in the array. - Adjusted test case to reflect the change in expected output. 
* feat: add debug logging for binary expression evaluation and short-circuit checks * fix: improve short-circuit evaluation logic in BinaryExpr to ensure RHS is only evaluated when necessary * fix: restrict short-circuit evaluation to logical operators in get_short_circuit_result function * add more println!("==> "); * fix: remove duplicate data type checks for left and right operands in BinaryExpr evaluation * feat: add debug prints for dictionary values and keys in binary expression tests * Tests pass * fix: remove redundant short-circuit evaluation check in BinaryExpr and enhance documentation for get_short_circuit_result * refactor: remove unnecessary debug prints and streamline short-circuit evaluation in BinaryExpr * test: enhance short-circuit evaluation tests for nullable and scalar values in BinaryExpr * add benchmark * refactor: improve short-circuit logic in BinaryExpr for logical operators - Renamed `arg` to `lhs` for clarity in the `get_short_circuit_result` function. - Updated handling of Boolean data types to return `None` for null values. - Simplified short-circuit checks for AND/OR operations by consolidating logic. - Enhanced readability and maintainability of the code by restructuring match statements. * refactor: enhance short-circuit evaluation strategy in BinaryExpr to optimize logical operations * Revert "refactor: enhance short-circuit evaluation strategy in BinaryExpr to optimize logical operations" This reverts commit a62df47. * bench: add benchmark for OR operation with all false values in short-circuit evaluation * refactor: add ShortCircuitStrategy enum to optimize short-circuit evaluation in BinaryExpr - Replaced the lazy evaluation of the right-hand side (RHS) with immediate evaluation based on short-circuiting logic. - Introduced a new function `check_short_circuit` to determine if short-circuiting can be applied for logical operators. 
- Updated the logic to return early for `Operator::And` and `Operator::Or` based on the evaluation of the left-hand side (LHS) and the conditions of the RHS. - Improved clarity and efficiency of the short-circuit evaluation process by eliminating unnecessary evaluations. * refactor: simplify short-circuit evaluation logic in check_short_circuit function * datafusion_expr::lit as expr_lit * refactor: optimize short-circuit evaluation in check_short_circuit function - Simplified logic for AND/OR operations by prioritizing false/true counts to enhance performance. - Updated documentation to reflect changes in array handling techniques. * refactor: add count_boolean_values helper function and optimize check_short_circuit logic - Introduced a new helper function `count_boolean_values` to count true and false values in a BooleanArray, improving readability and performance. - Updated `check_short_circuit` to utilize the new helper function for counting, reducing redundant operations and enhancing clarity in the evaluation logic for AND/OR operations. - Adjusted comments for better understanding of the short-circuiting conditions based on the new counting mechanism. * Revert "refactor: add count_boolean_values helper function and optimize check_short_circuit logic" This reverts commit e2b9f77. * optimise evaluate * optimise evaluate 2 * refactor op:AND, lhs all false op:OR, lhs all true to be faster * fix clippy warning * refactor: optimize short-circuit evaluation logic in check_short_circuit function * fix clippy warning * add pre selection * add some comments * [WIP] fix pre-selection result * fix: Error in calculating the ratio * fix: Correct typo in pre-selection threshold constant and improve pre-selection scatter function documentation * fix doctest error * fix cargo doc * fix cargo doc * test: Add unit tests for pre_selection_scatter function --------- Co-authored-by: Siew Kam Onn <kosiew@gmail.com>
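The short-circuit rule described in the commit above (skip the RHS for `AND` when the LHS is all false, and for `OR` when the LHS is all true, provided the LHS has no nulls) can be sketched roughly as follows. This is an illustration over plain `Vec<Option<bool>>` values, not the `BinaryExpr` code from the PR. `LogicalOp`, `can_short_circuit`, `combine`, and `evaluate` are hypothetical names.

```rust
/// Logical operators that can short-circuit (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LogicalOp {
    And,
    Or,
}

/// Returns true when the result equals the LHS, so the RHS can be skipped:
/// AND with an all-false LHS, or OR with an all-true LHS. Any null (`None`)
/// on the LHS disables the shortcut.
fn can_short_circuit(op: LogicalOp, lhs: &[Option<bool>]) -> bool {
    let decisive = match op {
        LogicalOp::And => Some(false),
        LogicalOp::Or => Some(true),
    };
    lhs.iter().all(|v| *v == decisive)
}

/// SQL three-valued AND / OR on a single pair of values.
fn combine(op: LogicalOp, l: Option<bool>, r: Option<bool>) -> Option<bool> {
    match (op, l, r) {
        (LogicalOp::And, Some(false), _) | (LogicalOp::And, _, Some(false)) => Some(false),
        (LogicalOp::And, Some(true), Some(true)) => Some(true),
        (LogicalOp::Or, Some(true), _) | (LogicalOp::Or, _, Some(true)) => Some(true),
        (LogicalOp::Or, Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Evaluates `lhs op rhs`, calling the `rhs` closure only when it is needed.
fn evaluate<F>(op: LogicalOp, lhs: Vec<Option<bool>>, rhs: F) -> Vec<Option<bool>>
where
    F: FnOnce() -> Vec<Option<bool>>,
{
    if can_short_circuit(op, &lhs) {
        // The LHS alone decides every row; the RHS is never computed.
        return lhs;
    }
    let rhs = rhs();
    lhs.into_iter()
        .zip(rhs)
        .map(|(l, r)| combine(op, l, r))
        .collect()
}
```

Passing the RHS as a closure keeps the potentially expensive evaluation lazy. The null check matters because `NULL AND false` is `false` while `NULL AND true` is `NULL`, so a null on the LHS cannot decide the row without the RHS.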
* fix: serialize listing table without partition column * remove unwrap * format * clippy
…e#15726) * coerce FixedSizeBinary to Binary * simplify FixedSizeBytes equality to literal * fix clippy * remove redundant ExprSimplifier case * Add explain test to make sure unwrapping is working correctly --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Bumps [clap](https://github.com/clap-rs/clap) from 4.5.35 to 4.5.36. - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](clap-rs/clap@clap_complete-v4.5.35...clap_complete-v4.5.36) --- updated-dependencies: - dependency-name: clap dependency-version: 4.5.36 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add DataFusion 47.0.0 Upgrade Guide * prettier * Update docs/source/library-user-guide/upgrading.md Co-authored-by: Oleks V <comphead@users.noreply.github.com> * Update docs/source/library-user-guide/upgrading.md Co-authored-by: Oleks V <comphead@users.noreply.github.com> * Fix examples * Try and fix tests again --------- Co-authored-by: Oleks V <comphead@users.noreply.github.com>
* Support Accumulator for avg duration * Add tests
* Improve simplify_expressions rule * address comments * address comments
* doc:Add documentation for OPTIONS clause syntax * doc:rename write_options.md to format_options.md and clarify its scope for both reading and writing * doc: change dml.md, cuz still have wrong write_options filename * doc: update doctest reference to renamed format_options.md * docs: update and correct format options documentation * doc: add more information of options content * remove execution settings, move note about insert * wordsmith example --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: parquet coerce_int96 schema * move test to parquet.slt * update based on comphead's suggestion
…age (apache#15644) * Show current SQL recursion limit in RecursionLimitExceeded error message * use recursion_limit setting from sql-parser-options * resolve merge conflicts * move error handling code to helper method
Bumps [sqllogictest](https://github.com/risinglightdb/sqllogictest-rs) from 0.28.0 to 0.28.1. - [Release notes](https://github.com/risinglightdb/sqllogictest-rs/releases) - [Changelog](https://github.com/risinglightdb/sqllogictest-rs/blob/main/CHANGELOG.md) - [Commits](risinglightdb/sqllogictest-rs@v0.28.0...v0.28.1) --- updated-dependencies: - dependency-name: sqllogictest dependency-version: 0.28.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* infer placeholder datatype for IN lists * infer placeholder datatype for Expr::Like * add tests for Expr::SimilarTo --------- Co-authored-by: Kevin <4733573+kczimm@users.noreply.github.com.>
Fixed an issue in the Avro reader that caused queries to fail when columns were reordered in the SELECT statement. The reader now correctly: 1. Builds arrays in the order specified in the projection 2. Creates a properly ordered schema matching the projection Previously, when selecting columns in a different order than the original schema (e.g., `SELECT timestamp, username FROM avro_table`), the reader would produce an error due to type mismatches between the data arrays and the expected schema. Fixes apache#15839
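The fix above amounts to applying the same projection order to both the schema and the data arrays. A minimal sketch of that invariant, assuming the projection is a list of indices into the file schema. `project` is a hypothetical helper, not the Avro reader's real code.

```rust
/// Hypothetical helper: selects `items` in the order given by `projection`
/// (indices into `items`). Returns `None` if any index is out of bounds.
fn project<T: Clone>(items: &[T], projection: &[usize]) -> Option<Vec<T>> {
    projection.iter().map(|&i| items.get(i).cloned()).collect()
}
```

Calling `project` once on the schema fields and once on the column arrays with the same `projection` keeps field `k` of the output schema describing array `k`, which is the property that broke when only one side was reordered.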
…pache#15901) Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: add union_tag scalar function * update for new api * Add test for second field type --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.1 to 1.44.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](tokio-rs/tokio@tokio-1.44.1...tokio-1.44.2) --- updated-dependencies: - dependency-name: tokio dependency-version: 1.44.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: xudong.w <wxd963996380@gmail.com>
Bumps [assert_cmd](https://github.com/assert-rs/assert_cmd) from 2.0.16 to 2.0.17. - [Changelog](https://github.com/assert-rs/assert_cmd/blob/master/CHANGELOG.md) - [Commits](assert-rs/assert_cmd@v2.0.16...v2.0.17) --- updated-dependencies: - dependency-name: assert_cmd dependency-version: 2.0.17 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Factor out Substrait consumers into separate files * Move relations and expressions into their own modules * Refactor: rename rex to expr * Refactor: move from_substrait_extended_expr to mod.rs --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* add table column alias for unnest projection * fix clippy * fix columns check
* feat: Add datafusion-spark crate * spark crate setup * clean up 2 example functions * cleanup crate * Spark crate setup * fix lint issue * cargo cleanup * fix collision in sqllogic * remove redundant test * test float precision when casting to string * reorder * undo * save * save * save * add spark crate * remove spark from core * add comment to import tests * Fix: reset submodule to main pointer and clean state * Save * fix registration * modify float64 precision for spark * Update datafusion/spark/src/lib.rs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * clean up code * code cleanup --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
- Fix typo in introduction.md - Remove period from end of bullet point to maintain consistency with other bullet points
* migrate tests in `push_down_filters.rs` to use snapshot assertions * remove unused format checks * Revert "remove unused format checks" This reverts commit dc4f137. * migrate `assert_eq!` in `push_down_filters.rs` to use snapshot assertions * migrate `assert_eq!` in `push_down_filters.rs` to use snapshot assertions --------- Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
* Add `FormatOptions` to Config * Fix `output_with_header` * Add cli test * Add `to_string` * Prettify * Prettify * Preserve the initial `NULL` logic * Cleanup * Remove `lt` as no longer needed * Format assert * Fix sqllogictest * Fix tests * Set formatting params for dates / times * Lowercase `duration_format` --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: Label `bloom_filter_on_read` as a reading config * fix: Update configs.md
…zation (apache#15936) * add query to show improvement for 15591. * document the new added query.
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.14 to 0.7.15. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](tokio-rs/tokio@tokio-util-0.7.14...tokio-util-0.7.15) --- updated-dependencies: - dependency-name: tokio-util dependency-version: 0.7.15 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: xudong.w <wxd963996380@gmail.com>
* migrate `assert_eq!` in `optimize_projection/mod.rs` to use snapshot assertions * migrate `assert_optimized_plan_equal!` in `propagate_empty_relations.rs` to use snapshot assertions * remove all `assert_optimized_plan_eq` * migrate `assert_optimized_plan_equal!` in `decorrelate_predicate_subquery.rs` to use snapshot assertions * Add snapshot assertion macro for optimized plan equality checks --------- Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>
…ta columns (apache#15935) * fix query results for predicates referencing partition columns and data columns * fmt * add e2e test * newline
Bumps [substrait](https://github.com/substrait-io/substrait-rs) from 0.55.0 to 0.55.1. - [Release notes](https://github.com/substrait-io/substrait-rs/releases) - [Changelog](https://github.com/substrait-io/substrait-rs/blob/main/CHANGELOG.md) - [Commits](substrait-io/substrait-rs@v0.55.0...v0.55.1) --- updated-dependencies: - dependency-name: substrait dependency-version: 0.55.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: create helpers to set the max_temp_directory_size Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com> * refactor: use helper in cli Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com> * refactor: update error message Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com> * refactor: use setter in tests Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com> --------- Signed-off-by: Jérémie Drouet <jeremie.drouet@gmail.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor filter pushdown apis * remove commented out code * fix tests * fail to fix bug * fix * add/fix docs * lint * add some docstrings, some minimal cleanup * review suggestions * add more comments * fix doc links * fmt * add comments * make test deterministic * add bench * fix bench * register bench * fix bench * cargo fmt --------- Co-authored-by: berkaysynnada <berkay.sahin@synnada.ai> Co-authored-by: Berkay Şahin <124376117+berkaysynnada@users.noreply.github.com>
Which issue does this PR close?
See: apache#15563
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?