feat: Support arbitrary boolean expressions with subqueries (OR, NOT)#3791
feat: Support arbitrary boolean expressions with subqueries (OR, NOT)#3791
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
❌ 7 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
✅ Deploy Preview for electric-next ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
@coderabbitai review |
5a3dbbb to
3105a9c
Compare
|
Found 26 test failures on Blacksmith runners: Failures
|
9b57934 to
905ee8d
Compare
905ee8d to
e5ce334
Compare
|
Hi @robacourt, while implementing the DNF/active_conditions support in the ts/db client, I noticed that disjunct_positions =
case derive_disjunct_positions(normalized_new) do
[] -> current_data && current_data.disjunct_positions
positions -> positions
endThis is redundant work — In the TS client we derive it once from the first tagged message and store it as a single variable shared across all rows. Might be worth doing the same in the Elixir client — just a single |
…rbitrary boolean WHERE clauses Support the new Electric server wire protocol (PR electric-sql/electric#3791): - Change tag delimiter from `|` to `/`, replace `_` wildcards with empty segments (NON_PARTICIPATING positions) - Add `active_conditions` header support for DNF visibility evaluation - Shapes with subquery dependencies use DNF: a row is visible if ANY disjunct has ALL its positions satisfied in active_conditions - Simple shapes (no subquery dependencies) retain existing behavior: row deleted when tag set becomes empty - Derive disjunct_positions once per shape (not per-row like the Elixir client) since the DNF structure is fixed by the WHERE clause Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Also, i think it would be good to add the unit tests we have in ts/db for this feature (cf. https://github.com/TanStack/db/pull/1270/changes#diff-286083ece2106d963fcda491df8579332e28dd8219a45a51c967afff26d70747) to the Elixir client too. |
Phase 2 of the arbitrary boolean expressions implementation plan. SqlGenerator.to_sql/1 is the inverse of Parser — it converts the parsed WHERE clause AST back into SQL strings, used by querying.ex for active_conditions SELECT columns and subquery_moves.ex for exclusion clauses. Handles all AST node types the parser can produce: comparison, pattern matching, nullability, boolean tests, logical connectives, membership, DISTINCT, ANY/ALL, arithmetic, bitwise, string/array operators, named functions, type casts, date/time/interval constants, row expressions, and sublink membership checks. Property-based test generates 1,000 arbitrary WHERE clauses with column references and verifies to_sql output is re-parseable, ensuring SqlGenerator stays in sync with the parser. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Limits DNF decomposition to 100 disjuncts, returning {:error, reason}
when exceeded. This prevents unbounded complexity explosion from
deeply nested AND-over-OR distribution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Encapsulates all DNF-related state in a separate struct that lives in Consumer.State rather than on the Shape struct. Provides position-to-dependency mapping, negated position tracking, active_conditions computation, and DNF evaluation. Built from Shape at consumer startup via from_shape/1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update fill_tag_structure to use Decomposer.decompose/1 to build multi-disjunct tag structures. Each disjunct is a list with one entry per DNF position (nil for non-participating positions, column name(s) for participating ones). Update fill_move_tags/make_tags_from_pattern to produce 2D arrays (list of lists) instead of slash-delimited strings. The wire format conversion happens later in log_items.ex. Add build_tag_structure_from_dnf to SubqueryMoves which extracts column names from each subexpression's AST using Walker.reduce!. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace or_with_subquery? and not_with_subquery? fields with dnf_context in Consumer.State. DnfContext is built from the Shape at initialization time via DnfContext.from_shape/1. Remove AST-walking helpers (has_or_with_subquery?, has_not_with_subquery?, subtree_has_sublink?) since DNF decomposition handles these cases. Simplify invalidation check in consumer.ex to only check the feature flag, since OR/NOT and multi-dependency shapes are now handled by DNF. Update tests to verify dnf_context behavior with equivalent coverage for all original OR and NOT with subquery edge cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
compute_active_conditions/4 and evaluate_dnf/2 were implemented in DnfContext (Phase 3) rather than WhereClause as originally planned. This is architecturally cleaner since DnfContext owns the decomposition state. No additional changes needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add DNF-aware exclusion clause infrastructure for multi-disjunct WHERE clauses. Move-out control messages now use position-based patterns. - Add extract_sublink_index/1, find_dnf_positions_for_dep_index/2 - Add build_dnf_exclusion_clauses/4 and generate_disjunct_exclusion/4 - Add get_column_sql/2 for comparison expression SQL generation - Move-out patterns include pos field for position-aware filtering Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… visibility
Update the Elixir client to handle the new wire format with slash-delimited
tags, active_conditions, and position-based move-in/move-out.
- Add active_conditions field to Headers struct and parse from messages
- Rewrite TagTracker for position-based {pos, hash} indexing in tag_to_keys
- Add normalize_tags/1 for slash-delimited tag normalization to 2D arrays
- Add row_visible?/2 for DNF-based visibility evaluation (OR of AND of positions)
- Add handle_move_in/3 for activating positions on move-in events
- Update generate_synthetic_deletes/4 for DNF-aware deletion (only delete
when no disjunct is satisfied)
- Update existing tests for new internal format, add DNF wire format tests
- Widen ResumeMessage types for position-based tag format
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both stream_initial_data and query_move_in were not passing dnf_context to json_like_select, so the generated SQL never included active_conditions in the JSON headers. This meant clients receiving initial snapshot data or move-in query results had no active_conditions to work with. Fix: compute DnfContext.from_shape(shape) in both functions and pass it through to json_like_select, which already supports it as an optional 5th parameter. Also fix a latent SQL generation bug in build_headers_part where the active_conditions injection added a trailing single-quote that conflicted with the outer template's closing quote, producing invalid SQL like '}'' instead of '}'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shapes with OR or NOT combined with subqueries produce a wire format (multi-disjunct tags, active_conditions, position-based move-in/out) that protocol v1 clients cannot parse. Reject these shapes at request time with a 400 error directing the client to upgrade. The plug extracts the Electric-Protocol-Version header (defaulting to 1) and passes it through as a param. After shape creation, the params module checks whether the shape requires protocol v2 by decomposing the WHERE clause and looking for multiple disjuncts or negated subquery positions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Five bugs that caused data mismatches in multi-disjunct (OR) shapes with
subqueries, verified by the oracle property test at SHAPE_COUNT=500:
1. Oracle test client missing protocol version 2 header, causing 400
rejections for complex DNF shapes.
2. Move-out patterns targeting wrong DNF positions — make_move_out_pattern
now filters positions by dependency handle via DnfContext.
3. Main shape consumer incorrectly processing materializer changes for
nested dependencies that should only propagate through the dependency
chain.
4. Materializer tag indexing using flat strings instead of position-aware
{pos, hash} tuples, preventing correct position-based move-out matching.
5. Move-in snapshot filtering applying moved_out_tags from ALL dependencies
instead of only the triggering dependency. When different positions
reference the same column, a move-out at one position would incorrectly
filter rows from another position's move-in query.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
-The Phase 4 commit changed `make_tags_from_pattern` to produce
lists-of-lists instead of slash-separated strings, but the
materializer's `add_row_to_tag_indices` / `remove_row_from_tag_indices`
still had `when is_binary(tag)` guards. When tags arrived as lists (from
WAL changes via `fill_move_tags`), the guard failed with a
`function_clause` error, crashing the materializer and cascading to 409s
for all dependent shapes.
-**Fix**: Added a `tag_to_position_entries/1` clause for lists, and
removed the `when is_binary(tag)` guards from the reduce callbacks.
Bug 2: Move-in WHERE clause needs DNF reconstruction (causes 409s
and incorrect queries)
-The old `move_in_where_clause` used string replacement on
`shape.where.query` to replace `IN (SELECT ...)` with `= ANY($1)`. This
breaks for:
-- **NOT IN**: `libpg_query` deparses `x NOT IN (SELECT ...)` as `NOT x
IN (SELECT ...)`, so the string replacement target `NOT IN (SELECT ...)`
isn't found
-- **Multi-disjunct**: The replacement should only apply the triggering
disjunct's conditions, not the whole WHERE
-- **IN-list expansion**: `id IN ('a', 'b', 'c')` becomes `(id='a') OR
(id='b') OR (id='c')` in the AST, creating multiple disjuncts — only
picking the first misses rows
-**Fix**: Implemented `build_dnf_move_in_where` which reconstructs the
WHERE from the DNF decomposition: finds ALL disjuncts containing the
trigger, builds trigger SQL (`= ANY($1)`) plus each disjunct's
non-trigger conditions (OR'd together). Also handles identical
subqueries at different dep indices by collecting trigger positions
across all dep indices sharing the same handle.
The DNF-aware path is always used in production since shapes with dependencies always have a DnfContext. Remove the dead string-replacement fallback, make dnf_context a required parameter, and drop the unused opts/remove_not parameter. Update tests to construct DnfContext and adjust expected SQL to match SqlGenerator output (quoted column names). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the intermediate list-of-lists tag format. Tags are now slash-delimited strings from production in make_tags_from_pattern all the way through to the wire. This eliminates tags_to_wire in log_items.ex and the is_list guard in materializer's tag_to_position_entries. The implementation plan is updated to reflect two formats (condition_hashes and tags) instead of three. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These tests previously expected 400/409 because OR and NOT combined with subqueries weren't supported. Now that the DNF-based implementation handles these cases, the tests pass protocol version 2 and assert correct move-in/move-out events instead of shape invalidation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tags now include one hash per DNF position separated by slashes. Tests for params and quoted column names are updated to construct the full slash-delimited tag string. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multiple same-level subqueries now correctly handle move-ins via DNF decomposition. The test is updated to assert the move-in response with the expected row data instead of expecting a 409 invalidation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Enum.take/Enum.random/:rand usage with pure StreamData generators consumed directly by `check all`, which wires --seed through properly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When generate_synthetic_deletes deleted a key from key_data (all DNF disjuncts unsatisfied), only the matched pattern entries were removed from tag_to_keys (first pass via Map.pop). The remaining non-matched entries for that key persisted as orphans. When the key was later re-added via move-in INSERT with a new hash, these stale entries could match future deactivation patterns, corrupting active_conditions and causing phantom deletes. Added a third pass to clean up remaining tag_to_keys entries for all deleted keys. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…l field disjunct_positions is determined solely by the shape's WHERE clause (DNF structure), not by individual row data. Previously it was stored per-key in key_data and re-derived on every tagged message. Now it's derived once from the first tagged message and stored as a single field alongside tag_to_keys/key_data, matching the approach used in the TS client. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds 4 new test cases covering DNF scenarios: multi-disjunct partial/full deactivation, active_conditions overwrite on re-send, simple shapes without active_conditions, and mixed rows with/without active_conditions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…etes
The materializer stored prev_value_counts which was set from value_counts
before each apply_changes. After the initial snapshot, this was always {}
since the initial load used apply_changes (not apply_changes_and_notify).
The exclusion context used these stale values, causing phantom deletes in
OR shapes when tag re-establishment was prevented.
Replace with LSN-keyed snapshots (at most 2 entries) that store post-apply
value_counts at each tx_offset. The materializer_changes message now
includes tx_offset as a 4th element, enabling consumers to look up the
correct snapshot for each dependency.
Key changes:
- State: seen_deps MapSet → dep_lsns map (dep_handle → tx_offset)
- Materializer: prev_value_counts → snapshots list, new get_link_values/2
with tx_offset-based lookup, snapshot_values_for returns empty for
tx_offset=0 (unseen deps) to disable exclusion and allow tag
re-establishment
- Consumer: reset dep_lsns at start of each materializer_changes handler
(not just in do_handle_txn) so deps from previous events don't
incorrectly populate the exclusion context
- Move handling: uses get_dep_lsn + get_link_values with LSN lookup
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e-in queries Previously, exclusion clauses were skipped when a non-containing disjunct had non-subquery positions (e.g. literal conditions like status = 'active'), relying on client-side tag deduplication. Now both generate_disjunct_exclusion and generate_materialized_disjunct_exclusion handle non-subquery positions by converting their ASTs to SQL. Also raise on nil dnf_context in build_exclusion_context since shapes with dependencies must always have one. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…le_view_test.exs:267
…in? row loss bug When a row's column value changes AND a subquery dependency changes in the same transaction with an OR-based WHERE clause, the move-in's exclusion clause (AND NOT ...) filters the row, while change_will_be_covered_by_move_in? skips the WAL change expecting the move-in to deliver it. Both sides defer to the other and the row is lost. Fix: store the non-containing disjunct positions (excluded disjuncts) per pending move-in at query start time. In change_will_be_covered_by_move_in?, evaluate the record's active_conditions against those excluded disjuncts. If the record matches any excluded disjunct, the move-in won't return it, so the WAL change is processed normally. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
e5ce334 to
76452d9
Compare
76452d9 to
d55af83
Compare
Summary
This PR implements RFC "Arbitrary Boolean Expressions with Subqueries" which extends Electric's subquery support from single
IN (SELECT ...)conditions to arbitrary boolean expressions with OR, NOT, and multiple subqueries.What's Changed
WHERE project_id IN (SELECT ...) OR assigned_to IN (SELECT ...)WHERE project_id NOT IN (SELECT ...)WHERE (a IN sq1 AND b='x') OR c NOT IN sq2Key Implementation Details
DNF Decomposer (
lib/electric/replication/eval/decomposer.ex)Active Conditions (
lib/electric/shapes/where_clause.ex)active_conditionsarray indicating which atomic conditions are satisfiedMove-in/Move-out Handling (
lib/electric/shapes/consumer/move_handling.ex)Message Format (
lib/electric/log_items.ex,lib/electric/shapes/querying.ex)active_conditionsarray included in row message headersactive_conditionsactive_conditionsfieldTest plan
test/electric/plug/subquery_router_test.exs)test/electric/replication/eval/decomposer_test.exs)test/electric/shapes/where_clause_test.exs)Test Coverage
Related
docs/rfcs/arbitrary-boolean-expressions-with-subqueries.mdpackages/sync-service/IMPLEMENTATION_PLAN.md🤖 Generated with Claude Code