
Hash small Uint8Arrays (≤128 bytes) by content rather than reference #779

Merged
KyleAMathews merged 6 commits into main from
claude/fix-bug-report-011CUspWNFrtKDtqEKZ2WDpR
Nov 10, 2025

Conversation

@KyleAMathews
Collaborator

Small Uint8Arrays (≤128 bytes) are now hashed by their content rather than by reference, enabling proper equality comparisons for binary IDs like ULIDs (16 bytes). Large arrays (>128 bytes) continue to be hashed by reference to avoid performance costs.

This allows the eq expression function to correctly compare binary IDs without forcing users to use the more expensive functional expression variant.

Changes:

  • Added UINT8ARRAY_CONTENT_HASH_THRESHOLD constant (128 bytes)
  • Implemented hashUint8Array() function for content-based hashing
  • Modified hashObject() to check byteLength and hash small arrays by content
  • Updated existing tests to reflect new behavior for small and large arrays

Fixes an issue where binary ULIDs couldn't be compared using eq because Uint8Arrays were hashed by reference.
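
For readers who want to see the threshold idea in isolation, here is a minimal sketch. It is not the code from this PR: FNV-1a stands in for the project's real hash, and `hashByReference` and `hashBinary` are hypothetical names that model the two paths. Only the constant name and `hashUint8Array` come from the description above.

```typescript
// Illustrative sketch only. FNV-1a is an assumption standing in for the real hash.
const UINT8ARRAY_CONTENT_HASH_THRESHOLD = 128

// Hypothetical per-object ids, used here to model "hash by reference".
const referenceIds = new WeakMap<object, number>()
let nextReferenceId = 1

function hashByReference(value: object): number {
  let id = referenceIds.get(value)
  if (id === undefined) {
    id = nextReferenceId++
    referenceIds.set(value, id)
  }
  return id
}

// Content hash: every byte is mixed in, so equal bytes give equal hashes.
function hashUint8Array(input: Uint8Array): number {
  let hash = 0x811c9dc5 // FNV-1a 32-bit offset basis
  input.forEach((byte) => {
    hash ^= byte
    hash = Math.imul(hash, 0x01000193) >>> 0 // FNV prime, kept unsigned 32-bit
  })
  return hash
}

// Hypothetical dispatcher: small arrays by content, large arrays by identity.
function hashBinary(input: Uint8Array): number {
  return input.byteLength <= UINT8ARRAY_CONTENT_HASH_THRESHOLD
    ? hashUint8Array(input)
    : hashByReference(input)
}
```

With this sketch, two distinct 16-byte arrays holding the same bytes produce the same hash, while two distinct 200-byte arrays with the same bytes do not, which matches the trade-off described above: the cost of hashing stays bounded for large buffers.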

🎯 Changes

✅ Checklist

  • I have followed the steps in the Contributing guide.
  • I have tested this code locally with pnpm test:pr.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

@changeset-bot

changeset-bot bot commented Nov 7, 2025

🦋 Changeset detected

Latest commit: ede4683

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 11 packages
Name                                 Type
@tanstack/db                         Patch
@tanstack/db-ivm                     Patch
@tanstack/angular-db                 Patch
@tanstack/electric-db-collection     Patch
@tanstack/powersync-db-collection    Patch
@tanstack/react-db                   Patch
@tanstack/rxdb-db-collection         Patch
@tanstack/solid-db                   Patch
@tanstack/svelte-db                  Patch
@tanstack/trailbase-db-collection    Patch
@tanstack/vue-db                     Patch

@pkg-pr-new

pkg-pr-new bot commented Nov 7, 2025

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@779

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@779

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@779

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@779

@tanstack/offline-transactions

npm i https://pkg.pr.new/@tanstack/offline-transactions@779

@tanstack/powersync-db-collection

npm i https://pkg.pr.new/@tanstack/powersync-db-collection@779

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@779

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@779

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@779

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@779

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@779

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@779

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@779

commit: ede4683

@github-actions
Contributor

github-actions bot commented Nov 7, 2025

Size Change: +206 B (+0.26%)

Total Size: 80 kB

Filename                                              Size      Change
./packages/db/dist/esm/query/compiler/evaluators.js   1.35 kB   +14 B (+1.04%)
./packages/db/dist/esm/utils/comparison.js            852 B     +192 B (+29.09%) 🚨
ℹ️ View Unchanged
Filename Size
./packages/db/dist/esm/collection/change-events.js 1.36 kB
./packages/db/dist/esm/collection/changes.js 977 B
./packages/db/dist/esm/collection/events.js 388 B
./packages/db/dist/esm/collection/index.js 3.12 kB
./packages/db/dist/esm/collection/indexes.js 1.1 kB
./packages/db/dist/esm/collection/lifecycle.js 1.67 kB
./packages/db/dist/esm/collection/mutations.js 2.26 kB
./packages/db/dist/esm/collection/state.js 3.43 kB
./packages/db/dist/esm/collection/subscription.js 2.42 kB
./packages/db/dist/esm/collection/sync.js 2.12 kB
./packages/db/dist/esm/deferred.js 207 B
./packages/db/dist/esm/errors.js 4.11 kB
./packages/db/dist/esm/event-emitter.js 748 B
./packages/db/dist/esm/index.js 2.36 kB
./packages/db/dist/esm/indexes/auto-index.js 731 B
./packages/db/dist/esm/indexes/base-index.js 766 B
./packages/db/dist/esm/indexes/btree-index.js 1.87 kB
./packages/db/dist/esm/indexes/lazy-index.js 1.1 kB
./packages/db/dist/esm/indexes/reverse-index.js 513 B
./packages/db/dist/esm/local-only.js 837 B
./packages/db/dist/esm/local-storage.js 2.04 kB
./packages/db/dist/esm/optimistic-action.js 359 B
./packages/db/dist/esm/paced-mutations.js 496 B
./packages/db/dist/esm/proxy.js 3.22 kB
./packages/db/dist/esm/query/builder/functions.js 606 B
./packages/db/dist/esm/query/builder/index.js 3.85 kB
./packages/db/dist/esm/query/builder/ref-proxy.js 917 B
./packages/db/dist/esm/query/compiler/expressions.js 674 B
./packages/db/dist/esm/query/compiler/group-by.js 1.8 kB
./packages/db/dist/esm/query/compiler/index.js 1.96 kB
./packages/db/dist/esm/query/compiler/joins.js 2 kB
./packages/db/dist/esm/query/compiler/order-by.js 1.17 kB
./packages/db/dist/esm/query/compiler/select.js 1.07 kB
./packages/db/dist/esm/query/ir.js 673 B
./packages/db/dist/esm/query/live-query-collection.js 360 B
./packages/db/dist/esm/query/live/collection-config-builder.js 5.15 kB
./packages/db/dist/esm/query/live/collection-registry.js 264 B
./packages/db/dist/esm/query/live/collection-subscriber.js 1.77 kB
./packages/db/dist/esm/query/live/internal.js 130 B
./packages/db/dist/esm/query/optimizer.js 2.6 kB
./packages/db/dist/esm/scheduler.js 1.21 kB
./packages/db/dist/esm/SortedMap.js 1.18 kB
./packages/db/dist/esm/strategies/debounceStrategy.js 237 B
./packages/db/dist/esm/strategies/queueStrategy.js 418 B
./packages/db/dist/esm/strategies/throttleStrategy.js 236 B
./packages/db/dist/esm/transactions.js 2.9 kB
./packages/db/dist/esm/utils.js 881 B
./packages/db/dist/esm/utils/browser-polyfills.js 304 B
./packages/db/dist/esm/utils/btree.js 5.61 kB
./packages/db/dist/esm/utils/index-optimization.js 1.49 kB
./packages/db/dist/esm/utils/type-guards.js 157 B

compressed-size-action::db-package-size

@github-actions
Contributor

github-actions bot commented Nov 7, 2025

Size Change: 0 B

Total Size: 3.34 kB

ℹ️ View Unchanged
Filename Size
./packages/react-db/dist/esm/index.js 225 B
./packages/react-db/dist/esm/useLiveInfiniteQuery.js 1.17 kB
./packages/react-db/dist/esm/useLiveQuery.js 1.11 kB
./packages/react-db/dist/esm/useLiveSuspenseQuery.js 431 B
./packages/react-db/dist/esm/usePacedMutations.js 401 B

compressed-size-action::react-db-package-size

@KyleAMathews KyleAMathews force-pushed the claude/fix-bug-report-011CUspWNFrtKDtqEKZ2WDpR branch 3 times, most recently from 324741e to bee92ee Compare November 7, 2025 15:29
Collaborator

@samwillis samwillis left a comment

Awesome. Just one comment on making a method public

// @ts-expect-error - _writeByte is private but we need to use it here
hasher._writeByte(input[i]!)
}
return hasher.digest()
Collaborator

We can make this method public rather than use the comment to ignore the type error.
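
To illustrate the suggestion, here is a rough sketch of what a public byte-writing method could look like. `ByteHashStream` and `hashBytes` are hypothetical names and the mixing step is a toy, not MurmurHash; the point is only that the call site no longer needs a `@ts-expect-error` comment.

```typescript
// Hypothetical stand-in for the real hasher; the mixing step is a toy.
class ByteHashStream {
  private state = 17

  // Public, so callers can feed bytes without suppressing a type error.
  writeByte(byte: number): void {
    this.state = (Math.imul(this.state, 31) + (byte & 0xff)) >>> 0
  }

  digest(): number {
    return this.state
  }
}

function hashBytes(input: Uint8Array): number {
  const hasher = new ByteHashStream()
  for (let i = 0; i < input.length; i++) {
    hasher.writeByte(input[i]!) // plain method call, no suppression comment
  }
  return hasher.digest()
}
```

Keeping the internal state private while exposing only `writeByte` and `digest` is one way to keep the type checker useful at the call site.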

Fixes `eq` function and hash indexing to compare Uint8Arrays/Buffers by content
instead of reference, enabling proper ULID comparisons in WHERE clauses.

Changes:
- Hash small Uint8Arrays (≤128 bytes) by content in db-ivm for better indexing
- Compare Uint8Arrays by content in eq operator via areValuesEqual() function
- Add comprehensive tests for Uint8Array equality comparison
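
As a rough illustration of the content comparison described in this commit message, the sketch below shows one way such a check could be written. The function name mirrors the message, but the body is an assumption and not the merged implementation.

```typescript
// Sketch of content-based equality; the body is an assumption, not the PR's code.
function areValuesEqual(a: unknown, b: unknown): boolean {
  if (a instanceof Uint8Array && b instanceof Uint8Array) {
    if (a.length !== b.length) return false
    for (let i = 0; i < a.length; i++) {
      if (a[i] !== b[i]) return false
    }
    return true
  }
  // Everything else keeps strict-equality semantics.
  return a === b
}
```

Under this sketch, `areValuesEqual(new Uint8Array(5), new Uint8Array(5))` is true because both arrays are zero-filled with the same length, while primitives such as strings and numbers still compare with `===`. Node's `Buffer` extends `Uint8Array`, so the same branch would cover Buffers.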
@KyleAMathews KyleAMathews force-pushed the claude/fix-bug-report-011CUspWNFrtKDtqEKZ2WDpR branch from bee92ee to 421e32d Compare November 7, 2025 17:05
Add tests that specifically cover the user's reproduction case where
Uint8Arrays are created with a length (e.g., new Uint8Array(5)) resulting
in zero-filled arrays. Confirms that content comparison works correctly.

Add explicit tests for string and number equality to ensure that the
areValuesEqual function doesn't break primitive comparisons. All tests
pass, confirming the implementation correctly handles both Uint8Arrays
and primitives.

Fixes `eq` function and hash indexing to compare Uint8Arrays/Buffers by content
instead of reference, enabling proper ULID comparisons in WHERE clauses.

The issue was that `eq` compared Uint8Arrays by reference.
Now it uses `areValuesEqual()`, which compares Uint8Arrays byte-by-byte.

Changes:
- Hash small Uint8Arrays (≤128 bytes) by content in db-ivm for better indexing
- Compare Uint8Arrays by content in eq operator via areValuesEqual() function
- Made writeByte() public in MurmurHashStream
- Add comprehensive tests for Uint8Array equality comparison
- Add integration test reproducing the user's exact scenario

All tests pass (84/84 evaluator tests, 1/1 integration test).

The previous fix handled Uint8Array comparison at the expression
evaluation level, but index lookups still failed because JavaScript
Maps use reference equality for object keys.

Updated normalizeValue() to convert Uint8Arrays/Buffers to string
representations that can be used as Map keys with content-based
equality. This enables proper index lookups for binary IDs like
ULIDs when auto-indexing is enabled (the default behavior).

Also updated the integration test to verify the fix works with
auto-indexing enabled.
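
The Map behaviour this commit works around can be reproduced in a few lines. The sketch below is illustrative only: `bytesToKey` is a hypothetical helper and hex is an assumed encoding, since the commit message does not say which string representation is used.

```typescript
// Hypothetical helper: hex-encodes bytes so equal content yields equal keys.
function bytesToKey(bytes: Uint8Array): string {
  let key = ""
  bytes.forEach((byte) => {
    key += (byte < 16 ? "0" : "") + byte.toString(16)
  })
  return key
}

const id = new Uint8Array([1, 2, 3])
const sameContent = new Uint8Array([1, 2, 3])

// Object keys are matched by identity, so a second array with equal bytes misses.
const byReference = new Map<Uint8Array, string>([[id, "row"]])
const missed = byReference.get(sameContent) // undefined

// String keys are matched by value, so the encoded key is found.
const byContent = new Map<string, string>([[bytesToKey(id), "row"]])
const found = byContent.get(bytesToKey(sameContent)) // "row"
```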
…ation

Applied the same 128-byte threshold to normalizeValue() as used in
the hashing function. This prevents creating giant strings in memory
when indexing large Uint8Arrays (> 128 bytes).

Arrays larger than 128 bytes will fall back to reference equality,
which is acceptable as the fix is primarily for ID use cases (ULIDs
are 16 bytes, UUIDs are 16 bytes).

Added test coverage to verify the threshold behavior works as expected.
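
A minimal sketch of the threshold behaviour, under stated assumptions: the `__u8__` prefix, the comma-joined format, and the constant name `CONTENT_KEY_THRESHOLD` are made up for illustration, and only the 128-byte limit and the `normalizeValue` name come from the commit messages.

```typescript
const CONTENT_KEY_THRESHOLD = 128 // bytes, as stated in the commit message

// Sketch: the string format is an assumption, not the library's actual encoding.
function normalizeValue(value: unknown): unknown {
  if (value instanceof Uint8Array && value.byteLength <= CONTENT_KEY_THRESHOLD) {
    return `__u8__${Array.from(value).join(",")}`
  }
  // Large arrays and all other values pass through unchanged,
  // so large arrays keep reference equality as Map keys.
  return value
}
```

In this sketch a 16-byte ID normalizes to a short string, while a 129-byte array is returned as the same object, which avoids building large strings for big buffers.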
@KyleAMathews KyleAMathews merged commit 7aedf12 into main Nov 10, 2025
7 checks passed
@KyleAMathews KyleAMathews deleted the claude/fix-bug-report-011CUspWNFrtKDtqEKZ2WDpR branch November 10, 2025 18:17
@github-actions github-actions bot mentioned this pull request Nov 10, 2025
@github-actions
Contributor

🎉 This PR has been released!

Thank you for your contribution!

KyleAMathews pushed a commit that referenced this pull request Nov 24, 2025
…parison

This fixes a bug where joining collections by small Uint8Array keys would
fail because the keys were compared by reference instead of by content.

The fix normalizes Uint8Array keys (≤128 bytes) to string representations
before using them as Map keys, similar to the approach used in PR #779 for
value comparison. This ensures that Uint8Array instances with the same
byte content are treated as equal, even if they are different objects.

Changes:
- Added normalizeKey() function to convert small Uint8Arrays to string keys
- Updated Index class to normalize keys in all Map operations
- Maintained mapping from normalized keys to original keys for iteration
- Added test case for joining collections with Uint8Array keys
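
The changes listed above could be sketched roughly as follows. `KeyedIndex` is a hypothetical, simplified stand-in for the Index class named in the commit message, and the key encoding is an assumption; the sketch only shows the shape of the idea, which is to normalize keys for Map operations while remembering the original key for iteration.

```typescript
const KEY_THRESHOLD = 128 // bytes, matching the threshold in the commit message

// Assumed encoding; the real normalizeKey() may use a different string format.
function normalizeKey(key: unknown): unknown {
  if (key instanceof Uint8Array && key.byteLength <= KEY_THRESHOLD) {
    return `__u8__${Array.from(key).join(",")}`
  }
  return key
}

// Hypothetical, simplified index: values are grouped under normalized keys.
class KeyedIndex<K, V> {
  private values = new Map<unknown, V[]>()
  private originalKeys = new Map<unknown, K>()

  add(key: K, value: V): void {
    const normalized = normalizeKey(key)
    const bucket = this.values.get(normalized)
    if (bucket) {
      bucket.push(value)
    } else {
      this.values.set(normalized, [value])
      this.originalKeys.set(normalized, key) // remember the first original key
    }
  }

  get(key: K): V[] {
    return this.values.get(normalizeKey(key)) ?? []
  }

  // Iteration yields original keys rather than their string encodings.
  entries(): Array<[K, V[]]> {
    const result: Array<[K, V[]]> = []
    this.values.forEach((bucket, normalized) => {
      result.push([this.originalKeys.get(normalized) as K, bucket])
    })
    return result
  }
}
```

With this shape, rows added under two different Uint8Array instances holding the same bytes land in one bucket, so a join on that key would see both.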

Fixes #896

3 participants