Hash small Uint8Arrays (≤128 bytes) by content rather than reference #779
Merged
KyleAMathews merged 6 commits into main on Nov 10, 2025
🦋 Changeset detected. Latest commit: ede4683. The changes in this PR will be included in the next version bump. This PR includes changesets to release 11 packages.
@tanstack/angular-db
@tanstack/db
@tanstack/db-ivm
@tanstack/electric-db-collection
@tanstack/offline-transactions
@tanstack/powersync-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/rxdb-db-collection
@tanstack/solid-db
@tanstack/svelte-db
@tanstack/trailbase-db-collection
@tanstack/vue-db
Size Change: +206 B (+0.26%) Total Size: 80 kB
Size Change: 0 B Total Size: 3.34 kB
Force-pushed from 324741e to bee92ee
samwillis (Collaborator) approved these changes on Nov 7, 2025 and left a comment:
Awesome. Just one comment on making a method public
    // @ts-expect-error - _writeByte is private but we need to use it here
    hasher._writeByte(input[i]!)
  }
  return hasher.digest()
We can make this method public rather than use the comment to ignore the type error.
Fixes `eq` function and hash indexing to compare Uint8Arrays/Buffers by content instead of reference, enabling proper ULID comparisons in WHERE clauses.

Changes:
- Hash small Uint8Arrays (≤128 bytes) by content in db-ivm for better indexing
- Compare Uint8Arrays by content in eq operator via areValuesEqual() function
- Add comprehensive tests for Uint8Array equality comparison
Force-pushed from bee92ee to 421e32d
Add tests that specifically cover the user's reproduction case where Uint8Arrays are created with a length (e.g., new Uint8Array(5)) resulting in zero-filled arrays. Confirms that content comparison works correctly.
Add explicit tests for string and number equality to ensure that the areValuesEqual function doesn't break primitive comparisons. All tests pass, confirming the implementation correctly handles both Uint8Arrays and primitives.
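The behaviour these tests exercise can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the name `areValuesEqual` comes from the commit messages, and the signature and body here are assumptions.

```typescript
// Hypothetical sketch; the real areValuesEqual() in the PR may differ.
function areValuesEqual(a: unknown, b: unknown): boolean {
  // Identical references and equal primitives take the fast path.
  if (a === b) return true

  // Buffer extends Uint8Array in Node.js, so this check covers both.
  if (a instanceof Uint8Array && b instanceof Uint8Array) {
    if (a.length !== b.length) return false
    for (let i = 0; i < a.length; i++) {
      if (a[i] !== b[i]) return false
    }
    return true
  }

  // Everything else keeps strict-equality semantics.
  return false
}

// The reproduction case: two zero-filled arrays created with a length.
const left = new Uint8Array(5)
const right = new Uint8Array(5)
console.log(left === right) // false (different objects)
console.log(areValuesEqual(left, right)) // true (same bytes)
console.log(areValuesEqual("abc", "abc")) // true (primitives unaffected)
```

Checking `a === b` first means strings and numbers never reach the byte loop, which is why the primitive tests are expected to keep passing.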
Fixes the `eq` function and hash indexing to compare Uint8Arrays/Buffers by content instead of reference, enabling proper ULID comparisons in WHERE clauses.

The issue was that `eq` compared Uint8Arrays by reference. It now compares them byte-by-byte.

Changes:
- Hash small Uint8Arrays (≤128 bytes) by content in db-ivm for better indexing
- Compare Uint8Arrays by content in eq operator via areValuesEqual() function
- Made writeByte() public in MurmurHashStream
- Add comprehensive tests for Uint8Array equality comparison
- Add integration test reproducing the user's exact scenario

All tests pass (84/84 evaluator tests, 1/1 integration test).
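As an illustration of the hashing change, here is a rough sketch of feeding bytes into a streaming hasher through a public `writeByte()` method. The `ByteHasher` class is a stand-in using 32-bit FNV-1a, not the `MurmurHashStream` from db-ivm, and `hashBytes`, `hashByReference`, and the constant name are hypothetical. Only the 128-byte threshold and the `writeByte()`/`digest()` method names come from the PR.

```typescript
const CONTENT_HASH_MAX_BYTES = 128 // threshold stated in the PR

// Stand-in streaming hasher (32-bit FNV-1a). NOT db-ivm's MurmurHashStream.
class ByteHasher {
  private state = 0x811c9dc5 // FNV offset basis

  // Public, so callers need no @ts-expect-error to feed bytes.
  writeByte(byte: number): void {
    this.state ^= byte & 0xff
    this.state = Math.imul(this.state, 0x01000193) >>> 0 // FNV prime
  }

  digest(): number {
    return this.state >>> 0
  }
}

// Hypothetical reference-based fallback: one stable id per object.
const referenceIds = new WeakMap<object, number>()
let nextReferenceId = 1

function hashByReference(value: object): number {
  let id = referenceIds.get(value)
  if (id === undefined) {
    id = nextReferenceId++
    referenceIds.set(value, id)
  }
  return id
}

function hashBytes(input: Uint8Array): number {
  // Large arrays skip the byte loop and are identified by reference.
  if (input.byteLength > CONTENT_HASH_MAX_BYTES) return hashByReference(input)

  const hasher = new ByteHasher()
  for (let i = 0; i < input.length; i++) {
    hasher.writeByte(input[i]!)
  }
  return hasher.digest()
}

const ulidA = new Uint8Array(16).fill(7)
const ulidB = new Uint8Array(16).fill(7)
console.log(hashBytes(ulidA) === hashBytes(ulidB)) // true
```

The trade-off is that two large arrays with identical bytes still hash differently, which the PR accepts because the fix targets short binary IDs.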
The previous fix handled Uint8Array comparison at the expression evaluation level, but index lookups still failed because JavaScript Maps use reference equality for object keys. Updated normalizeValue() to convert Uint8Arrays/Buffers to string representations that can be used as Map keys with content-based equality. This enables proper index lookups for binary IDs like ULIDs when auto-indexing is enabled (the default behavior). Also updated the integration test to verify the fix works with auto-indexing enabled.
…ation Applied the same 128-byte threshold to normalizeValue() as used in the hashing function. This prevents creating giant strings in memory when indexing large Uint8Arrays (> 128 bytes). Arrays larger than 128 bytes will fall back to reference equality, which is acceptable as the fix is primarily for ID use cases (ULIDs are 16 bytes, UUIDs are 16 bytes). Added test coverage to verify the threshold behavior works as expected.
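To make the index-lookup problem concrete, the sketch below shows why a raw Uint8Array fails as a Map key and how a normalized string key works around it. Only the name `normalizeValue` and the 128-byte threshold come from the commit messages. The string encoding, the `__u8__` prefix, and the constant name are assumptions made for illustration.

```typescript
const MAX_NORMALIZED_BYTES = 128 // threshold stated in the commit message

// Hypothetical sketch; the encoding used by the real normalizeValue() may differ.
function normalizeValue(value: unknown): unknown {
  if (value instanceof Uint8Array && value.byteLength <= MAX_NORMALIZED_BYTES) {
    // Assumed encoding: a prefixed, comma-separated list of byte values.
    return `__u8__${Array.from(value).join(",")}`
  }
  // Primitives and large arrays pass through unchanged, so large arrays
  // keep reference equality as Map keys.
  return value
}

const index = new Map<unknown, string>()
index.set(normalizeValue(new Uint8Array([1, 2, 3])), "row-1")

// A fresh array with the same bytes finds the entry via its normalized key.
console.log(index.get(normalizeValue(new Uint8Array([1, 2, 3])))) // "row-1"

// Without normalization the lookup misses: Map compares objects by reference.
console.log(index.get(new Uint8Array([1, 2, 3]))) // undefined
```

A 16-byte ULID produces a short key under this scheme, while an unbounded version would build a string several times the size of every large array, which is the memory cost the threshold avoids.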
🎉 This PR has been released! Thank you for your contribution!
KyleAMathews pushed a commit that referenced this pull request on Nov 24, 2025
…parison

This fixes a bug where joining collections by small Uint8Array keys would fail because the keys were compared by reference instead of by content. The fix normalizes Uint8Array keys (≤128 bytes) to string representations before using them as Map keys, similar to the approach used in PR #779 for value comparison. This ensures that Uint8Array instances with the same byte content are treated as equal, even if they are different objects.

Changes:
- Added normalizeKey() function to convert small Uint8Arrays to string keys
- Updated Index class to normalize keys in all Map operations
- Maintained mapping from normalized keys to original keys for iteration
- Added test case for joining collections with Uint8Array keys

Fixes #896
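The approach described in that commit might look something like the sketch below. The `KeyedIndex` class, its methods, and the key encoding are hypothetical and do not reproduce the actual Index class. Only the `normalizeKey()` name, the 128-byte threshold, and the idea of keeping a normalized-to-original key mapping come from the commit message.

```typescript
const MAX_KEY_BYTES = 128 // threshold stated in the commit message

// Hypothetical encoding; the real normalizeKey() may produce a different string.
function normalizeKey(key: unknown): unknown {
  if (key instanceof Uint8Array && key.byteLength <= MAX_KEY_BYTES) {
    return `__u8__${Array.from(key).join(",")}`
  }
  return key
}

// Simplified stand-in for an index keyed by join key.
class KeyedIndex<K, V> {
  private values = new Map<unknown, V[]>()
  // Remembers the first original key seen for each normalized key,
  // so iteration can hand back real keys instead of strings.
  private originalKeys = new Map<unknown, K>()

  add(key: K, value: V): void {
    const normalized = normalizeKey(key)
    const bucket = this.values.get(normalized)
    if (bucket) {
      bucket.push(value)
    } else {
      this.values.set(normalized, [value])
      this.originalKeys.set(normalized, key)
    }
  }

  get(key: K): V[] {
    return this.values.get(normalizeKey(key)) ?? []
  }

  entries(): Array<[K, V[]]> {
    const result: Array<[K, V[]]> = []
    this.values.forEach((bucket, normalized) => {
      result.push([this.originalKeys.get(normalized) as K, bucket])
    })
    return result
  }
}

// Join-style lookup: the post's authorId is a different object with equal bytes.
const usersById = new KeyedIndex<Uint8Array, string>()
usersById.add(new Uint8Array([1, 2, 3]), "alice")
const posts = [{ authorId: new Uint8Array([1, 2, 3]), title: "Hello" }]
const joined = posts.map((post) => ({
  title: post.title,
  authors: usersById.get(post.authorId),
}))
console.log(joined[0]?.authors) // ["alice"]
```

Keeping the original key alongside the normalized one means consumers that iterate the index still receive Uint8Array keys rather than the internal string form.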
Small Uint8Arrays (≤128 bytes) are now hashed by their content rather than by reference, enabling proper equality comparisons for binary IDs like ULIDs (16 bytes). Large arrays (>128 bytes) continue to be hashed by reference to avoid performance costs.
This allows the `eq` expression function to correctly compare binary IDs without forcing users to use the more expensive functional expression variant.
Changes:
Fixes issue where binary ULIDs couldn't be compared using `eq` due to reference-based hashing.
🎯 Changes
✅ Checklist
pnpm test:pr.
🚀 Release Impact