- Updated README.md to clarify that items can be strings or other serializable objects.
- Modified the Vicinity class to accept a broader range of item types by changing type hints from `str` to `Any` in several methods.
- Enhanced the insert and delete methods to handle non-string tokens appropriately, so that items can be checked and managed regardless of their type.
… and evaluating backends
- Simplified the logic for checking and appending tokens in the insert method, ensuring that duplicate tokens are properly managed.
- Updated the `items` fixture to return a mix of dictionaries and strings based on index parity.
- Modified `test_vicinity_insert_duplicate` to use the updated `items` fixture for inserting items.
- Adjusted `test_vicinity_delete_and_query` to reference items by their indices instead of hardcoded values.
- Enhanced the Vicinity class to streamline token management, ensuring proper handling of duplicates and improving error messaging for token deletions.
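A parity-based fixture like the one described above could produce data along these lines. This is only a sketch: the helper name `make_items`, the dictionary keys, and the choice of even indices for dictionaries are assumptions, not taken from the PR.

```python
from typing import Any


def make_items(n: int) -> list[Any]:
    """Return n items: dicts at even indices, plain strings at odd ones.

    Which parity maps to which type is an assumption made for illustration.
    """
    return [
        {"id": i, "label": f"item{i}"} if i % 2 == 0 else f"item{i}"
        for i in range(n)
    ]


items = make_items(4)
```

In a test suite, the same body could sit inside a `pytest` fixture so that every test sees both item types.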
Co-authored-by: Stephan Tulkens
…ling
- Replaced the nested loop for checking duplicates with a single extend operation for tokens.
- Improved efficiency by directly appending tokens to the items list, ensuring proper management of duplicates.
…ling
- Replaced the nested loop for token matching with a more efficient list comprehension.
- Enhanced error messaging to specify which tokens were not found in the vector space.
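As a rough illustration of the list-comprehension approach and the more specific error message, here is a minimal sketch over a plain Python list. The function name `indices_to_delete` and the exact error wording are hypothetical; the actual Vicinity implementation may differ.

```python
from typing import Any


def indices_to_delete(items: list[Any], tokens: list[Any]) -> list[int]:
    """Return the positions of all stored items that match any given token."""
    # Collect every requested token that is absent, so the error can name them.
    missing = [token for token in tokens if token not in items]
    if missing:
        raise ValueError(f"Tokens not found in the vector space: {missing}")
    # Equality-based matching works for strings and dicts alike.
    return [i for i, item in enumerate(items) if item in tokens]
```

Membership tests use `==`, so unhashable items such as dictionaries are handled without extra work, at the cost of a linear scan per token.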
- Added a try-except block around the JSON serialization process to catch JSONEncodeError.
- Introduced a new pytest fixture `non_serializable_items` that generates a list of non-serializable objects for testing.
- Added a test case `test_vicinity_save_and_load_non_serializable_items` to verify that saving a Vicinity instance with non-serializable items raises a JSONEncodeError.
- Updated the Vicinity class documentation to specify that JSONEncodeError may be raised if items are not serializable.
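The guard around serialization might look roughly like the sketch below. Note the substitution: the commits name `JSONEncodeError`, which the standard-library `json` module does not provide, so this sketch uses `json` and catches the `TypeError` it raises instead. The function name `dump_items` and the message text are hypothetical.

```python
import json
from typing import Any


def dump_items(items: list[Any]) -> str:
    """Serialize items to JSON, failing with a clearer message if impossible."""
    try:
        return json.dumps({"items": items})
    except TypeError as exc:
        # Stand-in for the JSONEncodeError mentioned in the commit messages.
        raise TypeError("Items must be JSON-serializable to be saved") from exc
```

Strings and dictionaries of plain values pass through unchanged, while something like `object()` triggers the error path.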
- Introduced HuggingFaceMixin to enable saving and loading Vicinity instances to/from Hugging Face Hub
- Added optional import of HuggingFaceMixin based on huggingface_hub and datasets library availability
- Implemented methods for pushing Vicinity instances to the Hub, including dataset and metadata upload
- Created a method to load Vicinity instances from Hugging Face repositories
Pringled (Member) requested changes on Feb 15, 2025 and left a comment:
Really nice integration, would be awesome to have this in Vicinity 🎉 Left some minor feedback
Member
Also feel free to add an example to the main README (e.g. the quickstart has a saving/loading part; it can be added under there).
…aset card template
- Added a dataset card template for Hugging Face Hub uploads
- Improved error handling for Hugging Face integration with custom import error
- Updated `push_to_hub` method to include model name/path in configuration
- Removed conditional import of Hugging Face libraries in `vicinity.py`
- Added `huggingface` optional dependency in `pyproject.toml`
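One common way to surface a custom import error for an optional dependency is sketched below. The helper name `require_optional` and the wording of the install hint are assumptions for illustration; only the `huggingface` extra name comes from the commit message above.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or raise an error naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Hypothetical message; the real integration may word this differently.
        raise ImportError(
            f"`{module_name}` is required for this feature. "
            f"Install it with `pip install 'vicinity[{extra}]'`."
        ) from exc
```

Calling the helper at the point of use, rather than at module import time, keeps the core package importable when the extra is not installed.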
… and Hugging Face integration
- Added new optional dependency groups for integrations and backends in pyproject.toml
- Updated README.md with new installation instructions for specific integrations and backends
- Added documentation for pushing and loading vector stores from Hugging Face Hub
- Simplified and clarified installation options in README
- Implemented a new test case for loading a Vicinity instance from Hugging Face Hub
- Added test to verify the print statement when loading from a repository
- Introduced a constant for the print statement in the Hugging Face integration module
- Updated the print statement to use string formatting for better flexibility
- Deleted `tests/test_utils.py` containing tests for normalization utility functions
- Removed `tests/test_vicinity.py` with comprehensive test cases for the Vicinity class
- These test files are no longer needed, likely due to refactoring or migration of tests
- Implemented `test_utils.py` with tests for vector normalization functions
- Created `test_vicinity.py` with extensive test cases covering Vicinity class methods
- Added `test_huggingface.py` to test Hugging Face integration functionality
- Included tests for various scenarios such as:
  * Initialization and vector handling
  * Querying and thresholding
  * Insertion and deletion of vectors
  * Saving and loading vector stores
  * Handling non-serializable items
  * Hugging Face Hub integration
Pringled (Member) approved these changes on Feb 28, 2025 and left a comment:
LGTM, very nice! I will put the test dataset on our org as well and do a small followup to change the path. Merging!
I added a basic integration for the Hub.
Resulting in https://huggingface.co/datasets/davidberenstein1957/my-vicinity-repo
You can also find datasets here: https://huggingface.co/datasets?other=vicinity