-
Notifications
You must be signed in to change notification settings - Fork 26
Add interface for PaginatedKVStore. #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| pub(crate) mod paginated_kv_store; |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| use std::io; | ||
|
|
||
| /// Provides an interface that allows storage and retrieval of persisted values that are associated | ||
| /// with given keys, with support for pagination with time-based ordering. | ||
| /// | ||
| /// In order to avoid collisions, the key space is segmented based on the given `primary_namespace`s | ||
| /// and `secondary_namespace`s. Implementations of this trait are free to handle them in different | ||
| /// ways, as long as per-namespace key uniqueness is asserted. | ||
| /// | ||
| /// Keys and namespaces are required to be valid ASCII strings in the range of | ||
| /// [`KVSTORE_NAMESPACE_KEY_ALPHABET`] and no longer than [`KVSTORE_NAMESPACE_KEY_MAX_LEN`]. Empty | ||
| /// primary namespaces and secondary namespaces (`""`) are considered valid; however, if | ||
| /// `primary_namespace` is empty, `secondary_namespace` must also be empty. This means that concerns | ||
| /// should always be separated by primary namespace first, before secondary namespaces are used. | ||
| /// While the number of primary namespaces will be relatively small and determined at compile time, | ||
| /// there may be many secondary namespaces per primary namespace. Note that per-namespace uniqueness | ||
| /// needs to also hold for keys *and* namespaces in any given namespace, i.e., conflicts between keys | ||
| /// and equally named primary or secondary namespaces must be avoided. | ||
| /// | ||
| /// **Note:** This trait extends the functionality of [`KVStore`] by adding support for | ||
| /// paginated listing of keys based on a monotonic counter or logical timestamp. This is useful | ||
| /// when dealing with a large number of keys that cannot be efficiently retrieved all at once. | ||
| /// | ||
| /// See also [`KVStore`]. | ||
| /// | ||
| /// [`KVStore`]: ldk_node::lightning::util::persist::KVStore | ||
| /// [`KVSTORE_NAMESPACE_KEY_ALPHABET`]: ldk_node::lightning::util::persist::KVSTORE_NAMESPACE_KEY_ALPHABET | ||
| /// [`KVSTORE_NAMESPACE_KEY_MAX_LEN`]: ldk_node::lightning::util::persist::KVSTORE_NAMESPACE_KEY_MAX_LEN | ||
| pub trait PaginatedKVStore { | ||
| /// Returns the data stored for the given `primary_namespace`, `secondary_namespace`, and `key`. | ||
| /// | ||
| /// Returns an [`ErrorKind::NotFound`] if the given `key` could not be found in the given | ||
| /// `primary_namespace` and `secondary_namespace`. | ||
| /// | ||
| /// [`ErrorKind::NotFound`]: io::ErrorKind::NotFound | ||
| fn read( | ||
| &self, primary_namespace: &str, secondary_namespace: &str, key: &str, | ||
| ) -> Result<Vec<u8>, io::Error>; | ||
|
|
||
| /// Persists the given data under the given `key` with an associated `time`. | ||
| /// | ||
| /// The `time` parameter is a `i64` representing a monotonic counter or logical timestamp. | ||
| /// It is used to track the order of keys for list operations. Implementations should store the | ||
| /// `time` value and use it for ordering in the `list` method. | ||
| /// | ||
| /// Will create the given `primary_namespace` and `secondary_namespace` if not already present | ||
| /// in the store. | ||
| fn write( | ||
| &self, primary_namespace: &str, secondary_namespace: &str, key: &str, time: i64, buf: &[u8], | ||
| ) -> Result<(), io::Error>; | ||
|
|
||
| /// Removes any data that had previously been persisted under the given `key`. | ||
| /// | ||
| /// If the `lazy` flag is set to `true`, the backend implementation might choose to lazily | ||
| /// remove the given `key` at some point in time after the method returns, e.g., as part of an | ||
| /// eventual batch deletion of multiple keys. As a consequence, subsequent calls to | ||
| /// [`PaginatedKVStore::list`] might include the removed key until the changes are actually persisted. | ||
| /// | ||
| /// Note that while setting the `lazy` flag reduces the I/O burden of multiple subsequent | ||
| /// `remove` calls, it also influences the atomicity guarantees as lazy `remove`s could | ||
| /// potentially get lost on crash after the method returns. Therefore, this flag should only be | ||
| /// set for `remove` operations that can be safely replayed at a later time. | ||
| /// | ||
| /// Returns successfully if no data will be stored for the given `primary_namespace`, | ||
| /// `secondary_namespace`, and `key`, independently of whether it was present before its | ||
| /// invocation or not. | ||
| fn remove( | ||
| &self, primary_namespace: &str, secondary_namespace: &str, key: &str, lazy: bool, | ||
| ) -> Result<(), io::Error>; | ||
|
|
||
| /// Returns a paginated list of keys that are stored under the given `secondary_namespace` in | ||
| /// `primary_namespace`, ordered in descending order of `time`. | ||
| /// | ||
| /// The `list` method returns the latest records first, based on the `time` associated with each key. | ||
| /// Pagination is controlled by the `next_page_token`, which is an `Option<String>` | ||
| /// used to determine the starting point for the next page of results. If `next_page_token` is `None`, | ||
| /// the listing starts from the most recent entry. The `next_page_token` in the returned | ||
| /// [`ListResponse`] can be used to fetch the next page of results. | ||
| /// | ||
| /// Implementations should ensure that keys are returned in descending order of `time` and that | ||
| /// pagination tokens are correctly managed. | ||
| /// | ||
| /// Returns an empty list if `primary_namespace` or `secondary_namespace` is unknown or if | ||
| /// there are no more keys to return. | ||
| /// | ||
| /// [`ListResponse`]: struct.ListResponse.html | ||
| fn list( | ||
| &self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other than needing to wait on a dependency to be released, is there any reason why pagination couldn't be optional in
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pagination cannot be optional; we need to expose pagination at the interface level in There is no reason for objects that need to be fetched in a paginated manner to be retrieved using just a list, as it results in an "unbounded" request on both the database and the server, consuming unbounded resources. While pagination isn’t optional, time-based ordering can be optional depending on the objects we are listing, which we can consider. We need to override both
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
By optional, I mean you can decided to use an implementation that does not implement the extension trait since the latter could provide a default implementation using the non-paginated version. That said, we don't need to provide a naive implementation for the extension trait. We can leave it undefined to force any implementation to define it in order to avoid the problems that you mentioned.
I think there's some confusion here then. An extension trait is a separate trait that is a subtrait of another trait. So if you have pub trait PaginatedKvStore: KvStore {
fn paginated_list(
&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
) -> Result<ListResponse, io::Error>;
}This seems better than duplicating the entire
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As mentioned elsewhere, I agree with Jeff here in that I find going the extension trait route much preferable. At least, if I'm not currently missing something that would prohibit us from going that way. In particular if the goal is to upstream this trait and the corresponding implementations to LDK Node and eventually to LDK, I imagine it would be unnecessarily hard to reconcile the APIs at that point if we start off with two entirely separate but functionally and terminologically overlapping traits.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Extension option proposed above only works if we write all objects with 1.) it would need changes in ldk-node base sql at the minimum. Due to changes in both Also, ldk-node no longer has kvstore trait, any changes to kv_store trait require changes in rust-lightning, although it would be good-to-have, we probably don't need paginated api in rust-lightning right now. If we want to have time in write, or separate trait.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is another problem with this combined interface. Moreover, we would want paginated_writes and list to go through a separate table since we will have indexes on the time column. Essentially, these two are two different data stores. If the implementation chooses to combine them, they are free to do so, but I don't think we should complicate the interface just to avoid duplication of methods in trait. We can always implement two traits using the same base methods, instead of combining two different functionalities into one.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Confusing to whom? It's not the implementors responsibility to get anything right. The caller (i.e., whichever struct is parameterized by the trait) is responsible for using it correctly. And here the caller and the author of the trait are one and the same.
Hmm... not quite sure how you would support that in LDK Node. If anything, it would probably make more sense to decompose the API into My broader point, though, the onus is really on the user of the interface to do the right thing. How the traits are structured is less important until we talk about upstreaming. See #22 (comment).
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I think it is confusing inside
I'm not sure if it is correct to assume that users will want to use the same datastore for both normal So, I think users could be better served if they have the option to configure What about the overhead of indexing on the time column? Are we okay with introducing additional overhead to index the time column for all keys instead of just the paginated ones? In my opinion, there are bigger trade-offs to consider here, like flexibility in database choice and indexing performance, than just duplication in the interface.
If the implementor of the trait wants to configure separate tables for paginated and normal writes, it isn't easy to do so.
The key point is that there can be two different implementations/schemas for the two traits (
I think, just like we configure Another option could be to just use
The trait structure adds additional requirements on implementations, database schemas, and indexing.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Discussed this offline with @G8XSU. Seems two independent traits would be ideal if we want to allow parameterizing LDK Node with two different stores, where one needs to be paginated. That would also mean we'd want two separate implementations -- one for each trait -- like @tnull What do you think?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We discussed it offline in dev-sync, concluded with going for separate traits as outlined in this PR. |
||
| ) -> Result<ListResponse, io::Error>; | ||
| } | ||
|
|
||
| /// Represents the response from a paginated `list` operation. | ||
| /// | ||
| /// Contains the list of keys and an optional `next_page_token` that can be used to retrieve the | ||
| /// next set of keys. | ||
| pub struct ListResponse { | ||
| /// A vector of keys, ordered in descending order of `time`. | ||
| pub keys: Vec<String>, | ||
|
|
||
| /// A token that can be used to retrieve the next set of keys. | ||
| pub next_page_token: Option<String>, | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| mod api; | ||
| mod io; | ||
| mod service; | ||
| mod util; | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why
timecan't be an implementation detail?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it could be handled by
implinternally, this would pose some limitations onimpl.For example, a user of this interface might already have access to a monotonic counter and thus avoid a dependency on
std::time; iiuc, std::time has limited support in Wasm.While it doesn’t matter much for
ldk-serverright now, when we upstream something similar, we might want to do this in a non-std::time-dependent fashion or atleast not at kv-store interface/impl level.Additionally, providing time as an input could allow certain types of objects to be written without requiring
time, which is mainly useful in cases where the listing doesn’t need time ordering.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it doesn't need to be directly in the implementation but could be at the database-level, for instance.
I suppose that's a fair point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do this at DB level but that is tight coupling and using very db specific features.
Doing this without time dependency in implementation requires the feature "auto-populate creation_time"
With different SQL impls it might be possible but not every DB, hence we don't want to place that limitation.
For example:
Can we do this in sqllite?
Yes,
creation_time INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))Can we do this in postgres?
Yes,
creation_time INTEGER NOT NULL DEFAULT (EXTRACT(EPOCH FROM NOW())::INT)Can we do this in mysql?
Yes,
creation_time INT NOT NULL DEFAULT UNIX_TIMESTAMP(),Can we do this in dynamoDB?
Afaik, No
Can we do this in cloudflare-kv?
Afaik, No
Can we do this in mongo-db?
Yes,
Can we do this in couch-db?
Afaik, No
Can we do this in redis?
Afaik, No
Hence, not providing this at interface level, places expectations/requirements on db-implementations that were not necessarily required. It is only relevant for wasm environments, but if we can easily avoid this, i think we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the
timeparameter in the extension trait method as defined in #22 (comment) and discussed offline.