Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions server/src/io/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
pub(crate) mod paginated_kv_store;
102 changes: 102 additions & 0 deletions server/src/io/paginated_kv_store.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
use std::io;

/// Provides an interface that allows storage and retrieval of persisted values that are associated
/// with given keys, with support for pagination with time-based ordering.
///
/// In order to avoid collisions, the key space is segmented based on the given `primary_namespace`s
/// and `secondary_namespace`s. Implementations of this trait are free to handle them in different
/// ways, as long as per-namespace key uniqueness is asserted.
///
/// Keys and namespaces are required to be valid ASCII strings in the range of
/// [`KVSTORE_NAMESPACE_KEY_ALPHABET`] and no longer than [`KVSTORE_NAMESPACE_KEY_MAX_LEN`]. Empty
/// primary namespaces and secondary namespaces (`""`) are considered valid; however, if
/// `primary_namespace` is empty, `secondary_namespace` must also be empty. This means that concerns
/// should always be separated by primary namespace first, before secondary namespaces are used.
/// While the number of primary namespaces will be relatively small and determined at compile time,
/// there may be many secondary namespaces per primary namespace. Note that per-namespace uniqueness
/// needs to also hold for keys *and* namespaces in any given namespace, i.e., conflicts between keys
/// and equally named primary or secondary namespaces must be avoided.
///
/// **Note:** This trait extends the functionality of [`KVStore`] by adding support for
/// paginated listing of keys based on a monotonic counter or logical timestamp. This is useful
/// when dealing with a large number of keys that cannot be efficiently retrieved all at once.
///
/// See also [`KVStore`].
///
/// [`KVStore`]: ldk_node::lightning::util::persist::KVStore
/// [`KVSTORE_NAMESPACE_KEY_ALPHABET`]: ldk_node::lightning::util::persist::KVSTORE_NAMESPACE_KEY_ALPHABET
/// [`KVSTORE_NAMESPACE_KEY_MAX_LEN`]: ldk_node::lightning::util::persist::KVSTORE_NAMESPACE_KEY_MAX_LEN
pub trait PaginatedKVStore {
/// Returns the data stored for the given `primary_namespace`, `secondary_namespace`, and `key`.
///
/// Returns an [`ErrorKind::NotFound`] if the given `key` could not be found in the given
/// `primary_namespace` and `secondary_namespace`.
///
/// [`ErrorKind::NotFound`]: io::ErrorKind::NotFound
fn read(
&self, primary_namespace: &str, secondary_namespace: &str, key: &str,
) -> Result<Vec<u8>, io::Error>;

/// Persists the given data under the given `key` with an associated `time`.
///
/// The `time` parameter is a `i64` representing a monotonic counter or logical timestamp.
/// It is used to track the order of keys for list operations. Implementations should store the
/// `time` value and use it for ordering in the `list` method.
///
/// Will create the given `primary_namespace` and `secondary_namespace` if not already present
/// in the store.
fn write(
&self, primary_namespace: &str, secondary_namespace: &str, key: &str, time: i64, buf: &[u8],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why time can't be an implementation detail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it could be handled by impl internally, this would pose some limitations on impl.

For example, a user of this interface might already have access to a monotonic counter and thus avoid a dependency on std::time; iiuc, std::time has limited support in Wasm.

While it doesn’t matter much for ldk-server right now, when we upstream something similar, we might want to do this in a non-std::time-dependent fashion or atleast not at kv-store interface/impl level.

Additionally, providing time as an input could allow certain types of objects to be written without requiring time, which is mainly useful in cases where the listing doesn’t need time ordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it could be handled by impl internally, this would pose some limitations on impl.

For example, a user of this interface might already have access to a monotonic counter and thus avoid a dependency on std::time; iiuc, std::time has limited support in Wasm.

Note that it doesn't need to be directly in the implementation but could be at the database-level, for instance.

While it doesn’t matter much for ldk-server right now, when we upstream something similar, we might want to do this in a non-std::time-dependent fashion or atleast not at kv-store interface/impl level.

Additionally, providing time as an input could allow certain types of objects to be written without requiring time, which is mainly useful in cases where the listing doesn’t need time ordering.

I suppose that's a fair point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it doesn't need to be directly in the implementation but could be at the database-level, for instance.

We could do this at DB level but that is tight coupling and using very db specific features.
Doing this without time dependency in implementation requires the feature "auto-populate creation_time"
With different SQL impls it might be possible but not every DB, hence we don't want to place that limitation.

For example:

  • Can we do this in sqllite?
    Yes, creation_time INTEGER NOT NULL DEFAULT (strftime('%s', 'now'))

  • Can we do this in postgres?
    Yes, creation_time INTEGER NOT NULL DEFAULT (EXTRACT(EPOCH FROM NOW())::INT)

  • Can we do this in mysql?
    Yes, creation_time INT NOT NULL DEFAULT UNIX_TIMESTAMP(),

  • Can we do this in dynamoDB?
    Afaik, No

  • Can we do this in cloudflare-kv?
    Afaik, No

  • Can we do this in mongo-db?
    Yes,

  • Can we do this in couch-db?
    Afaik, No

  • Can we do this in redis?
    Afaik, No

Hence, not providing this at interface level, places expectations/requirements on db-implementations that were not necessarily required. It is only relevant for wasm environments, but if we can easily avoid this, i think we should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the time parameter in the extension trait method as defined in #22 (comment) and discussed offline.

) -> Result<(), io::Error>;

/// Removes any data that had previously been persisted under the given `key`.
///
/// If the `lazy` flag is set to `true`, the backend implementation might choose to lazily
/// remove the given `key` at some point in time after the method returns, e.g., as part of an
/// eventual batch deletion of multiple keys. As a consequence, subsequent calls to
/// [`PaginatedKVStore::list`] might include the removed key until the changes are actually persisted.
///
/// Note that while setting the `lazy` flag reduces the I/O burden of multiple subsequent
/// `remove` calls, it also influences the atomicity guarantees as lazy `remove`s could
/// potentially get lost on crash after the method returns. Therefore, this flag should only be
/// set for `remove` operations that can be safely replayed at a later time.
///
/// Returns successfully if no data will be stored for the given `primary_namespace`,
/// `secondary_namespace`, and `key`, independently of whether it was present before its
/// invocation or not.
fn remove(
&self, primary_namespace: &str, secondary_namespace: &str, key: &str, lazy: bool,
) -> Result<(), io::Error>;

/// Returns a paginated list of keys that are stored under the given `secondary_namespace` in
/// `primary_namespace`, ordered in descending order of `time`.
///
/// The `list` method returns the latest records first, based on the `time` associated with each key.
/// Pagination is controlled by the `next_page_token`, which is an `Option<String>`
/// used to determine the starting point for the next page of results. If `next_page_token` is `None`,
/// the listing starts from the most recent entry. The `next_page_token` in the returned
/// [`ListResponse`] can be used to fetch the next page of results.
///
/// Implementations should ensure that keys are returned in descending order of `time` and that
/// pagination tokens are correctly managed.
///
/// Returns an empty list if `primary_namespace` or `secondary_namespace` is unknown or if
/// there are no more keys to return.
///
/// [`ListResponse`]: struct.ListResponse.html
fn list(
&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than needing to wait on a dependency to be released, is there any reason why pagination couldn't be optional in KvStore (i.e., an implementation could chose whether or not to support it)? An extension trait could add a list_paginated method with a naive provided implementation to do the filtering. Then implementations could override it with something more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination cannot be optional; we need to expose pagination at the interface level in ldk-server APIs. There could be 100,000 forwarded payments or payments, and we don’t want to load them all into memory.

There is no reason for objects that need to be fetched in a paginated manner to be retrieved using just a list, as it results in an "unbounded" request on both the database and the server, consuming unbounded resources.

While pagination isn’t optional, time-based ordering can be optional depending on the objects we are listing, which we can consider.

We need to override both write and list. I am open to suggestions, but I thought it might be confusing to combine them into a single trait.
It might be better to write/access objects that need pagination through a dedicated paginated interface, while those that don’t can use the normal kv_store interface.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pagination cannot be optional; we need to expose pagination at the interface level in ldk-server APIs. There could be 100,000 forwarded payments or payments, and we don’t want to load them all into memory.

There is no reason for objects that need to be fetched in a paginated manner to be retrieved using just a list, as it results in an "unbounded" request on both the database and the server, consuming unbounded resources.

By optional, I mean you can decided to use an implementation that does not implement the extension trait since the latter could provide a default implementation using the non-paginated version.

That said, we don't need to provide a naive implementation for the extension trait. We can leave it undefined to force any implementation to define it in order to avoid the problems that you mentioned.

While pagination isn’t optional, time-based ordering can be optional depending on the objects we are listing, which we can consider.

We need to override both write and list. I am open to suggestions, but I thought it might be confusing to combine them into a single trait. It might be better to write/access objects that need pagination through a dedicated paginated interface, while those that don’t can use the normal kv_store interface.

I think there's some confusion here then. An extension trait is a separate trait that is a subtrait of another trait. So if you have KvStore you can define PaginatedKvStore as:

pub trait PaginatedKvStore: KvStore {
	fn paginated_list(
		&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
	) -> Result<ListResponse, io::Error>;
}

This seems better than duplicating the entire SqliteStore implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, I agree with Jeff here in that I find going the extension trait route much preferable. At least, if I'm not currently missing something that would prohibit us from going that way.

In particular if the goal is to upstream this trait and the corresponding implementations to LDK Node and eventually to LDK, I imagine it would be unnecessarily hard to reconcile the APIs at that point if we start off with two entirely separate but functionally and terminologically overlapping traits.

Copy link
Contributor Author

@G8XSU G8XSU Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extension option proposed above only works if we write all objects with time all the time.

1.) it would need changes in ldk-node base sql at the minimum.
2.) there is time dependency in database impl. (being discussed in above comment), (if we don't want two write methods in same trait.)

Due to changes in both write and list, i proposed going with separate paginated_kv_store interface.

Also, ldk-node no longer has kvstore trait, any changes to kv_store trait require changes in rust-lightning, although it would be good-to-have, we probably don't need paginated api in rust-lightning right now.

If we want to have time in write,
We can either have an extension trait like:

pub trait PaginatedKvStore: KvStore {
        // this write method doesn't make much sense imo.
        fn paginated_write(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, time: i64, buf: &[u8]
	) -> Result<ListResponse, io::Error>;
	
	fn paginated_list(
		&self, primary_namespace: &str, secondary_namespace: &str, next_page_token: Option<String>,
	) -> Result<ListResponse, io::Error>;
}

or separate trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another problem with this combined interface.
paginated_writes and normal write operations cannot be combined for the same object type (since normal writes don't use the time field currently).
Similarly, a paginated list operation can't be run on objects written through normal writes. I think that will make for a super confusing API with footguns everywhere.

Moreover, we would want paginated_writes and list to go through a separate table since we will have indexes on the time column.

Essentially, these two are two different data stores. If the implementation chooses to combine them, they are free to do so, but I don't think we should complicate the interface just to avoid duplication of methods in trait.

We can always implement two traits using the same base methods, instead of combining two different functionalities into one.
For ldk-node as well, I think it would make sense to support providing both KvStore and paginated_kv_store implementations separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another problem with this combined interface. paginated_writes and normal write operations cannot be combined for the same object type (since normal writes don't use the time field currently). Similarly, a paginated list operation can't be run on objects written through normal writes. I think that will make for a super confusing API with footguns everywhere.

Confusing to whom? It's not the implementors responsibility to get anything right. The caller (i.e., whichever struct is parameterized by the trait) is responsible for using it correctly. And here the caller and the author of the trait are one and the same.

Moreover, we would want paginated_writes and list to go through a separate table since we will have indexes on the time column.

Essentially, these two are two different data stores. If the implementation chooses to combine them, they are free to do so, but I don't think we should complicate the interface just to avoid duplication of methods in trait.

We can always implement two traits using the same base methods, instead of combining two different functionalities into one. For ldk-node as well, I think it would make sense to support providing both KvStore and paginated_kv_store implementations separately.

Hmm... not quite sure how you would support that in LDK Node. If anything, it would probably make more sense to decompose the API into KVRead, KVWrite, PaginatedKVRead, and PaginatedKVWrite. That way each part of the code is explicit about what the interface it needs. But when configuring LDK Node, you'd still need something that implements all the traits. So, in practice, an implementation would need to implement all those traits anyway.

My broader point, though, the onus is really on the user of the interface to do the right thing. How the traits are structured is less important until we talk about upstreaming. See #22 (comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing to whom? It's not the implementor's responsibility to get anything right.

I think it is confusing inside ldk-node or ldk-server and creates foot-guns, such as not combining list with paginated_write or paginated_list with list/write.

But when configuring LDK Node, you'd still need something that implements all the traits.

I'm not sure if it is correct to assume that users will want to use the same datastore for both normal kvstore and paginated_kvstore. NoSQL/DocumentDB/ObjectStorage are almost a perfect fit for kvstore, whereas SQL-based databases might be a better fit for paginated_kvstore.

So, I think users could be better served if they have the option to configure kvstore and paginated_kvstore separately. Combining them and adding more and more functionality to kvstore pushes them toward using SQL databases for everything.

What about the overhead of indexing on the time column? Are we okay with introducing additional overhead to index the time column for all keys instead of just the paginated ones?

In my opinion, there are bigger trade-offs to consider here, like flexibility in database choice and indexing performance, than just duplication in the interface.

Separate tables

If the implementor of the trait wants to configure separate tables for paginated and normal writes, it isn't easy to do so.
If there are separate tables, we would also need a paginated_read (just reads from the paginated table) and a paginated_remove (just deletes from the paginated table).

So, in practice, an implementation would need to implement all those traits anyway.

The key point is that there can be two different implementations/schemas for the two traits (KvStore and PaginatedKvStore) being used.

Not quite sure how you would support that in LDK Node.

I think, just like we configure KvStore in the builder, we would also have the option to configure paginated_kv_store.

Another option could be to just use paginated_kv_store or change the interface of the current KvStore. But this restricts database, schema and indexing options.

My broader point, though, the onus is really on the user of the interface to do the right thing. How the traits are structured is less important until we talk about upstreaming.

The trait structure adds additional requirements on implementations, database schemas, and indexing.
More requirements for trait-implementations and database not just for user(ldk-node/ldk-server).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this offline with @G8XSU. Seems two independent traits would be ideal if we want to allow parameterizing LDK Node with two different stores, where one needs to be paginated. That would also mean we'd want two separate implementations -- one for each trait -- like SqliteStore and PaginatedSqliteStore. This is because each would need to be initialized differently (e.g., creating an index, using a different db/table name, etc.). If instead we had one type implementing both traits, you could imagine constructing one that is set up to work in a non-paginated manner but passing it for use with the paginated case, causing it to break.

@tnull What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed it offline in dev-sync, concluded with going for separate traits as outlined in this PR.

) -> Result<ListResponse, io::Error>;
}

/// Represents the response from a paginated `list` operation.
///
/// Contains the list of keys and an optional `next_page_token` that can be used to retrieve the
/// next set of keys.
pub struct ListResponse {
/// A vector of keys, ordered in descending order of `time`.
pub keys: Vec<String>,

/// A token that can be used to retrieve the next set of keys.
pub next_page_token: Option<String>,
}
1 change: 1 addition & 0 deletions server/src/main.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
mod api;
mod io;
mod service;
mod util;

Expand Down