-
Notifications
You must be signed in to change notification settings - Fork 36
Allow to give a save path directly #821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Pinging @utf @gpetretto @davidwaroquiers. Any opinion on this? Thanks. |
|
Hi @FabiPi3 I think it's interesting to have such an option. One thing that comes to mind is that maybe we could use mongo's dot notation instead of a list ? In your example, the decorator would be:
I haven't looked at the implementation but just commented from the user's perspective at this stage. |
gpetretto
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @FabiPi3,
this seems like an interesting option. I think I would have had a use for this in the past.
I did not make a full review of the code, although it is quite small, so I don't see big problems. I just left a small comment.
As for @davidwaroquiers comment. Indeed it may have been nice to use use the dot notation instead of a list. However there may be two issues:
- MongoDB actually allows to have dots in the keyword names, although discouradged. How would you tell those apart?
- a "location" coming from
find_keycan contain an integer in case the path to the location is inside a list. But in the dot notation, how to tell apart an integer from a key that is instead a string including an integer?
These are both boarderline cases, but since this is to provide a fine control on the data to be stored, I would suggest to stick to the list.
Cocerning your two questions:
- On one hand I think that in principle everybody would want to focus on something that is inside the
outputkey. But on the other hand the standard behaviour can move to the additional store any part of the document, included the root keywords of theJobStoreDocuments(e.g.metadata). I am not sure what would be the best option here, but I would leave things as you implemented (with the need to includeoutput), just to be consistent between the standard and the new option. - You mean allowing to pass a list to
load(or a list of lists) and this will only load the specified data. This may indeed be interesting, but maybe it is a bit more involved to implement?
src/jobflow/core/store.py
Outdated
| for store_name, store_save in save_keys.items(): | ||
| for save_key in store_save: | ||
| locations.extend(find_key(doc, save_key, include_end=True)) | ||
| if isinstance(save_key, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe here also match if it is a tuple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
Thanks for your feedback. Sounds like there is no reason to not merge it. |
I have a use case where I propagate a large dictionary with many sub-dictionaries and many keys to the output of a
job. I don't have full control over this dictionary as it is part of the user input. I use the addtional stores feature to store large parts of my output into additional stores. Now it could happen by chance that one of my additional store keys matches with one of the keys in the user-provided dictionary. Than this would be put into the additional store as well, which is not what I want.A simple example showing this would be:
You will see two
blob_uuidsin the output. But I only want a single one.The option to do this I propose is to give the full path directly as a list:
Now only the given path is put into the additional store and the other one not.
TODO
outputpart in the list be omitted and prepended automatically?loadprocedure.