-
-
Notifications
You must be signed in to change notification settings - Fork 211
feat(minio): Allow for proxies #1184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unfamiliar with python's networking libraries and how to test proxies in general so I excluded tests.
Same here, unfortunately. A manual test at least confirms the defaults don't break when not behind a proxy 😅
Uni-freiburg and its cluster users would really appreciate a release with these changes!
I was planning a release today, let's get it in! (btw, is it correct you are not on OpenML Slack?)
|
I put in your
That's good at least! I would be lying if I said I actually tried it without a proxy, I was hoping the tests would all catch that :) Brief look at the failing tests makes me think it's unrelated?
|
Not always, but it can be very convenient to quickly discuss things or ask a question :) (for example, the server issue last week, or me notifying you of the new openml-python release so you can do a quick test later) |
* Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * ADD: download all files from minio bucket * Add note for #1184 * Fix pre-commit issues (mypy, flake) Co-authored-by: Matthias Feurer <[email protected]>
* feat(minio): Allow for proxies * fix: Declared proxy_client as None * refactor(proxy): Change to `str | None` with "auto"
* Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * ADD: download all files from minio bucket * Add note for openml#1184 * Fix pre-commit issues (mypy, flake) Co-authored-by: Matthias Feurer <[email protected]>
Reference Issue
Fixes #1159
What does this PR implement/fix? Explain your changes.
The
Minioclient does not automatically resolve proxies as is currently being done forrequests. This PR usesrequestsstrategy to automatically resolve the correct proxy and use this with theMinioclient. With the shift to.pqfiles which are communcated with byMinio, this should help transition away from arff with less complaints.Previously,
Miniowould do repeated calls to the download server but have long timeouts and ignoreopenml.configsettings before finally failing and throwing an error.How should this PR be tested?
I don't know, if you have a proxy, please use it along with this following script:
I'm unfamiliar with python's networking libraries and how to test proxies in general so I excluded tests.
Any other comments?
proxy: str | boolin the_download_minio_filespecifically to document the behavior and the user has no way to interact with this beyond environment variables (as is requests)