feat(minio): Allow for proxies #1184

eddiebergman · 2022-11-24T21:33:45Z

Reference Issue

What does this PR implement/fix? Explain your changes.

The Minio client does not automatically resolve proxies as is currently being done for requests. This PR uses requests strategy to automatically resolve the correct proxy and use this with the Minio client. With the shift to .pq files which are communcated with by Minio, this should help transition away from arff with less complaints.

Previously, Minio would do repeated calls to the download server but have long timeouts and ignore openml.config settings before finally failing and throwing an error.

How should this PR be tested?

I don't know, if you have a proxy, please use it along with this following script:

# Runs fine now
python test-proxy.py

# Will hang as we disable proxies which is apparently common in curl,wget,requests
no_proxy="*" python test-proxy.py

# test-proxy.py
import openml
import shutil
from pathlib import Path

did = 2
path = Path(openml.config.get_cache_directory()) / "datasets" / str(did)
if path.exists():
    shutil.rmtree(str(path))
    
assert not path.exists()

d = openml.datasets.get_dataset(did, download_data=True)
print(d)

assert path.exists()

I'm unfamiliar with python's networking libraries and how to test proxies in general so I excluded tests.

Any other comments?

Proxies are handled without user knowledge with requests and so Minio will do so too. I think this makes sense but if you wish to make it more explicit somehow, please let me know
I left a parameter proxy: str | bool in the _download_minio_file specifically to document the behavior and the user has no way to interact with this beyond environment variables (as is requests)
I didn't think this required an example as there is non for requests and proxies, they are now alligned.
Uni-freiburg and its cluster users would really appreciate a release with these changes! 🥂

openml/_api_calls.py

PGijsbers

I'm unfamiliar with python's networking libraries and how to test proxies in general so I excluded tests.

Same here, unfortunately. A manual test at least confirms the defaults don't break when not behind a proxy 😅

Uni-freiburg and its cluster users would really appreciate a release with these changes!

I was planning a release today, let's get it in! (btw, is it correct you are not on OpenML Slack?)

openml/_api_calls.py

eddiebergman · 2022-11-25T11:44:53Z

I put in your "auto" suggestion and it's looking a little cleaner, thanks for that!

Same here, unfortunately. A manual test at least confirms the defaults don't break when not behind a proxy sweat_smile

That's good at least! I would be lying if I said I actually tried it without a proxy, I was hoping the tests would all catch that :) Brief look at the failing tests makes me think it's unrelated?

(btw, is it correct you are not on OpenML Slack?)
I'm not actually, if it's relatively active I'd be thrilled to join to know what's going on!

PGijsbers · 2022-11-25T12:57:34Z

I'm not actually, if it's relatively active I'd be thrilled to join to know what's going on!

Not always, but it can be very convenient to quickly discuss things or ask a question :) (for example, the server issue last week, or me notifying you of the new openml-python release so you can do a quick test later)

* Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * ADD: download all files from minio bucket * Add note for #1184 * Fix pre-commit issues (mypy, flake) Co-authored-by: Matthias Feurer <[email protected]>

* feat(minio): Allow for proxies * fix: Declared proxy_client as None * refactor(proxy): Change to `str | None` with "auto"

* Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * Towards downloading buckets * Download entire bucket instead of dataset file * Dont download arff, skip files already cached * Automatically unzip any downloaded archives * Make downloading the bucket optional Additionally, rename old cached files to the new filename format. * Allow users to download the full bucket when pq is already cached Otherwise the only way would be to delete the cache. * Add unit test stub * Remove redundant try/catch * Remove commented out print statement * Still download arff * ADD: download all files from minio bucket * Add note for openml#1184 * Fix pre-commit issues (mypy, flake) Co-authored-by: Matthias Feurer <[email protected]>

eddiebergman added 2 commits November 24, 2022 22:18

feat(minio): Allow for proxies

50553ef

fix: Declared proxy_client as None

389f14d

PGijsbers reviewed Nov 25, 2022

View reviewed changes

openml/_api_calls.py Show resolved Hide resolved

PGijsbers reviewed Nov 25, 2022

View reviewed changes

openml/_api_calls.py Outdated Show resolved Hide resolved

PGijsbers reviewed Nov 25, 2022

View reviewed changes

openml/_api_calls.py Outdated Show resolved Hide resolved

PGijsbers requested changes Nov 25, 2022

View reviewed changes

openml/_api_calls.py Outdated Show resolved Hide resolved

PGijsbers mentioned this pull request Nov 25, 2022

Problematic with .pq on dataset 42742 #1183

Closed

refactor(proxy): Change to str | None with "auto"

9cd3b44

PGijsbers approved these changes Nov 25, 2022

View reviewed changes

PGijsbers merged commit a909a0c into openml:develop Nov 25, 2022

PGijsbers added a commit to PGijsbers/openml-python that referenced this pull request Nov 25, 2022

Add note for openml#1184

f6f9c49

PGijsbers pushed a commit to Mirkazemi/openml-python that referenced this pull request Feb 23, 2023

feat(minio): Allow for proxies (openml#1184)

56d2faa

* feat(minio): Allow for proxies * fix: Declared proxy_client as None * refactor(proxy): Change to `str | None` with "auto"

eddiebergman mentioned this pull request Mar 14, 2023

OpenML ConnectionRefused automl/HPOBench#171

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(minio): Allow for proxies #1184

feat(minio): Allow for proxies #1184

Uh oh!

eddiebergman commented Nov 24, 2022 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PGijsbers left a comment

Uh oh!

Uh oh!

eddiebergman commented Nov 25, 2022

Uh oh!

PGijsbers commented Nov 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

feat(minio): Allow for proxies #1184

feat(minio): Allow for proxies #1184

Uh oh!

Conversation

eddiebergman commented Nov 24, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issue

What does this PR implement/fix? Explain your changes.

How should this PR be tested?

Any other comments?

Uh oh!

Uh oh!

Uh oh!

Uh oh!

PGijsbers left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

eddiebergman commented Nov 25, 2022

Uh oh!

PGijsbers commented Nov 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

eddiebergman commented Nov 24, 2022 •

edited

Loading