Description
I want to register a flow referring to the setup we used to conduct the AutoML benchmark presented at the ICML workshop this year. We want to do this to more easily share results through OpenML. I want to run this by you all.
Now, I know that creating your own flow in openml-python is discouraged ('Flows should not be generated manually'), but for this scenario I don't really see a better way, other than using e.g. openml-r.
I want to describe that we used the code in our GitHub repo at a specific tag, and provide a URL to download that code. I also figured I would tag the AutoML tool that was used (in this case auto-sklearn).
There is currently no direct way to instantiate the script with the right settings, but the flow below should be enough to configure a reproducible run. Again, the flow is created mainly to link and share results, and to provide the best possible pointers for reproducing it.
Sketching out how I think the flow should look:
```python
from collections import OrderedDict

import openml

# The existing auto-sklearn flow is attached as a component (subflow).
auto_sklearn_flow = openml.flows.get_flow(15275)  # auto-sklearn 0.5.1

amlb_flow = openml.flows.OpenMLFlow(
    name='automlbenchmark_autosklearn',
    description='Auto-sklearn as set up by the AutoML Benchmark',
    external_version='amlb==0.9',
    # Resource constraints used by the benchmark, with their default values.
    parameters=OrderedDict(
        time='240',
        memory='32',
        cores='8',
    ),
    parameters_meta_info=OrderedDict(
        time=dict(data_type='int', description='time in minutes'),
        memory=dict(data_type='int', description='memory in gigabytes'),
        cores=dict(data_type='int', description='number of available cores'),
    ),
    language='English',
    # Link to the AutoML tool's own flow so results stay connected to it.
    components=OrderedDict(automl_tool=auto_sklearn_flow),
)
```

I would still need to find a place for the following information:
source: https://github.com/openml/automlbenchmark/releases/tag/v0.9
subfolder of particular interest: /frameworks/autosklearn
I do see url fields in OpenMLFlow.__init__ but they are set by the server.
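Since those fields can't be set from the client, one possible workaround (just an idea, not an established OpenML convention) would be to embed the code pointers in the free-text description before publishing; `publish()` should then register the flow and give it an id that runs can reference:

```python
# Hypothetical workaround: carry the code location in the description text,
# since the url fields in OpenMLFlow.__init__ are filled in by the server.
amlb_flow.description = (
    'Auto-sklearn as set up by the AutoML Benchmark. '
    'Source: https://github.com/openml/automlbenchmark/releases/tag/v0.9 '
    '(framework definition under /frameworks/autosklearn).'
)

# Publishing registers the flow on the server (assuming it validates) and
# sets amlb_flow.flow_id, which shared runs can then point to.
amlb_flow.publish()
print(amlb_flow.flow_id)
```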
I think there are a couple of questions here:
- Do you think I should even try doing this through the Python API?
And if so,
- Should I create only one flow for the entire benchmark and take the AutoML tool as a parameter (see the sketch after this list)? I think it's more generic, but it might make the link to the AutoML tool's flow harder to find.
- Where should I include the code URL?
- Any other things I need to know?
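To make the second question more concrete, here is a rough sketch of the single-generic-flow variant, mirroring the sketch above and reusing its imports; the parameter names `framework` and `framework_version` are made up for illustration:

```python
# Rough sketch of the alternative: one benchmark-level flow where the AutoML
# tool is just another parameter (parameter names here are hypothetical).
generic_amlb_flow = openml.flows.OpenMLFlow(
    name='automlbenchmark',
    description='A run of the AutoML Benchmark with a configurable AutoML tool.',
    external_version='amlb==0.9',
    parameters=OrderedDict(
        framework='autosklearn',
        framework_version='0.5.1',
        time='240',
        memory='32',
        cores='8',
    ),
    parameters_meta_info=OrderedDict(
        framework=dict(data_type='str', description='AutoML tool to benchmark'),
        framework_version=dict(data_type='str', description='version of the AutoML tool'),
        time=dict(data_type='int', description='time in minutes'),
        memory=dict(data_type='int', description='memory in gigabytes'),
        cores=dict(data_type='int', description='number of available cores'),
    ),
    language='English',
    # No subflow here, so the link to auto-sklearn's own flow is only implicit.
    components=OrderedDict(),
)
```

The trade-off is exactly the one raised above: this version is more generic, but the connection to auto-sklearn's flow (15275) then only exists through a string parameter instead of a proper component link.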