@PGijsbers PGijsbers commented Feb 12, 2023

This PR changes OpenMLTraceIteration into a dataclass, which makes the code neater and comes with a nice default repr.


Someone recently remarked that the OpenMLTrace iteration representation was cryptic and easy to miss.
One example (produced by the MWE below, shortened for convenience):

OrderedDict([((0, 0, 0), [(0,0,0): 0.615341 (False)]),
             ...
             ((0, 9, 4), [(0,9,4): 0.798329 (True)])])

In a large dump, it's easy to miss at first glance that the "[(0,9,4): 0.798329 (True)]" parts represent custom OpenML objects.
Even once you spot that, it's not immediately obvious what each value means, and the repr omits some of the trace information (e.g., parameter values).
This PR changes the corresponding output to:

new:

OrderedDict([((0, 0, 0),
              OpenMLTraceIteration(repeat=0,
                                   fold=0,
                                   iteration=0,
                                   evaluation=0.6153406132170588,
                                   selected=False,
                                   setup_string=None,
                                   parameters=OrderedDict([('parameter_max_depth',
                                                            '1')]))),
             ...,
             ((0, 9, 4),
              OpenMLTraceIteration(repeat=0,
                                   fold=9,
                                   iteration=4,
                                   evaluation=0.7978332002449244,
                                   selected=True,
                                   setup_string=None,
                                   parameters=OrderedDict([('parameter_max_depth',
                                                            'null')])))])

Both outputs were generated by the following script:

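import pprint

import openml
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Fetch the OpenML task the run is executed on.
task = openml.tasks.get_task(167119)

# Grid search over max_depth, so the run produces an optimization trace.
pipe = GridSearchCV(
    estimator=DecisionTreeClassifier(),
    param_grid={
        'max_depth': [1, 2, 3, 4, None],
    },
)

# Run locally without uploading the flow or checking for duplicate runs.
r = openml.runs.run_model_on_task(
    pipe, task, upload_flow=False, avoid_duplicate_runs=False
)

# Print one entry per (repeat, fold, iteration) key.
pprint.pprint(r.trace.trace_iterations)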

The dataclass provides a better repr and makes the class definition less verbose.
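
For readers unfamiliar with where the new repr comes from, here is a minimal sketch of the mechanism. `TraceIterationSketch` is a hypothetical stand-in, not the actual `OpenMLTraceIteration` definition; its field names are copied from the output above, while the types and defaults are assumptions.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceIterationSketch:
    """Hypothetical stand-in for OpenMLTraceIteration (not the library's class)."""

    # Field names mirror the repr shown in this PR; types are assumed.
    repeat: int
    fold: int
    iteration: int
    evaluation: float
    selected: bool
    setup_string: Optional[str] = None
    parameters: Optional[OrderedDict] = None


# @dataclass generates __init__, __repr__ and __eq__ from the fields above,
# so no hand-written __repr__ is needed.
it = TraceIterationSketch(
    repeat=0,
    fold=9,
    iteration=4,
    evaluation=0.7978332002449244,
    selected=True,
    parameters=OrderedDict([("parameter_max_depth", "null")]),
)
print(repr(it))
```

The generated repr lists every field by name, which is why the parameter values now show up in the dump without any extra formatting code.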
@PGijsbers PGijsbers requested a review from mfeurer February 20, 2023 16:02
@mfeurer mfeurer merged commit c590b3a into develop Feb 24, 2023
@mfeurer mfeurer deleted the trace branch February 24, 2023 08:20