Make OpenMLTraceIteration a dataclass #1201

PGijsbers · 2023-02-12T21:21:26Z

This PR changes OpenMLTraceIteration into a dataclass, which makes the code neater and comes with a nice default repr.

Someone recently remarked that the OpenMLTrace iteration representation was cryptic and easy to overlook.
One example (produced by the MWE below, shortened for convenience):

OrderedDict([((0, 0, 0), [(0,0,0): 0.615341 (False)]),
             ...
             ((0, 9, 4), [(0,9,4): 0.798329 (True)])])

In a large dump, it's easy to miss at first glance that the "[(0,9,4): 0.798329 (True)]" parts represent custom OpenML objects.
Even once you spot that, it's not obvious what each value means (repeat, fold, and iteration, then the evaluation score and whether the iteration was selected), and the output omits some of the trace information (e.g., parameter values).
This PR changes the corresponding output to:

new:

OrderedDict([((0, 0, 0),
              OpenMLTraceIteration(repeat=0,
                                   fold=0,
                                   iteration=0,
                                   evaluation=0.6153406132170588,
                                   selected=False,
                                   setup_string=None,
                                   parameters=OrderedDict([('parameter_max_depth',
                                                            '1')]))),
             ...,
             ((0, 9, 4),
              OpenMLTraceIteration(repeat=0,
                                   fold=9,
                                   iteration=4,
                                   evaluation=0.7978332002449244,
                                   selected=True,
                                   setup_string=None,
                                   parameters=OrderedDict([('parameter_max_depth',
                                                            'null')])))])
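
For readers unfamiliar with how a dataclass produces this kind of output, here is a minimal sketch. It is not the actual openml-python implementation: `TraceIterationSketch` is a hypothetical stand-in, the field names are copied from the output above, and the type annotations and defaults are assumptions.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceIterationSketch:
    """Hypothetical stand-in; field names mirror the repr output shown above.

    The annotations and default values are assumptions, not the library's own.
    """

    repeat: int
    fold: int
    iteration: int
    evaluation: float
    selected: bool
    setup_string: Optional[str] = None
    parameters: Optional[OrderedDict] = None


# @dataclass generates __init__, __repr__ and __eq__ from the annotated fields,
# so no hand-written __repr__ is needed to get a readable, complete dump.
it = TraceIterationSketch(
    repeat=0,
    fold=9,
    iteration=4,
    evaluation=0.7978332002449244,
    selected=True,
    parameters=OrderedDict([("parameter_max_depth", "null")]),
)
print(it)
```

The generated repr lists every field by name in declaration order, which is why the new output shows `setup_string` and `parameters` even though the old hand-written format left them out.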

Both outputs were generated by the following script (before and after this change):

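import pprint

import openml
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Fetch the OpenML task used for this example.
task = openml.tasks.get_task(167119)

# Grid search over tree depth; the search results end up in the run's trace.
pipe = GridSearchCV(
    estimator=DecisionTreeClassifier(),
    param_grid={
        "max_depth": [1, 2, 3, 4, None],
    },
)

# Run locally without uploading the flow or checking for duplicate runs.
r = openml.runs.run_model_on_task(
    pipe, task, upload_flow=False, avoid_duplicate_runs=False
)

# Maps (repeat, fold, iteration) to the corresponding OpenMLTraceIteration.
pprint.pprint(r.trace.trace_iterations)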

It provides a better repr and is less verbose.

Make OpenMLTraceIteration a dataclass

83c87fa

It provides a better repr and is less verbose.

PGijsbers requested a review from mfeurer February 20, 2023 16:02

mfeurer approved these changes Feb 24, 2023

mfeurer merged commit c590b3a into develop Feb 24, 2023

mfeurer deleted the trace branch February 24, 2023 08:20
