Additional fixes to PR 777 #967

Neeratyoy · 2020-10-23T14:42:33Z

Reference Issue

PR #777.

What does this PR implement/fix? Explain your changes.

Makes the changes/improvements suggested by @PGijsbers that were not addressed prior to merge.

PGijsbers · 2020-10-23T15:01:48Z

Checks manually cancelled.

Neeratyoy · 2020-10-26T14:37:49Z

@PGijsbers regarding why an experimental HistGradientBoostingClassifier is used in this example:

To handle mixed-type data frames with missing values, we need a transformation that imputes missing values for both categorical and continuous columns. Under the current OpenML setup, we cannot create a flow that contains two instances of the same class, a SimpleImputer in this case. So far, in various unit tests, we have handled this via an alias such as this.
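For readers unfamiliar with the alias workaround, here is a minimal sketch, assuming the alias is nothing more than a trivial subclass of SimpleImputer. The name `CategoricalImputer` and the toy data are hypothetical and are not taken from the openml code base.

```python
import numpy as np
from sklearn.impute import SimpleImputer


class CategoricalImputer(SimpleImputer):
    """Hypothetical alias: same behaviour as SimpleImputer, different class name."""


# One imputer per column type; the alias keeps the two class names distinct.
num_imputer = SimpleImputer(strategy="median")
cat_imputer = CategoricalImputer(strategy="most_frequent")
step_names = {type(num_imputer).__name__, type(cat_imputer).__name__}

# Made-up categorical column with one missing entry.
X_cat = np.array([["a"], ["b"], [np.nan], ["a"]], dtype=object)
X_filled = cat_imputer.fit_transform(X_cat)
```

The alias adds no behaviour of its own; it only gives the second imputer a class name that differs from the first.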

Creating an alias class within the testing script causes problems during serialization because the class has no `__version__` attribute. It probably works for our test cases because the alias is imported from within openml, so the `__version__` is possibly shared.
To avoid importing from openml.testing in a documentation example, we decided to bypass the need for a second imputer. @mfeurer suggested the HistGradientBoostingClassifier since it handles numeric missing values implicitly. Therefore, for this example, a passthrough works.
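The passthrough idea can be sketched as follows. This is an illustration under stated assumptions, not the code from the example file: the toy data frame, the column names, and the choice of OrdinalEncoder are made up for the sketch.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

try:
    # Only needed while the estimator is experimental (scikit-learn 0.21 to 0.24).
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.ensemble import HistGradientBoostingClassifier

# Made-up mixed-type frame with gaps in both the numeric and categorical column.
X = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 31.0, np.nan, 52.0, 38.0, 29.0],
    "color": ["red", "blue", np.nan, "red", "blue", "blue", np.nan, "red"],
})
y = np.array([0, 1, 1, 0, 1, 1, 0, 0])

# A single SimpleImputer covers the categorical column; numeric NaNs pass
# through untouched and are handled by the classifier itself.
categorical = make_pipeline(SimpleImputer(strategy="most_frequent"), OrdinalEncoder())
preprocess = ColumnTransformer([
    ("cat", categorical, ["color"]),
    ("num", "passthrough", ["age"]),
])
clf = make_pipeline(preprocess, HistGradientBoostingClassifier())
clf.fit(X, y)
predictions = clf.predict(X)
```

Only one SimpleImputer appears in the pipeline, so the restriction on two instances of the same class does not come into play.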

PGijsbers · 2020-10-26T16:00:31Z

That makes sense (even though it is a bit unfortunate). However, why not consider creating the example on a task which does not have two types of missing values? While all examples implicitly depend on some version(s) of scikit-learn, it feels like an example that uses an experimental feature has exceptionally narrow support. Old versions (<=0.20) won't have this, and future versions won't have this either.

openml/extensions/sklearn/extension.py

Neeratyoy · 2020-10-26T17:47:06Z

> However, why not consider creating the example on a task which does not have two types of missing values?

Yeah, I was wondering why that was not done here, but then I think we would need to change the design of the example.

~~At this point, it samples 3 tasks randomly from the suite. Do we then fix this task list? If so, do we provide a justification?~~

Either way, I shall try to find 3 tasks where missing values, if present, are restricted to one of the types. We can then iterate and replace the HistGradientBoostingClassifier with a more stable classifier.

On further checking, this very task is already a case where missing values exist in categorical columns but not in numeric columns. Replacing the experimental model with a random forest (RF) worked fine, as expected. I pushed that version, since your concern about this example being compatible across versions is apt.
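A sketch of what the replacement looks like when only categorical columns have gaps. The toy data, the column names, and the OneHotEncoder step are assumptions for illustration and are not taken from the pushed example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up frame: the numeric column is complete, only the categorical one has gaps.
X = pd.DataFrame({
    "age": [25.0, 33.0, 47.0, 31.0, 44.0, 52.0, 38.0, 29.0],
    "color": ["red", "blue", np.nan, "red", "blue", "blue", np.nan, "red"],
})
y = np.array([0, 1, 1, 0, 1, 1, 0, 0])

# Impute and encode the categorical column; the numeric column needs no imputer,
# so the pipeline still contains a single SimpleImputer.
categorical = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(handle_unknown="ignore"),
)
preprocess = ColumnTransformer([
    ("cat", categorical, ["color"]),
    ("num", "passthrough", ["age"]),
])
clf = make_pipeline(preprocess, RandomForestClassifier(n_estimators=10, random_state=0))
clf.fit(X, y)
proba = clf.predict_proba(X)
```

Because RandomForestClassifier is a stable estimator, this variant does not depend on the experimental import.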

@mfeurer and I were earlier trying to come up with an example that allows the generic case of handling missing values across types. Given the constraints of the setup, the pipeline with the experimental model was the only way out we could think of.

mfeurer

Looks good to me. I'll approve once I see that the tests are green.

examples/30_extended/run_setup_tutorial.py

examples/30_extended/study_tutorial.py

Neeratyoy added 2 commits October 23, 2020 16:38

Initial changes

0fb6230

Deleting TODO that will be addressed by #968

8fdde8a

Neeratyoy added 2 commits October 23, 2020 17:04

[skip ci] removing redundant imports

e6e4bbd

[skip ci] Simplifying flow to generate prediction probabilities

b6b9f49

Neeratyoy marked this pull request as ready for review October 26, 2020 14:38

Neeratyoy mentioned this pull request Oct 26, 2020

Dataframe run on task #777

Merged

Triggering unit tests

395985f

PGijsbers requested changes Oct 26, 2020

openml/extensions/sklearn/extension.py (outdated, resolved)

openml/extensions/sklearn/extension.py (outdated, resolved)

Fixing mypy and flake issues

06c2e6a

[skip ci] Replacing HistGradientBoostingClassifier

d43cd1d

Neeratyoy requested review from PGijsbers and mfeurer October 27, 2020 00:33

mfeurer reviewed Oct 27, 2020

examples/30_extended/run_setup_tutorial.py (outdated, resolved)

examples/30_extended/study_tutorial.py (outdated, resolved)

Simplifying examples

2df584f

PGijsbers reviewed Oct 27, 2020

examples/30_extended/study_tutorial.py (resolved)

Minor typo fix

534e43e

PGijsbers approved these changes Oct 29, 2020

PGijsbers merged commit 4923e5b into develop Oct 29, 2020

PGijsbers deleted the fix_pr777 branch October 29, 2020 10:06
