Better support for passthrough and drop in sklearn extension #943

mfeurer · 2020-08-27T17:22:22Z

What does this PR implement/fix? Explain your changes.

This PR fixes a bug where the pipeline and column transformer elements 'passthrough' and 'drop' could be serialized into an OpenMLFlow, but not into the XML representation of a flow.
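To illustrate the kind of problem described here, the following is a minimal sketch, not the code from this PR. It assumes a step can be either an estimator object or one of the two reserved strings, and that a string step needs its own tagged entry before the description can be written out as text. `FakeEstimator`, `serialize_step`, and the dictionary keys `type`, `value`, and `step_name` are hypothetical names chosen for illustration; only the tag strings `composition_step_constant` and `component_reference` come from the diff in this PR. JSON is used as a stand-in for the text representation.

```python
import json
from typing import Any, Dict, List, Tuple, Union

# The two reserved step strings discussed in this PR.
STEP_CONSTANTS = ("passthrough", "drop")


class FakeEstimator:
    """Hypothetical stand-in for a scikit-learn estimator."""

    def __init__(self, name: str) -> None:
        self.name = name


Step = Union[str, FakeEstimator]


def serialize_step(step_name: str, step: Step) -> Dict[str, Any]:
    """Turn one (name, step) pair into a plain, text-serializable dict."""
    if isinstance(step, str):
        # String steps are only valid if they are one of the reserved constants.
        if step not in STEP_CONSTANTS:
            raise ValueError(f"Unsupported string step: {step!r}")
        return {
            "type": "composition_step_constant",
            "value": step,
            "step_name": step_name,
        }
    # Real components are referenced by a key (here: the hypothetical name).
    return {
        "type": "component_reference",
        "value": step.name,
        "step_name": step_name,
    }


steps: List[Tuple[str, Step]] = [
    ("scale", FakeEstimator("StandardScaler")),
    ("skip", "passthrough"),
    ("unused", "drop"),
]
serialized = [serialize_step(name, step) for name, step in steps]
encoded = json.dumps(serialized)  # stand-in for the text form of the flow
```

Tagging the string steps explicitly keeps them distinguishable from component references once everything has been flattened to text.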

How should this PR be tested?

This PR has some basic checks. If the proposed route is fine with everybody, I will add more tests.

TODOs

more tests
specific test for serialization of ("drop", "passthrough")
extract the tuple ("passthrough", "drop") into a module level variable
remove the function that currently only returns its input value; it is unnecessary
turn strings such as "component_reference" into module-level constants instead of magic strings scattered through the code

PGijsbers · 2020-08-28T15:20:29Z

openml/extensions/sklearn/extension.py

                elif serialized_type == "function":
                    rval = self._deserialize_function(value)
-                elif serialized_type == "component_reference":
+                elif serialized_type in ("composition_step_constant", "component_reference"):


Are these scikit-learn names?

No, these are magic strings that we use internally. I should actually define them as constants at the top of the file.
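A minimal sketch of what such module-level constants could look like is shown below. It is an assumption about the shape of the change, not the code that was merged: `deserialize_step`, the `components` lookup, and the `type`/`value` keys are hypothetical, and unlike the diff above the two tags are handled in separate branches to make the difference visible.

```python
from typing import Any, Dict

# Module-level constants replacing the magic strings (tag values from the diff).
COMPONENT_REFERENCE = "component_reference"
COMPOSITION_STEP_CONSTANT = "composition_step_constant"
STEP_CONSTANTS = ("passthrough", "drop")


def deserialize_step(entry: Dict[str, Any], components: Dict[str, Any]) -> Any:
    """Restore one step from its tagged dictionary form (hypothetical layout)."""
    serialized_type = entry["type"]
    value = entry["value"]
    if serialized_type == COMPOSITION_STEP_CONSTANT:
        # The step is one of the reserved strings and is returned unchanged.
        if value not in STEP_CONSTANTS:
            raise ValueError(f"Unknown step constant: {value!r}")
        return value
    if serialized_type == COMPONENT_REFERENCE:
        # The step refers to an already deserialized component.
        return components[value]
    raise ValueError(f"Unknown serialized type: {serialized_type!r}")


components = {"scaler": object()}  # placeholder for a real estimator
restored_constant = deserialize_step(
    {"type": COMPOSITION_STEP_CONSTANT, "value": "drop"}, components
)
restored_component = deserialize_step(
    {"type": COMPONENT_REFERENCE, "value": "scaler"}, components
)
```

With the tags defined once, a typo in a tag name becomes a `NameError` at the point of use rather than a silently unmatched branch.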

PGijsbers

I am growing more concerned with this code.
First, I am a bit worried that what looks like a minor addition requires this much change in our code base (I realize it's also in part refactoring/method extraction - but I count seven "passthrough" in the code).
Second, generally I find the (de)serialization code hard to follow. It's definitely possible to get into it (I have worked on and extended this code before), but requires quite some effort to understand what each case does (for me, anyway).

I don't have any specific requested changes (I think a refactor should be a separate PR anyway), but I'd like to hear your thoughts first. Do you share these concerns? (If not, why not?)

Neeratyoy · 2020-08-29T13:04:53Z

openml/extensions/sklearn/extension.py

+    def _deserialize_parameter_step_constants(self, step: Dict[str, str]) -> Dict[str, Any]:
+        return step


Why do we need a function that simply returns its argument?
And if the value is returned unchanged, why does the return annotation differ from the argument's type?

That's one of the TODOs mentioned above, I'll fix that in one of my upcoming commits.

mfeurer · 2020-08-31T20:18:33Z

> I am growing more concerned with this code.

I am doing so as well.

> First, I am a bit worried that what looks like a minor addition requires this much change in our code base (I realize it's also in part refactoring/method extraction - but I count seven "passthrough" in the code).

Yes, it also took me way too much time to do so. But on the other hand, allowing strings in a place where we had only objects of type OpenMLFlow before is actually breaking with a very strong assumption.

> Second, generally I find the (de)serialization code hard to follow. It's definitely possible to get into it (I have worked on and extended this code before), but requires quite some effort to understand what each case does (for me, anyway).
> I don't have any specific requested changes (I think a refactor should be a separate PR anyway), but I'd like to hear your thoughts first. Do you share these concerns? (If not, why not?)

I fully agree. I think this refactor should happen if and when OpenML moves to a new flow description.

mfeurer · 2020-09-01T09:15:35Z

Okay, I added additional unit tests. A lot of the tests can probably be unified. Before starting such an endeavor, I would like to know whether everything else is fine with you.

PGijsbers

I think it looks OK. Would appreciate cleaning up the tests.

mfeurer requested review from Neeratyoy, PGijsbers and amueller August 27, 2020 17:22

PGijsbers reviewed Aug 28, 2020

Neeratyoy reviewed Aug 31, 2020

support passthrough and drop in sklearn extension when serialized to …

2294840

…xml dict

mfeurer force-pushed the support_passthrough branch from 62e3462 to 2294840 Compare August 31, 2020 18:28

mfeurer added 2 commits August 31, 2020 21:30

make test work with sklearn==0.21

9cf5d3a

improve PR

8fc58fb

Add additional unit tests

d9c96a7

fix test

1546a2f

PGijsbers approved these changes Sep 1, 2020

incorporate feedback and generalize unit tests

f710d47

Neeratyoy approved these changes Sep 2, 2020

mfeurer merged commit 3d85fa7 into develop Sep 2, 2020

mfeurer deleted the support_passthrough branch September 2, 2020 11:17
