-
-
Notifications
You must be signed in to change notification settings - Fork 211
Better support for passthrough and drop in sklearn extension #943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| elif serialized_type == "function": | ||
| rval = self._deserialize_function(value) | ||
| elif serialized_type == "component_reference": | ||
| elif serialized_type in ("composition_step_constant", "component_reference"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these scikit-learn names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, these are magic constants used by us -> I should actually have them as magic constants at the top of the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am growing more concerned with this code.
First, I am a bit worried that what looks like a minor addition requires this much change in our code base (I realize it's also in part refactoring/method extraction - but I count seven "passthrough" in the code).
Second, generally I find the (de)serialization code hard to follow. It's definitely possible to get into it (I have worked on and extended this code before), but requires quite some effort to understand what each case does (for me, anyway).
I don't have any specific requested changes (I think a refactor should be a separate PR anyway), but I'd like to hear your thoughts first. Do you share these concerns? (If not, why not?)
| def _deserialize_parameter_step_constants(self, step: Dict[str, str]) -> Dict[str, Any]: | ||
| return step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a simple return of the argument abstracted by a function?
Or why does a simple return of the argument alter the type as seen in the annotation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one of the TODOs mentioned above, I'll fix that in one of my upcoming commits.
tests/test_extensions/test_sklearn_extension/test_sklearn_extension.py
Outdated
Show resolved
Hide resolved
62e3462 to
2294840
Compare
I am doing so as well.
Yes, it also took me way too much time to do so. But on the other hand, allowing strings in a place where we had only objects of type OpenMLFlow before is actually breaking with a very strong assumption.
I fully agree. I think this refactor should happend if and when OpenML moves to a new flow description. |
|
Okay, I added additional unit tests. A lot of tests can probably be unified. Before starting such an endeavor I would like to know whether everything else is fine to you. |
PGijsbers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it looks OK. Would appreciate cleaning up the tests.
What does this PR implement/fix? Explain your changes.
This PR fixes a bug where the pipeline and column transformer elements
'passthrough'and'drop'could be serialized into anOpenMLFlow, but not into the xml representation of a flow.How should this PR be tested?
This PR has some basic checks. If the proposed route is fine with everybody, I will add more tests.
TODOs
("drop", "passthrough")("passthrough", "drop")into a module level variable