Skip to content

Conversation

@mfeurer
Copy link
Collaborator

@mfeurer mfeurer commented Aug 27, 2020

What does this PR implement/fix? Explain your changes.

This PR fixes a bug where the pipeline and column transformer elements 'passthrough' and 'drop' could be serialized into an OpenMLFlow, but not into the xml representation of a flow.

How should this PR be tested?

This PR has some basic checks. If the proposed route is fine with everybody, I will add more tests.

TODOs

  • more tests
  • specific test for serialization of ("drop", "passthrough")
  • extract the tuple ("passthrough", "drop") into a module level variable
  • there's an unnecessary function right now which only returns the input value and can be dropped again
  • make strings like component reference etc. magic variables in the code.

elif serialized_type == "function":
rval = self._deserialize_function(value)
elif serialized_type == "component_reference":
elif serialized_type in ("composition_step_constant", "component_reference"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these scikit-learn names?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, these are magic constants used by us -> I should actually have them as magic constants at the top of the file.

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am growing more concerned with this code.
First, I am a bit worried that what looks like a minor addition requires this much change in our code base (I realize it's also in part refactoring/method extraction - but I count seven "passthrough" in the code).
Second, generally I find the (de)serialization code hard to follow. It's definitely possible to get into it (I have worked on and extended this code before), but requires quite some effort to understand what each case does (for me, anyway).

I don't have any specific requested changes (I think a refactor should be a separate PR anyway), but I'd like to hear your thoughts first. Do you share these concerns? (If not, why not?)

Comment on lines 1247 to 1248
def _deserialize_parameter_step_constants(self, step: Dict[str, str]) -> Dict[str, Any]:
return step
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a simple return of the argument abstracted by a function?
Or why does a simple return of the argument alter the type as seen in the annotation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's one of the TODOs mentioned above, I'll fix that in one of my upcoming commits.

@mfeurer mfeurer force-pushed the support_passthrough branch from 62e3462 to 2294840 Compare August 31, 2020 18:28
@mfeurer
Copy link
Collaborator Author

mfeurer commented Aug 31, 2020

I am growing more concerned with this code.

I am doing so as well.

First, I am a bit worried that what looks like a minor addition requires this much change in our code base (I realize it's also in part refactoring/method extraction - but I count seven "passthrough" in the code).

Yes, it also took me way too much time to do so. But on the other hand, allowing strings in a place where we had only objects of type OpenMLFlow before is actually breaking with a very strong assumption.

Second, generally I find the (de)serialization code hard to follow. It's definitely possible to get into it (I have worked on and extended this code before), but requires quite some effort to understand what each case does (for me, anyway).
I don't have any specific requested changes (I think a refactor should be a separate PR anyway), but I'd like to hear your thoughts first. Do you share these concerns? (If not, why not?)

I fully agree. I think this refactor should happend if and when OpenML moves to a new flow description.

@mfeurer
Copy link
Collaborator Author

mfeurer commented Sep 1, 2020

Okay, I added additional unit tests. A lot of tests can probably be unified. Before starting such an endeavor I would like to know whether everything else is fine to you.

Copy link
Collaborator

@PGijsbers PGijsbers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks OK. Would appreciate cleaning up the tests.

@mfeurer mfeurer merged commit 3d85fa7 into develop Sep 2, 2020
@mfeurer mfeurer deleted the support_passthrough branch September 2, 2020 11:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants