@mfeurer mfeurer commented Oct 28, 2020

Reference Issue

#908

What does this PR implement/fix? Explain your changes.

This PR changes how dataset qualities and dataset features are loaded. They are now loaded inside the OpenMLDataset class. Furthermore, the first time they are loaded, the parsed XML is saved as a pickle, which is then loaded on subsequent invocations.
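
As a rough illustration of the parse-once, pickle-afterwards pattern described above, here is a minimal sketch. It is not the code from this PR: the function names, the `<feature>` XML layout, and the `.pkl` cache file name are all assumptions made for the example, and it uses only the standard library.

```python
import pickle
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_features_xml(xml_path):
    """Parse a (hypothetical) features XML file into a list of dicts."""
    root = ET.parse(xml_path).getroot()
    # Assumed layout: <features><feature><name>..</name>..</feature>..</features>
    return [
        {child.tag: child.text for child in feature}
        for feature in root.iter("feature")
    ]


def load_features_cached(xml_path):
    """Return parsed features, using a pickle next to the XML as a cache."""
    xml_path = Path(xml_path)
    pickle_path = xml_path.with_suffix(".pkl")  # hypothetical cache file name

    if pickle_path.exists():
        # Fast path: skip XML parsing and read the cached result.
        with pickle_path.open("rb") as fh:
            return pickle.load(fh)

    # Slow path: parse the XML once, then store the result for next time.
    features = parse_features_xml(xml_path)
    with pickle_path.open("wb") as fh:
        pickle.dump(features, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return features
```

The first call pays the XML parsing cost and writes the pickle; later calls only unpickle. A real implementation would also need to decide when a cached pickle is stale, which this sketch does not handle.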

How should this PR be tested?

New unit tests in test/test_datasets/test_dataset.py

Any other comments?

This PR improves the loading speed of datasets. The files cached as pickles are only large for datasets with many features. The most drastic improvement I observed was for the dataset dorothea, where the loading time dropped from ~13s to ~0.2s.

@mfeurer mfeurer requested a review from PGijsbers October 28, 2020 20:24

@PGijsbers PGijsbers left a comment

Some minor changes requested. The unit tests also fail and need to be fixed.

@PGijsbers

Can merge if CI passes (at least some of them, to make sure the merge conflict was resolved correctly 😓 )

@PGijsbers PGijsbers merged commit 560e952 into develop Nov 3, 2020
@PGijsbers PGijsbers deleted the add_908 branch November 3, 2020 18:38