Fix/sklearn test compatibility #1340
Conversation
It is unclear how to create the condition under which this test is supposed to pass; even after running the test suite two or three times, it still does not pass.
There are some minor changes to the docstrings. I am not sure it is useful to keep testing it this way, so for now I will disable the test on newer versions.
The loss has been renamed. The model's performance also seems to have changed slightly for the same seed, so I decided to compare with the lower precision that was already used on Windows systems.
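As a hedged illustration (not the actual test code; the values and tolerance below are made up), a looser comparison could look like this:

```python
import pytest

# Hypothetical values: compare the model's score against a reference with a
# looser absolute tolerance, as was already done for Windows runners.
expected_score = 0.92
observed_score = 0.9231
assert observed_score == pytest.approx(expected_score, abs=1e-2)
```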
edit: no longer relevant. Test failures seem to be caused by exceeding the 10-minute time limit we have set. I am not sure why these tests take that long, but it looks like they fail and are retried. Their failure is likely deterministic, however, so I will go ahead and disable retries so the test fails hard instead, with (hopefully) a more meaningful error message.
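A sketch of what disabling retries could look like, assuming the retries come from a flaky-test marker such as pytest-rerunfailures (that is an assumption on my part; the test name below is hypothetical):

```python
import pytest

# Hypothetical test: reruns=0 makes it fail hard on the first error
# instead of being silently retried by the rerun plugin.
@pytest.mark.flaky(reruns=0)
def test_run_and_upload():
    ...
```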
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           develop    #1340      +/-   ##
============================================
+ Coverage    32.94%   83.95%   +51.00%
============================================
  Files           38       38
  Lines         5254     5259        +5
============================================
+ Hits          1731     4415     +2684
+ Misses        3523      844     -2679

☔ View full report in Codecov by Sentry.
Tests are incredibly slow and I am not sure why. This also happens locally, so it's not just the runners; I suspect it's the server. edit: it looks like the culprit is mostly the production server :( (most, if not all, of these long tests use the production server)
NumPy 2.0 cleaned up its namespace.
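For context, a minimal illustration of the kind of namespace cleanup meant here (the aliases below are examples I chose, not necessarily the ones that affected this code):

```python
import numpy as np

# Illustrative only: a couple of NumPy 1.x aliases that no longer exist in
# the NumPy 2.0 namespace, and their replacements.
x = np.nan         # np.NaN was removed; use np.nan
dt = np.float64    # np.float_ was removed; use np.float64
print(x, dt)
```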
edit: message outdated, the PR now fully supports NumPy 2.0. Old message: On NumPy 2.0 support / scikit-learn support (from #1341):
But I am currently leaning towards putting in a bit of extra time to make sure we can test on later scikit-learn versions.
There is a breaking change to the way 'mode' works, which breaks scikit-learn internals.
It seems to me that run.evaluations is set only when the run is fetched, and whether it has evaluations depends on server state. So if the server resolves the traces between the initial fetch and the trace check, you can end up calling len(run.evaluations) while evaluations is None.
Scikit-learn or NumPy changed the typing of the parameters (seen in a masked array; I am not sure whether it also happens outside of that). Convert these values back to Python builtins.
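A minimal sketch of such a conversion, assuming the values arrive as NumPy scalar types (the helper name is mine, not the one in the PR):

```python
import numpy as np

def to_builtin(value):
    # NumPy scalar types (np.int64, np.float64, np.bool_, ...) expose .item(),
    # which returns the corresponding Python builtin (int, float, bool).
    if isinstance(value, np.generic):
        return value.item()
    return value

assert isinstance(to_builtin(np.int64(3)), int)
assert isinstance(to_builtin(np.float64(0.5)), float)
```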
  )

- if upload_flow and flow_id is None:
+ if upload_flow and flow_id is False:
flow_exists does not return None but False when a flow does not exist.
I find flow_id is False easier to parse than not flow_id, since flow_id can be either an integer or a boolean. While both are correct, the check is clearer to me this way.
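A minimal sketch of the corrected logic, assuming flow_exists returns the flow id as an int when the flow is on the server and False otherwise (the wrapper function is mine, for illustration only):

```python
import openml

def ensure_flow_id(flow, upload_flow):
    # flow_exists returns the flow id (an int) if the flow is on the server,
    # and False (not None) if it does not exist.
    flow_id = openml.flows.flow_exists(flow.name, flow.external_version)
    if upload_flow and flow_id is False:
        flow.publish()        # the flow is not on the server yet, so upload it
        flow_id = flow.flow_id
    return flow_id
```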
This bug would fail a unit test only the first time around; later unit tests would upload the flow anyway, which means that on the second run this path is no longer taken and the tests can pass. Because there are other tests which update server state and need to be run a few times, this error was never caught: it was just part of the initial sea of reds after a server reset.
runs-on: ${{ matrix.os }}
strategy:
  matrix:
    python-version: ["3.8"]
I rewrote the matrix to be effectively the same, but also testing each scikit-learn 1.x release.
The only difference is that the default Python is now 3.9, since that is supported by all scikit-learn versions (except 0.23), which makes things easier.
  - name: Install numpy for Python 3.8
    # Python 3.8 & scikit-learn<0.24 requires numpy<=1.23.5
-   if: ${{ matrix.python-version == '3.8' && contains(fromJSON('["0.23.1", "0.22.2", "0.21.2"]'), matrix.scikit-learn) }}
+   if: ${{ matrix.python-version == '3.8' && matrix.scikit-learn == '0.23.1' }}
The older versions are no longer included in the run matrix anyway.
We don't want to serialize the value as np.nan; we want to include the NaN directly, as an indication that the parameter was left unset.
time.sleep(10)
continue

run = openml.runs.get_run(run_id, ignore_cache=True)
Sometimes tests fail with the old code because run.evaluations is None, but I am not able to reproduce this locally. However, the failures do seem fewer now that I moved where the run object is loaded (which makes sense: initially there is a race condition where the trace may not yet be processed during the get_run call but is by the time get_run_trace is called). Additionally, I changed the check from a length check to a None check. As far as I can tell, evaluations should be either a non-empty dictionary or None, so the length check doesn't make a lot of sense (the behavior was probably different historically). I kept an assert with the old check just to make sure my assumption is correct (and we get an error if it isn't).
The default behavior when no evaluations are present is for run.evaluations to be None, so it makes sense to check for that instead. As far as I can tell, run.evaluations should always contain some items when it is not None, but I added an assert just in case.
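A hedged sketch of the adjusted polling logic (the wrapper function and retry count are mine, for illustration only):

```python
import time
import openml

def wait_for_evaluations(run_id, max_polls=10):
    for _ in range(max_polls):
        run = openml.runs.get_run(run_id, ignore_cache=True)
        if run.evaluations is None:
            # The server has not processed the evaluations yet; poll again.
            time.sleep(10)
            continue
        # Guard for the assumption that a non-None value is a non-empty dict.
        assert len(run.evaluations) > 0
        return run
    raise RuntimeError(f"Run {run_id} has no evaluations after {max_polls} polls")
```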
@LennartPurucker I pinged you for a review since at this point I am unsure what to do. The work is done, kind of. Sometimes the tests still fail sporadically because the pytest workers get killed on GitHub Actions; I have not had this happen locally, and it's not consistent: they can also pass. At this point there are already large improvements to the tests and CI, so it's probably already worth merging. From there, I can work on both new features/updates and debugging, but then at least that can happen in parallel and be discussed in separate, smaller PRs. What do you think? edit: I suspect this may also be due to the timeout, though previously I have had clearer messages when the timeout functionality was triggered. I'll remove the timeouts so we can compare CI results. edit2: The one (sporadically) failing test now fails due to an internal timeout. I think it's fairly safe to say this has to be a server-side issue, since the same test passes on other instances.
I suspect they "crash" workers. This of course introduces the risk of hanging processes... But I cannot reproduce the issue locally.
LennartPurucker left a comment
LGTM!
I agree and think we should merge this. The changes are all reasonable and important improvements.
This PR updates the tests to be compatible with newer versions of scikit-learn. In the process, it fixes some bugs and adds NumPy 2.0 support. I updated the CI matrix to include newer versions of scikit-learn. For the most part, openml-python works fine on newer versions; it was just that the unit tests themselves had hard dependencies on specifics. I extended/updated them in most cases, but in a few it was a little unreasonable (they need to be refactored to make them less sensitive to small scikit-learn API changes). I have already spent quite a bit of time on this, so I hope the proposed changes are good enough for now, so that we can work on a 0.15.0 release.
The docs fail in the check-links step. I propose to fix this in a separate PR.