
Conversation

@PGijsbers (Collaborator) commented Jul 3, 2024

This PR updates the tests to be compatible with newer versions of scikit-learn. In the process, it also fixes some bugs and adds NumPy 2.0 support. I updated the CI matrix to include newer scikit-learn versions. For the most part, openml-python works fine on newer versions; it was mostly the unit tests themselves that had hard dependencies on version-specific behavior. In most cases I extended or updated the tests, but in a few cases that was unreasonable (those tests need to be refactored to make them less sensitive to small scikit-learn API changes). I have already spent quite a bit of time on this, so I hope the proposed changes are good enough for now and we can work towards a 0.15.0 release.

The docs fail in the check-links step. I propose to fix this in a separate PR.

PGijsbers added 7 commits July 2, 2024 15:23

  • It is unclear how to create a condition under which the test is supposed to pass. Even after running the test suite 2-3 times, it does not seem to pass.
  • There are some minor changes to the docstrings. I am not sure it is useful to keep testing them this way, so for now I will disable the test on newer versions.
  • The loss has been renamed. The performance of the model also seems to have changed slightly for the same seed, so I decided to compare with the lower fidelity that was already used on Windows systems.
@PGijsbers (Collaborator, Author) commented Jul 3, 2024

edit: no longer relevant

Test failures seem to be caused by exceeding the 10-minute time limit we have set. I am not sure why these tests need that long, but it looks like they fail and are retried. Their failure is likely deterministic, however, so I will disable retries so that the test fails hard with a (hopefully) more meaningful error message.

@codecov-commenter commented Jul 3, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.95%. Comparing base (923b49d) to head (2c161e4).

Files                                     Patch %   Lines
openml/extensions/sklearn/extension.py    60.00%    2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1340       +/-   ##
============================================
+ Coverage    32.94%   83.95%   +51.00%     
============================================
  Files           38       38               
  Lines         5254     5259        +5     
============================================
+ Hits          1731     4415     +2684     
+ Misses        3523      844     -2679     

☔ View full report in Codecov by Sentry.

@PGijsbers (Collaborator, Author) commented Jul 3, 2024

Tests are incredibly slow and I am not sure why. This also happens locally, so it's not just the runners; I suspect it's the server.

edit: it looks like the culprit is mostly the production server :( (most (all?) of these long tests use prod)

@PGijsbers PGijsbers requested review from LennartPurucker and removed request for LennartPurucker July 3, 2024 13:44
@PGijsbers (Collaborator, Author) commented Jul 3, 2024

edit: message outdated, the PR now fully supports NumPy 2.0.


old message:

On NumPy 2.0 support / scikit-learn support (from #1341):

NumPy 2.0 cleaned up its namespace, so we have to alias names for the new version.
The problem is that this is not covered in CI (the tested versions do not support NumPy 2.0), and in general the tests need to be updated to support more recent scikit-learn versions that support NumPy 2.0 (1.5, maybe 1.4?). I don't know if I really have the time for that right now. Options:

  • Put a pin in the dependencies
  • Accept this as best-effort, and hope that the tests that don't fail due to scikit-learn incompatibility cover the code well enough to surface any NumPy 1/2 issues
  • Wait until I find time to make tests support scikit-learn 1.5

I am currently leaning towards putting in a bit of extra time to make sure we can test on later scikit-learn versions.
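
For illustration, a minimal sketch (hypothetical names, not the code in this PR) of the kind of version guard the namespace clean-up requires; NumPy 2.0 removed aliases such as np.float_ and np.unicode_:

    import numpy as np

    if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
        # NumPy 2.0 removed these aliases from the main namespace.
        NP_FLOAT, NP_STR = np.float64, np.str_
    else:
        # The pre-2.0 spellings are still available here.
        NP_FLOAT, NP_STR = np.float_, np.unicode_

    print(NP_FLOAT(0.5), NP_STR("a"))  # works on both NumPy 1.x and 2.x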

PGijsbers added 6 commits July 4, 2024 11:19

  • It seems that run.evaluations is only set when the run is fetched, and whether it has evaluations depends on server state. So if the server has resolved the traces between the initial fetch and the trace check, you could be calling len(run.evaluations) while evaluations is None.
  • Scikit-learn or NumPy changed the typing of the parameters (seen in a masked array; not sure whether it also happens outside of that). Convert these values back to Python builtins.
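
A minimal sketch of that conversion (to_builtin is a hypothetical helper, not the exact code in the extension): NumPy scalars all derive from np.generic and expose .item() to obtain the corresponding Python builtin.

    import numpy as np

    def to_builtin(value):
        """Return a plain Python object for NumPy scalar types, pass anything else through."""
        if isinstance(value, np.generic):
            # .item() maps e.g. np.float64 -> float, np.int64 -> int, np.bool_ -> bool
            return value.item()
        return value

    assert isinstance(to_builtin(np.float64(0.5)), float)
    assert to_builtin("layer_1") == "layer_1"  # non-NumPy values are unchanged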
)

-        if upload_flow and flow_id is None:
+        if upload_flow and flow_id is False:
@PGijsbers (Collaborator, Author) commented:

flow_exists does not return None but False when a flow does not exist.
I find flow_id is False easier to parse than not flow_id, since flow_id can be either an integer or a boolean. While both checks are correct, this one is clearer to me.
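
For illustration, a small self-contained example (not code from this PR) of the convention being relied on:

    # flow_exists returns either an integer flow id or the literal False.
    flow_id = False  # value returned when the flow is not on the server

    if flow_id is False:
        print("flow not on the server, upload it first")

    # `not flow_id` behaves the same for False, but it hides that flow_id mixes
    # int and bool, and it would also trigger for a (hypothetical) id of 0.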

@PGijsbers (Collaborator, Author) commented:

This bug would fail a unit test only the first time around; later unit tests would upload the flow anyway, which means that on a second run this path is no longer taken and the tests can pass. Because there are other tests that update server state and need to be run a few times, this error was never caught: it was just part of the initial sea of red after a server reset.

    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        python-version: ["3.8"]
@PGijsbers (Collaborator, Author) commented:

I rewrote the matrix to be effectively the same, but also testing each scikit-learn 1.x release.
The only difference is that the default Python version is now 3.9, since that is supported by all tested scikit-learn versions (except 0.23), which makes things easier.

      - name: Install numpy for Python 3.8
        # Python 3.8 & scikit-learn<0.24 requires numpy<=1.23.5
-       if: ${{ matrix.python-version == '3.8' && contains(fromJSON('["0.23.1", "0.22.2", "0.21.2"]'), matrix.scikit-learn) }}
+       if: ${{ matrix.python-version == '3.8' && matrix.scikit-learn == '0.23.1' }}
@PGijsbers (Collaborator, Author) commented:

The older versions are no longer included in the run matrix anyway.

We don't want to serialize the value as np.nan; we want to include the NaN directly. It is an indication that the parameter was left unset.
time.sleep(10)
continue

run = openml.runs.get_run(run_id, ignore_cache=True)
@PGijsbers (Collaborator, Author) commented Jul 4, 2024:

Sometimes tests fail with the old code because run.evaluations is None, but I am not able to reproduce this locally. However, the failures do seem fewer now that I have moved where the run object is loaded (which makes sense: initially there is a race condition where the trace might not yet be processed in the get_run call, but is by the time get_run_trace is called). Additionally, I changed the check from a length check to a None check. As far as I can tell, evaluations should be either a non-empty dictionary or None, so the length check doesn't make a lot of sense (the behavior was probably different historically). I kept an assert with the old check just to make sure my assumption is correct (and we get an error if it isn't).
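
A rough sketch of the resulting polling logic (wait_for_evaluations is a hypothetical name; only get_run and run.evaluations come from openml-python, as shown in the diff above):

    import time

    import openml

    def wait_for_evaluations(run_id, retries=10):
        """Poll the server until the run has evaluations; None means not processed yet."""
        for _ in range(retries):
            run = openml.runs.get_run(run_id, ignore_cache=True)
            if run.evaluations is None:
                time.sleep(10)  # the server may still be processing the run/trace
                continue
            # Assumption from the discussion above: a non-None evaluations dict is never empty.
            assert len(run.evaluations) > 0
            return run
        raise TimeoutError(f"run {run_id} has no evaluations after {retries} attempts")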

PGijsbers added 3 commits July 4, 2024 15:06

  • The default behavior when no evaluation is present is for it to be None, so it makes sense to check for that instead.
  • As far as I can tell, run.evaluations should always contain some items if it is not None, but I added an assert just in case.
@PGijsbers PGijsbers requested a review from LennartPurucker July 4, 2024 16:45
@PGijsbers (Collaborator, Author) commented Jul 4, 2024

@LennartPurucker I pinged you for a review since at this point I am unsure what to do. The work is done, more or less. Sometimes the tests still fail sporadically because the pytest workers get killed on GitHub Actions; I have not had this happen locally, and it is not consistent - they can also pass. At this point there are already large improvements to the tests and CI, so it is probably already worth merging. From there, I can work on new features/updates as well as debugging, but at least that can happen in parallel and be discussed in separate, smaller PRs. What do you think?

edit: I suspect this may also be due to the timeout, though previously I have had clearer messages when the timeout functionality was triggered. I'll remove the timeouts so we can compare CI results.

edit2: The one sporadically failing test is now due to an internal timeout. I think it's fairly safe to say this is a server-side issue, since the same test passes on other instances.

I suspect they (the timeouts) "crash" the workers. This of course introduces the risk of hanging processes, but I cannot reproduce the issue locally.
@LennartPurucker (Contributor) left a comment:

LGTM!

I agree and think we should merge this. The changes are all reasonable and important improvements.

@PGijsbers PGijsbers merged commit 532be7b into develop Jul 5, 2024
@PGijsbers PGijsbers deleted the fix/sklearn-test-compatibility branch July 5, 2024 12:56
@PGijsbers PGijsbers added this to the 0.15.0 milestone Sep 12, 2024