Reuse `dpnp.nan_to_num` in `dpnp.nansum` and `dpnp.nanprod` #2339
Merged
ndgrigorian merged 6 commits into master on Mar 3, 2025
View rendered docs @ https://intelpython.github.io/dpnp/index.html
Array API standard conformance tests for dpnp=0.18.0dev0=py312he4f9c94_16 ran successfully.
Force-pushed from 5997cf3 to 4c0908b
This relatively simple and non-invasive change improves performance significantly: `dpnp.nansum` takes roughly 30 to 40% less time in the timings below.

On Max GPU, before:

In [1]: import dpnp
In [2]: x = dpnp.ones(3*10**8, dtype="f4")
In [3]: q = x.sycl_queue
In [4]: %timeit r = dpnp.nansum(x); q.wait()
9.37 ms ± 33.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit r = dpnp.nansum(x); q.wait()
9.42 ms ± 18.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: x = dpnp.ones(10**8, dtype="f4")
In [7]: %timeit r = dpnp.nansum(x); q.wait()
4.5 ms ± 8.8 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit r = dpnp.nansum(x); q.wait()
4.51 ms ± 11 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After:

In [1]: import dpnp
In [2]: x = dpnp.ones(3*10**8, dtype="f4")
In [3]: q = x.sycl_queue
In [4]: %timeit r = dpnp.nansum(x); q.wait()
6.5 ms ± 24.5 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [5]: %timeit r = dpnp.nansum(x); q.wait()
6.47 ms ± 35.7 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: x = dpnp.ones(10**8, dtype="f4")
In [7]: %timeit r = dpnp.nansum(x); q.wait()
2.78 ms ± 14.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [8]: %timeit r = dpnp.nansum(x); q.wait()
2.78 ms ± 14 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
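
For readers who want the speedup as a single number, the reported means can be reduced to a ratio per array size. The snippet below is only a sketch of that arithmetic: the timing values are copied from the two sessions above, while the dictionary layout and the `mean` helper are illustrative choices, not part of the PR.

```python
# Mean timings in milliseconds, copied from the "before" and "after"
# sessions (two %timeit runs per array size).
timings_ms = {
    "3*10**8": {"before": [9.37, 9.42], "after": [6.5, 6.47]},
    "10**8": {"before": [4.5, 4.51], "after": [2.78, 2.78]},
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Speedup factor = mean time before the change / mean time after it.
speedups = {
    size: mean(runs["before"]) / mean(runs["after"])
    for size, runs in timings_ms.items()
}

for size, factor in speedups.items():
    print(f"{size} elements: {factor:.2f}x faster")
```

Both sizes come out at roughly 1.4x to 1.6x, which matches the 30 to 40% reduction in run time visible in the raw numbers.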
vtavana reviewed Feb 28, 2025
antonwolfy reviewed Feb 28, 2025
Force-pushed from aa48c71 to 4552fe8
Changes to [...] I will revert the commits changing the nanarg functions and add a warning about synchronization.
Force-pushed from d0dad9b to f69ef28
antonwolfy reviewed Mar 2, 2025
Moved warnings relating to all-NaN and all-negative-inf slices to near the synchronization warning
Force-pushed from 8d78920 to 1995cd5
antonwolfy approved these changes Mar 3, 2025
Thank you @ndgrigorian, LGTM!
github-actions bot added a commit that referenced this pull request on Mar 4, 2025: Reuse `dpnp.nan_to_num` in `dpnp.nansum` and `dpnp.nanprod` (14274d8)
antonwolfy added 3 commits that referenced this pull request on Mar 25, 2025

This PR proposes the use of `nan_to_num` over `_replace_nan` in `nansum`, `nanprod`, `nancumsum`, and `nancumprod` using a new internal function `_replace_nan_no_mask`.
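
The idea behind the change can be illustrated in plain Python: replacing each NaN with the identity element of the reduction (0 for a sum, 1 for a product) lets an ordinary reduction give the NaN-ignoring result. The sketch below is an illustration under that assumption only. It works on Python lists, touches NaN values and nothing else, and uses hypothetical helper names (`nan_to_num_sketch`, `nansum_sketch`, `nanprod_sketch`); it is not the code of `dpnp.nan_to_num` or of the internal `_replace_nan_no_mask`, neither of which is shown in this conversation.

```python
import math


def nan_to_num_sketch(values, nan=0.0):
    """Return a copy of `values` with every NaN replaced by `nan`."""
    return [nan if math.isnan(v) else v for v in values]


def nansum_sketch(values):
    # 0 is the additive identity, so the replaced NaNs do not change the sum.
    return sum(nan_to_num_sketch(values, nan=0.0))


def nanprod_sketch(values):
    # 1 is the multiplicative identity, so the replaced NaNs do not change
    # the product.
    return math.prod(nan_to_num_sketch(values, nan=1.0))


data = [1.0, float("nan"), 2.0, 4.0]
print(nansum_sketch(data))   # 7.0
print(nanprod_sketch(data))  # 8.0
```

Because the replacement needs no boolean mask of NaN positions, the reduction operates on a single substituted array, which is in line with the "no mask" naming in the PR description.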