scale function(_get_mean_var) updated for dense array, speedup upto ~4.65x#3280
scale function(_get_mean_var) updated for dense array, speedup upto ~4.65x#3280kaushalprasadhial wants to merge 12 commits intoscverse:mainfrom
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3280 +/- ##
==========================================
- Coverage 76.67% 76.59% -0.08%
==========================================
Files 112 112
Lines 12946 12949 +3
==========================================
- Hits 9926 9918 -8
- Misses 3020 3031 +11
|
src/scanpy/preprocessing/_utils.py
Outdated
| if isinstance(X, np.ndarray): | ||
| n_threads = numba.get_num_threads() | ||
| mean, var = _compute_mean_var(X, axis=axis, n_threads=n_threads) |
There was a problem hiding this comment.
I still try to integrate all the stuff it's a mess though.
src/scanpy/preprocessing/_utils.py
Outdated
| # enforce R convention (unbiased estimator) for variance | ||
| if X.shape[axis] != 1: | ||
| var *= X.shape[axis] / (X.shape[axis] - 1) | ||
|
|
There was a problem hiding this comment.
Please remove erroneous diffs
Co-authored-by: Ilan Gold <[email protected]>
flying-sheep
left a comment
There was a problem hiding this comment.
Hi! This seems a bit unfinished. If you need our help with a local test environment or so, we’re happy to help.
| X: _SupportedArray, *, axis: Literal[0, 1] = 0 | ||
| ) -> tuple[NDArray[np.float64], NDArray[np.float64]]: | ||
| if isinstance(X, sparse.spmatrix): | ||
| if isinstance(X, np.ndarray): |
There was a problem hiding this comment.
instead of adding a second code path that handles np.ndarray, you should replace the existing one above:
@axis_mean.register(np.ndarray)
def _(X: np.ndarray, ...): ...| axis_i = 0 | ||
| mean = np.zeros(X.shape[axis_i], dtype=np.float64) | ||
| var = np.zeros(X.shape[axis_i], dtype=np.float64) |
There was a problem hiding this comment.
pull this out of the if branch
There was a problem hiding this comment.
I mean these assignments. When you have two branches, and both start with the same 3 lines, just do those before the if statement instead.
| var[r] += value * value | ||
| for c in numba.prange(X.shape[0]): | ||
| mean[c] = mean[c] / X.shape[1] | ||
| var[c] = (var[c] - mean[c] ** 2) / (X.shape[1] - 1) |
There was a problem hiding this comment.
there is no return statement
There was a problem hiding this comment.
i have updated the code please check
There was a problem hiding this comment.
Well, the tests aren’t passing, so it still doesn’t seem to be working
…cale-mean-variance
…a/scanpy into scale-mean-variance
|
This also doesn't work for F-Arrays |
…cale-mean-variance
We had created this PR before #3099. This one is the same PR with editing enabled for maintainers.
Hi,
We are submitting PR for speed up of the _get_mean_var function.
experiment setup : AWS r7i.24xlarge
add timer around _get_mean_var call
scanpy/scanpy/preprocessing/_scale.py
Line 167 in 706d4ef
we can also create _get_mean_var_std function that return std as well so we don't require to compute it in scale function(L168-L169).