Description
I noticed that in multiscale-spatial-image==2.0.3, the to_multiscale() implementation used for labels calls .compute(), which leads to unnecessary computation and high memory usage: https://github.com/spatial-image/multiscale-spatial-image/blob/ecb6aa410d1e0f3e82daa61ad2e252e921acbe49/multiscale_spatial_image/to_multiscale/_dask_image.py#L196.
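To illustrate the kind of problem described above, here is a minimal sketch of how a .compute() inside a pyramid-building loop forces each level into memory, compared with a variant that keeps every level lazy. This is not the multiscale-spatial-image code: the function names (downscale_nearest, pyramid_eager, pyramid_lazy) and the strided downscaling are assumptions made up for the example, and only standard dask.array and NumPy calls are used.

```python
import dask.array as da
import numpy as np


def downscale_nearest(arr, factor):
    """Hypothetical helper: keep every `factor`-th pixel of a 2D label array."""
    return arr[::factor, ::factor]


def pyramid_eager(base, factors):
    """Illustrative eager variant: every level is computed as soon as it is built."""
    levels = [base]
    current = base
    for factor in factors:
        # .compute() materializes the whole level in memory as a NumPy array
        current = downscale_nearest(current, factor).compute()
        levels.append(current)
        # wrap the result again so the next iteration can use the dask API
        current = da.from_array(current, chunks=current.shape)
    return levels


def pyramid_lazy(base, factors):
    """Illustrative lazy variant: levels stay dask arrays until someone computes them."""
    levels = [base]
    for factor in factors:
        levels.append(downscale_nearest(levels[-1], factor))
    return levels


# Small stand-in for a labels image (assumed shape and chunking)
labels = da.from_array(
    np.arange(64, dtype=np.uint32).reshape(8, 8), chunks=(4, 4)
)

eager_levels = pyramid_eager(labels, [2, 2])
lazy_levels = pyramid_lazy(labels, [2, 2])
```

In the lazy variant nothing is loaded until a level is written or explicitly computed, so a writer can process the data chunk by chunk; in the eager variant every downscaled level is held in memory in full.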
Removing the .compute() seems to be enough to fix the issue, but this is not a straightforward option: multiscale-spatial-image moved its downscaling backend to ngff-zarr, adding it as a dependency, and ngff-zarr brings in extra dependencies that would make spatialdata heavier. At some point we may add ngff-zarr as a dependency anyway (RFC-5 is being implemented in both ngff-zarr and ome-zarr-models-py, and we will pick one), but we need to think more about this. An initial discussion about dependencies can be found in fideus-labs/ngff-zarr#340.
Furthermore, multiscale-spatial-image==2.1.0 adds some upper bounds that would have to be relaxed, and it does not support the latest version of ngff-zarr (2.0.3, on the other hand, has no upper bounds, so we can stay with it for the moment). As a last point, ngff-zarr seems to have a problem that leads to very large dask graphs for large data, which affects performance. This is being worked on, but it makes sense to wait. See fideus-labs/ngff-zarr#48.