to_multiscale() is not lazy for labels #1069

@LucaMarconato

I noticed that in multiscale-spatial-image==2.0.3 the to_multiscale() implementation used for labels calls .compute(), which leads to unnecessary computation and high memory usage: https://github.com/spatial-image/multiscale-spatial-image/blob/ecb6aa410d1e0f3e82daa61ad2e252e921acbe49/multiscale_spatial_image/to_multiscale/_dask_image.py#L196.
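To illustrate the difference, here is a minimal sketch. It is not the code from multiscale-spatial-image: `downscale_labels` and its `eager` flag are hypothetical names, and the downscaling is assumed to be plain nearest-neighbour striding on a 2D dask array. It only shows how a `.compute()` inside the downscaling step materializes a whole level in memory, while the lazy variant just extends the dask graph.

```python
import dask.array as da
import numpy as np


def downscale_labels(labels: da.Array, factor: int, eager: bool = False) -> da.Array:
    """Hypothetical helper: nearest-neighbour downscaling of a 2D label image."""
    # Strided slicing keeps the integer label values intact (no interpolation).
    coarse = labels[::factor, ::factor]
    if eager:
        # Mimics the problematic pattern: the whole level is computed and held
        # in memory right away, then wrapped back into a dask array.
        coarse = da.from_array(coarse.compute(), chunks=coarse.chunksize)
    return coarse


# Small deterministic label image standing in for a large on-disk array.
source = (np.arange(64 * 64).reshape(64, 64) % 7).astype(np.int32)
labels = da.from_array(source, chunks=(16, 16))

lazy_level = downscale_labels(labels, 2)               # nothing computed yet
eager_level = downscale_labels(labels, 2, eager=True)  # computed immediately
```

Both variants return the same values once computed; the lazy one defers all the work until the data is written or explicitly requested.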

Removing the .compute() seems to be enough to fix the issue, but this is not a straightforward option: multiscale-spatial-image moved the downscaling backend to ngff-zarr, adding it as a dependency, and ngff-zarr brings in extra dependencies that would make spatialdata heavier. At some point we may add ngff-zarr as a dependency anyway (RFC-5 is being implemented in both ngff-zarr and ome-zarr-models-py, and we will pick one), but we need to think more about this. An initial discussion about dependencies can be found in fideus-labs/ngff-zarr#340.

Furthermore, multiscale-spatial-image==2.1.0 adds some upper bounds that would have to be relaxed, and it does not support the latest version of ngff-zarr (2.0.3, on the other hand, has no upper bounds, so we can stay with it for the moment). As a last point, ngff-zarr seems to have a problem that leads to very large dask graphs for large data, which affects performance. This is being worked on, but it makes sense to wait. See fideus-labs/ngff-zarr#48.
