unpinning dask #1006
Conversation
Codecov Report

@@ Coverage Diff @@
##             main    #1006      +/-   ##
==========================================
+ Coverage   92.08%   92.11%   +0.02%
==========================================
  Files          48       49       +1
  Lines        7446     7537      +91
==========================================
+ Hits         6857     6943      +86
- Misses        589      594       +5
ilan-gold
left a comment
Just small comments; it was tough to comment on the usage, so I tried to focus on the new code. Happy to re-review!
# We have to do this because as_known() does not preserve the order anymore in latest dask versions.
# TODO: discuss whether we can always expect the index from before to be monotonically increasing,
# because then we don't have to check order.
if index:
    data[VALUES_COLUMN] = data[VALUES_COLUMN].cat.set_categories(data.index, ordered=True)
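For context, the effect of re-imposing an explicit, ordered category order from the index can be sketched in plain pandas. This is a minimal illustration on made-up data, not the project's code:

```python
import pandas as pd

# pandas stores categories sorted by default, so the original row order
# ('b', 'a', 'c') is lost from the categories. set_categories with
# ordered=True re-imposes an explicit ordering taken from the index.
s = pd.Series(pd.Categorical(["b", "a", "c"]), index=["b", "a", "c"])
print(list(s.cat.categories))  # ['a', 'b', 'c'] (sorted, order lost)

reordered = s.cat.set_categories(s.index, ordered=True)
print(list(reordered.cat.categories))  # ['b', 'a', 'c']
print(reordered.cat.ordered)  # True
```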
Do you think that we could report this to dask? Maybe it is an unintended change. Or was it more that the order was never guaranteed in the first place?
I will discuss it during their community meeting. I would have to dive a bit deeper into the exact cause, but they don't seem to define set_categories themselves, so it appears to come from the pandas DataFrame; however, the pandas DataFrame only operates per partition. I am not certain about that, and I did not want to spend too much time on it for now, since they can point me in the right direction much more quickly.
@LucaMarconato I think everything is in order here now. Do you see remaining blockers?
Thanks @melonora, I'll review the changes.
Action point for a new PR is to ensure that the indices of dask dataframes are monotonically increasing across multiple partitions. Right now this may not be the case; as a result, partitions may overlap, which creates problems when evaluating the Dask graph. However, this will be a separate PR.
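The invariant described above can be sketched with plain pandas: if the concatenated index across partitions is monotonically increasing, partition boundaries cannot overlap. A minimal illustration on invented data, not the project's code:

```python
import pandas as pd

# Two "partitions" whose combined index is monotonically increasing:
# their index ranges cannot overlap.
partitions = [
    pd.DataFrame({"x": [1, 2]}, index=[0, 1]),
    pd.DataFrame({"x": [3, 4]}, index=[2, 3]),
]
full_index = pd.concat(partitions).index
print(full_index.is_monotonic_increasing)  # True

# Here the first partition's index (0, 5) spills past the second's (2, 3),
# so the combined index is no longer monotonic.
overlapping = [
    pd.DataFrame({"x": [1, 2]}, index=[0, 5]),
    pd.DataFrame({"x": [3, 4]}, index=[2, 3]),
]
print(pd.concat(overlapping).index.is_monotonic_increasing)  # False
```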
This PR unpins dask by adding an `.attrs` accessor for both dask Series and DataFrame. It functions essentially as a drop-in replacement for the `.attrs` attribute in previous Dask versions. This PR is a prerequisite for unlocking the zarr v3 sharding API. It will also unlock spatialdata + rapids-singlecell in the same environment.
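As an aside, the accessor pattern itself can be sketched with pandas' documented extension API. This illustrates the mechanism only, not the PR's dask implementation; the `meta` accessor name is hypothetical:

```python
import pandas as pd
from pandas.api.extensions import register_dataframe_accessor

# Registering an accessor attaches a namespace object to every DataFrame;
# here it simply delegates to the DataFrame's own attrs dict.
@register_dataframe_accessor("meta")
class MetaAccessor:
    def __init__(self, df):
        self._df = df

    @property
    def attrs(self):
        # Delegate to the underlying pandas attrs dict.
        return self._df.attrs

df = pd.DataFrame({"x": [1, 2]})
df.attrs["source"] = "example"
print(df.meta.attrs)  # {'source': 'example'}
```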
Note regarding supported Dask versions: 2025.2.0 onwards dropped all Dask legacy support. Furthermore, after extensive testing, I saw no way around also dropping support for version 2025.1.0, as a couple of our operations cause dependency issues in the Dask graphs. With 2025.2.0 onwards, users get an error message that we can resolve, whereas previously the computation would hang.
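The resulting constraint (2025.1.0 excluded, 2025.2.0 onwards supported) can be sketched as a simple version gate. A pure-Python illustration under assumed version strings, not the project's actual pin:

```python
# Hypothetical helper, not project code: compare calendar-version strings
# as integer tuples against the minimum supported dask release.
MIN_DASK = (2025, 2, 0)

def supported(version: str) -> bool:
    return tuple(int(p) for p in version.split(".")) >= MIN_DASK

print(supported("2025.1.0"))   # False
print(supported("2025.2.0"))   # True
print(supported("2025.10.0"))  # True
```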
For now, some minor computations are done in memory due to this mixed graph dependency problem. I can reconstruct the proper graph so that computation is completely lazy again, but that is work for a follow-up PR.
Lastly, for writing, `optimize_graph` has been set to `False`. Not doing this causes 'permission denied' errors on Windows, particularly when performing atomic writes, where partial files must be renamed once writing is completed. In general, there were some differences between the various OSs, so I enabled testing for both the lowest Dask version we support and the latest, and I now also include Windows in the CI.

The next follow-up will be to open a PR to Dask that enables sharding support there. Once that is completed, we can add sharding support in SpatialData.
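The atomic-write step that trips over on Windows can be sketched generically. This is an illustration of the write-then-rename pattern, not the project's writer; `atomic_write` is a hypothetical helper:

```python
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    # Write to a temp file in the target's directory, then rename it into
    # place so readers never observe a partially written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # os.replace is atomic on POSIX; on Windows it raises a permission
        # error if another handle still holds the target or temp file open,
        # which is why overlapping tasks touching the same file are a problem.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "out.bin")
    atomic_write(target, b"payload")
    print(open(target, "rb").read())  # b'payload'
```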