
Add scientific data and geospatial publishing guides & case study #2243

Open
2color wants to merge 31 commits into ipfs:main from 2color:geospatial-guide

@2color
Member

@2color 2color commented Jan 23, 2026

What

Adds new documentation focused on scientific/geospatial data publishing with IPFS (Zarr + tooling), plus an ORCESTRA case study and related VuePress navigation updates, with a small quickstart retrieval enhancement.

Changes:

  • Added “Scientific data and IPFS landscape guide” and “Publish geospatial Zarr data with IPFS” how-to pages.
  • Added ORCESTRA case study and updated VuePress sidebar/navigation to include the new section + case study.
  • Extended retrieval quickstart with a Python/ipfsspec verified retrieval example and updated spellcheck ignore list.

@github-actions
Contributor

github-actions bot commented Jan 23, 2026

🚀 Build Preview on IPFS ready

@mishmosh
Collaborator

This is great as a specific how-to. Is there another, complementary place we can write about all the ways geospatial users can benefit from IPFS?

From live meeting:

  • Consider title “Scientific Data” as category
    • Ecosystem Tooling
    • Guide to Publishing Scientific Data
  • IPFS is used by the geospatial community for better collaboration, data integrity, and open access.
    (make sure we can describe some of the architectures used)
    • Connecting Kubo to your existing data repositories (STAC catalog)
    • Private clusters (but open retrieval) or “Collaborative publishing”
    • Provenance

@2color 2color marked this pull request as ready for review February 4, 2026 16:14
Collaborator

@mishmosh mishmosh left a comment

A few suggestions and comments inline, but I'm confident you can take it from here. Would also like to see @vmx review.

2color and others added 6 commits February 6, 2026 14:34
Co-authored-by: Volker Mische <volker.mische@gmail.com>
Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>
Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>
@2color 2color requested a review from vmx February 6, 2026 14:46
Comment on lines +103 to +105

    --raw-leaves \
    --chunker=size-1048576 \
    --cid-version=1 \
Member

Once Kubo 0.40 ships, these could be removed and replaced by a one-time `ipfs config profile apply unixfs-v1-2025`, or by setting the `Import.*` values one by one.
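For the one-by-one route, here is a minimal Python sketch that only builds (and, by default, only prints) the `ipfs config --json` commands. It assumes the Kubo config keys `Import.CidVersion`, `Import.UnixFSRawLeaves`, and `Import.UnixFSChunker` are the persistent equivalents of the three flags quoted above; that mapping comes from Kubo's config documentation rather than from this PR, so check it against your Kubo version. The helper names are hypothetical.

```python
import shlex
import subprocess

# Assumed mapping from the quoted `ipfs add` flags to Kubo Import.* config keys.
# Values are JSON literals because the commands use `ipfs config --json`.
FLAG_TO_IMPORT_KEY = {
    "--cid-version=1": ("Import.CidVersion", "1"),
    "--raw-leaves": ("Import.UnixFSRawLeaves", "true"),
    "--chunker=size-1048576": ("Import.UnixFSChunker", '"size-1048576"'),
}


def import_config_commands(flags):
    """Build one `ipfs config --json KEY VALUE` argv per flag, in the given order."""
    return [
        ["ipfs", "config", "--json", *FLAG_TO_IMPORT_KEY[flag]]
        for flag in flags
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


cmds = import_config_commands(["--raw-leaves", "--chunker=size-1048576", "--cid-version=1"])
run_commands(cmds)  # dry run: prints the three commands, runs nothing
```

Calling `run_commands(cmds, dry_run=False)` would apply them to the local Kubo repo once, after which the per-command flags would no longer be needed.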

@2color

This comment was marked as outdated.

@2color 2color changed the title from “Add geospatial publishing guide” to “Add scientific data and geospatial publishing guides & case study” on Feb 13, 2026
Contributor

Copilot AI left a comment

Pull request overview

Adds new Scientific Data documentation content to the IPFS docs site, including a hands-on guide for publishing geospatial Zarr datasets and supporting context via a landscape overview and an ORCESTRA case study. Updates the VuePress sidebar to surface the new pages and case study.

Changes:

  • Add a new “Publish Geospatial Zarr Data with IPFS” how-to guide.
  • Add a new “Scientific Data and IPFS Landscape Guide” overview page.
  • Add a new ORCESTRA case study and update VuePress navigation (including sidebar re-organization and case study list).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| `docs/how-to/scientific-data/publish-geospatial-zarr-data.md` | New step-by-step publishing guide (Zarr + IPFS), including discovery/access patterns. |
| `docs/how-to/scientific-data/landscape-guide.md` | New overview of scientific data formats, architectural patterns, and ecosystem tooling. |
| `docs/case-studies/orcestra.md` | New case study describing ORCESTRA’s use of IPFS for scientific data distribution. |
| `docs/.vuepress/config.js` | Adds the new Scientific Data pages to the How-to sidebar and adds ORCESTRA to case studies; also reorganizes peer-related sidebar entries. |

mishmosh and others added 5 commits February 19, 2026 03:21
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.


Comment on lines +77 to +80

    ds = xr.open_dataset(filename)
    # Example: targeting ~1 MB chunks with float32 data
    ds.to_zarr('output.zarr', encoding={
        'var_name': {'chunks': (1, 512, 512)}

Copilot AI Feb 24, 2026

The chunking example uses undefined placeholders (filename, var_name), which will error if readers copy/paste. Consider making these explicit string placeholders (e.g., "path/to/file" / "variable_name") or adding a short comment that they must be replaced.

Suggested change

Removed:

    ds = xr.open_dataset(filename)
    # Example: targeting ~1 MB chunks with float32 data
    ds.to_zarr('output.zarr', encoding={
        'var_name': {'chunks': (1, 512, 512)}

Added:

    filename = "path/to/your/file.nc" # Replace with the path to your dataset
    ds = xr.open_dataset(filename)
    # Example: targeting ~1 MB chunks with float32 data
    ds.to_zarr('output.zarr', encoding={
        'variable_name': {'chunks': (1, 512, 512)} # Replace with the name of your variable
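The "~1 MB" comment in this example can be checked with a little arithmetic: a `(1, 512, 512)` chunk holds 262,144 values, and float32 uses 4 bytes per value. Here is a stdlib-only sketch with a hypothetical helper. It computes the uncompressed size only; Zarr codecs such as compression will usually make the stored chunk files smaller.

```python
import math

# Bytes per element for a few common dtypes (these match NumPy's itemsize).
ITEMSIZE = {"uint8": 1, "int16": 2, "float32": 4, "float64": 8}


def chunk_nbytes(chunk_shape, dtype="float32"):
    """Uncompressed size in bytes of one chunk with the given shape and dtype."""
    return math.prod(chunk_shape) * ITEMSIZE[dtype]


size = chunk_nbytes((1, 512, 512), "float32")
print(size, size / 2**20)  # 1048576 1.0  (bytes, MiB)
```

With float64, the same shape doubles to 2 MiB, so `(1, 256, 512)` would keep float64 chunks at 1 MiB. The 1 MiB figure is also the same byte count as the `size-1048576` chunker value quoted earlier in this thread.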

2color and others added 3 commits February 24, 2026 10:48
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

@lkluft lkluft left a comment

Thanks for adding this really nice article about ORCESTRA!

I made a couple of minor suggestions, but I do like the overall story very much! 👍

Member

@vmx vmx left a comment

The only major thing is the comment about using some sample data so readers can follow along with the steps of the publishing guide.

In all the Zarr docs, it would make sense to refer explicitly to Zarr v3, since, for example, the metadata is different in v2.
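One concrete place where the v2/v3 difference shows up on disk is the metadata files. A Zarr v3 store keeps group and array metadata in `zarr.json`, whereas Zarr v2 uses `.zgroup`, `.zarray`, and `.zattrs`. Below is a minimal stdlib sketch with hypothetical function names that reports which format a local directory store uses by reading the `zarr_format` field at its root. The demo stores are hand-written stubs, not output from the zarr library.

```python
import json
import tempfile
from pathlib import Path

# Root-level metadata files: Zarr v3 uses zarr.json; v2 uses .zgroup / .zarray.
METADATA_FILES = ("zarr.json", ".zgroup", ".zarray")


def zarr_format(store_path):
    """Return the Zarr format (2 or 3) of a local directory store, or None."""
    root = Path(store_path)
    for name in METADATA_FILES:
        meta = root / name
        if meta.is_file():
            # Both formats record their version in the "zarr_format" field.
            return json.loads(meta.read_text())["zarr_format"]
    return None


def demo():
    """Build tiny stub v3 and v2 group stores in a temp dir and detect their formats."""
    with tempfile.TemporaryDirectory() as tmp:
        v3 = Path(tmp, "v3.zarr")
        v3.mkdir()
        (v3 / "zarr.json").write_text(json.dumps({"zarr_format": 3, "node_type": "group"}))
        v2 = Path(tmp, "v2.zarr")
        v2.mkdir()
        (v2 / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
        # The temp dir itself has no root metadata, so it is not a store.
        return zarr_format(v3), zarr_format(v2), zarr_format(tmp)


print(demo())  # (3, 2, None)
```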

```python
import xarray as xr

ds = xr.open_dataset(filename)
```
Member

In the MFS section you introduce "HALO flight datasets". I think it would be good to introduce them right here at the beginning and then make all the steps work with that dataset, so one can follow along locally.

When I started to look into Zarr, I had a hard time figuring out what exactly it looked like on disk. Most resources I found were online notebooks, so a start-to-finish walkthrough with actual data would've been useful. For example, it would've been clearer to me that a "somedata.zarr" is actually a directory and not a single file.
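To make the "directory, not a single file" point concrete: with Zarr v3 and its default chunk key encoding (and without sharding), a store is a directory tree with a `zarr.json` at the root, a `zarr.json` per array, and one file per chunk under a `c/` prefix. The sketch below writes empty placeholder files for a hypothetical store with a single 3-D array named `temperature` (not real Zarr output) and lists what ends up on disk.

```python
import tempfile
from pathlib import Path

# Hypothetical Zarr v3 layout: one group containing a 3-D array "temperature"
# split into two chunks along its first axis (default "c/" chunk keys).
LAYOUT = [
    "zarr.json",              # group metadata
    "temperature/zarr.json",  # array metadata (shape, data_type, chunk_grid, codecs, ...)
    "temperature/c/0/0/0",    # chunk at grid index (0, 0, 0)
    "temperature/c/1/0/0",    # chunk at grid index (1, 0, 0)
]


def make_stub_store(parent):
    """Create somedata.zarr under `parent` with empty placeholder files."""
    store = Path(parent, "somedata.zarr")
    for rel in LAYOUT:
        path = store / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(b"")
    return store


def list_store(store):
    """Every file in the store as a sorted, '/'-separated path relative to its root."""
    root = Path(store)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    store = make_stub_store(tmp)
    is_directory = store.is_dir()  # True: "somedata.zarr" is a directory
    example_files = list_store(store)

print(is_directory)
print("\n".join(example_files))
```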

Also, mentioning that this is about Zarr v3 would make sense, as Zarr v2 still seems to be quite common.


    # Each mutation produces a new root CID: a lightweight versioned snapshot
    ipfs files stat --hash /datasets/halo
    # bafybeihqixf5ew7mfr74bzb74qiw2mgtnytabnpzjnf5xeejzq4p2ocygu
Member

Maybe add a final `ls` (for example, `ipfs files ls /datasets/halo`) to show that all the directories are there.
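Here is a sketch of that final check from Python. It assumes a local Kubo node and that `ipfs files ls` prints one entry name per line (its default, non-`-l` output). The expected directory names are placeholders, since the actual HALO dataset names aren't shown in this thread.

```python
import subprocess

# Hypothetical names; replace with the directories the guide copies into MFS.
EXPECTED = ["flight-a.zarr", "flight-b.zarr"]


def mfs_ls(mfs_path):
    """Entry names printed by `ipfs files ls MFS_PATH` (needs the ipfs CLI)."""
    result = subprocess.run(
        ["ipfs", "files", "ls", mfs_path],
        check=True, capture_output=True, text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def missing_entries(listed, expected):
    """Expected names absent from the listing, sorted."""
    return sorted(set(expected) - set(listed))
```

On a node where the guide's steps have been run, `missing_entries(mfs_ls("/datasets/halo"), EXPECTED) == []` would confirm that every expected directory is present.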

2color and others added 5 commits February 26, 2026 10:15
Co-authored-by: Lukas Kluft <lukas.kluft@gmail.com>
Co-authored-by: Lukas Kluft <lukas.kluft@gmail.com>
Co-authored-by: Lukas Kluft <lukas.kluft@gmail.com>

Labels

None yet

Projects

None yet

6 participants