Merge v1 Feature Branch to Main #535

Draft: wants to merge 14 commits into `main`

10 changes: 10 additions & 0 deletions docs/conf.py
@@ -17,6 +17,7 @@
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"sphinx.ext.autosummary",
"sphinxcontrib.autodoc_pydantic",
"sphinx.ext.autosectionlabel",
"sphinx_click",
"sphinx_copybutton",
@@ -38,6 +39,7 @@
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"pydantic": ("https://docs.pydantic.dev/latest/", None),
"zarr": ("https://zarr.readthedocs.io/en/stable/", None),
}

@@ -50,6 +52,14 @@
autoclass_content = "class"
autosectionlabel_prefix_document = True

autodoc_pydantic_field_list_validators = False
autodoc_pydantic_field_swap_name_and_alias = True
autodoc_pydantic_field_show_alias = False
autodoc_pydantic_model_show_config_summary = False
autodoc_pydantic_model_show_validator_summary = False
autodoc_pydantic_model_show_validator_members = False
autodoc_pydantic_model_show_field_summary = False

html_theme = "furo"

myst_number_code_blocks = ["python"]
154 changes: 154 additions & 0 deletions docs/data_models/chunk_grids.md
@@ -0,0 +1,154 @@
```{eval-rst}
:tocdepth: 3
```

```{currentModule} mdio.schemas.chunk_grid

```

# Chunk Grid Models

```{article-info}
:author: Altay Sansal
:date: "{sub-ref}`today`"
:read-time: "{sub-ref}`wordcount-minutes` min read"
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

Variables in the MDIO data model can use different types of chunk grids, which are
essential for managing multi-dimensional data arrays efficiently. In this breakdown,
we explore the four data models in the MDIO schema that describe them: two for
regular grids and two for rectilinear grids.

MDIO implements these data models following the Zarr v3 specification and the related
Zarr Enhancement Proposals (ZEPs):

- [Zarr core specification (version 3)](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
- [ZEP 1 — Zarr specification version 3](https://zarr.dev/zeps/accepted/ZEP0001.html)
- [ZEP 3 — Variable chunking](https://zarr.dev/zeps/draft/ZEP0003.html)

## Regular Grid

The regular grid models represent a rectangular, regularly spaced chunk grid.

```{eval-rst}
.. autosummary::
   RegularChunkGrid
   RegularChunkShape
```

For a 1D array with `size = 31`{l=python}, a chunk size of 7 divides it into 5 chunks:
four full chunks of 7 elements and a last chunk truncated to the remaining 3 elements.

`{ "name": "regular", "configuration": { "chunkShape": [7] } }`{l=json}

Using the above schema, the resulting array chunks look like this:

```bash
 ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ↔ 3
┌───────┬───────┬───────┬───────┬───┐
└───────┴───────┴───────┴───────┴───┘
```
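To make the truncation rule concrete, here is a minimal plain-Python sketch that
computes the chunk lengths along one dimension of a regular grid. The helper name
`regular_chunk_sizes` is hypothetical and not part of the MDIO API.

```python
def regular_chunk_sizes(size: int, chunk: int) -> list[int]:
    """Return chunk lengths along one dimension of a regular grid.

    All chunks have length ``chunk`` except possibly the last one, which is
    truncated so that the chunks cover exactly ``size`` elements.
    """
    if size == 0:
        return []
    n_chunks = -(-size // chunk)  # ceiling division without floats
    sizes = [chunk] * n_chunks
    sizes[-1] = size - chunk * (n_chunks - 1)  # truncated last chunk
    return sizes


print(regular_chunk_sizes(31, 7))  # [7, 7, 7, 7, 3]
```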

For a 2D array with shape `rows, cols = (7, 17)`{l=python}, we can divide it into a
3 × 3 grid of 9 chunks, where the chunks in the last row and last column are truncated.

`{ "name": "regular", "configuration": { "chunkShape": [3, 7] } }`{l=json}

Using the above schema, the resulting 2D array chunks look like this.
The rows and columns are conceptual and not drawn to scale.

```bash
 ←─ 7 ─→ ←─ 7 ─→ ↔ 3
┌───────┬───────┬───┐
│       ╎       ╎   │ ↑
│       ╎       ╎   │ 3
│       ╎       ╎   │ ↓
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│       ╎       ╎   │ ↑
│       ╎       ╎   │ 3
│       ╎       ╎   │ ↓
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│       ╎       ╎   │ ↕ 1
└───────┴───────┴───┘
```
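The same arithmetic extends to any number of dimensions: the grid has a
ceiling-divided number of chunks per dimension, and the chunk holding an element is
found by integer division. The sketch below uses plain Python and hypothetical helper
names (`regular_grid_shape`, `regular_chunk_index`) rather than the MDIO models.

```python
import math


def regular_grid_shape(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-size // chunk) for size, chunk in zip(shape, chunk_shape))


def regular_chunk_index(index, chunk_shape):
    """Grid coordinates of the chunk that holds the element at ``index``."""
    return tuple(i // chunk for i, chunk in zip(index, chunk_shape))


grid = regular_grid_shape((7, 17), (3, 7))
print(grid, math.prod(grid))  # (3, 3) 9

# The last element falls in the truncated bottom-right chunk.
print(regular_chunk_index((6, 16), (3, 7)))  # (2, 2)
```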

## Rectilinear Grid

The [RectilinearChunkGrid](RectilinearChunkGrid) model extends the concept of chunk
grids to rectangular but irregularly sized chunks. This model is useful for data
structures that require non-uniform chunk sizes.
[RectilinearChunkShape](RectilinearChunkShape) specifies the chunk sizes for each
dimension as a list, allowing irregular intervals.

```{eval-rst}
.. autosummary::
   RectilinearChunkGrid
   RectilinearChunkShape
```

:::{note}
Ensure that the chunk sizes listed for each dimension in `chunkShape` sum to the
size of that array dimension.
:::
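A minimal sketch of this check in plain Python follows. `validate_rectilinear` is a
hypothetical helper for illustration; we do not assume the MDIO models perform
exactly this validation.

```python
def validate_rectilinear(shape, chunk_shape):
    """Raise ValueError unless ``chunk_shape`` tiles ``shape`` exactly."""
    if len(shape) != len(chunk_shape):
        raise ValueError(
            f"expected {len(shape)} dimensions, got {len(chunk_shape)}"
        )
    for dim, (size, sizes) in enumerate(zip(shape, chunk_shape)):
        # Every chunk must contain at least one element.
        if any(s <= 0 for s in sizes):
            raise ValueError(f"dimension {dim}: chunk sizes must be positive")
        # The chunks along a dimension must add up to its full length.
        if sum(sizes) != size:
            raise ValueError(
                f"dimension {dim}: chunk sizes sum to {sum(sizes)}, "
                f"but the dimension has size {size}"
            )


validate_rectilinear((39,), [[10, 7, 5, 7, 10]])  # passes silently
validate_rectilinear((7, 25), [[3, 1, 3], [10, 5, 7, 3]])  # passes silently
```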

For a 1D array with `size = 39`{l=python}, we can divide it into 5 irregularly sized
chunks.

`{ "name": "rectilinear", "configuration": { "chunkShape": [[10, 7, 5, 7, 10]] } }`{l=json}

Using the above schema, the resulting array chunks look like this:

```bash
 ←── 10 ──→ ←─ 7 ─→ ← 5 → ←─ 7 ─→ ←── 10 ──→
┌──────────┬───────┬─────┬───────┬──────────┐
└──────────┴───────┴─────┴───────┴──────────┘
```
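Because chunk sizes vary, the start offset of each chunk is the running sum of the
sizes before it. This plain-Python sketch (hypothetical helper `chunk_boundaries`)
derives the half-open `[start, stop)` range of each chunk in the 1D example above.

```python
from itertools import accumulate


def chunk_boundaries(sizes):
    """Return the start offset of every chunk followed by the final end offset."""
    return [0, *accumulate(sizes)]


bounds = chunk_boundaries([10, 7, 5, 7, 10])
print(bounds)  # [0, 10, 17, 22, 29, 39]

# Pair consecutive offsets into half-open [start, stop) ranges.
ranges = list(zip(bounds[:-1], bounds[1:]))
print(ranges)  # [(0, 10), (10, 17), (17, 22), (22, 29), (29, 39)]
```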

For a 2D array with shape `rows, cols = (7, 25)`{l=python}, we can divide it into 12
rectilinear (rectangular but irregular) chunks: 3 row chunks by 4 column chunks.
The rows and columns are conceptual and not drawn to scale.

`{ "name": "rectilinear", "configuration": { "chunkShape": [[3, 1, 3], [10, 5, 7, 3]] } }`{l=json}

```bash
 ←── 10 ──→ ← 5 → ←─ 7 ─→ ↔ 3
┌──────────┬─────┬───────┬───┐
│          ╎     ╎       ╎   │ ↑
│          ╎     ╎       ╎   │ 3
│          ╎     ╎       ╎   │ ↓
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│          ╎     ╎       ╎   │ ↕ 1
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│          ╎     ╎       ╎   │ ↑
│          ╎     ╎       ╎   │ 3
│          ╎     ╎       ╎   │ ↓
└──────────┴─────┴───────┴───┘
```
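Finding the chunk that holds a given element is less direct than in a regular grid,
because integer division no longer applies. One approach, sketched below under the
assumption that chunk sizes are positive and tile each dimension, is a binary search
over the cumulative chunk end offsets. `locate_chunk` is a hypothetical helper, not
an MDIO function.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_chunk(index, chunk_shape):
    """Return grid coordinates of the chunk that holds the element at ``index``."""
    coords = []
    for dim, (i, sizes) in enumerate(zip(index, chunk_shape)):
        ends = list(accumulate(sizes))  # exclusive end offset of each chunk
        if not 0 <= i < ends[-1]:
            raise IndexError(f"index {i} out of range in dimension {dim}")
        # First chunk whose end offset is strictly greater than i.
        coords.append(bisect_right(ends, i))
    return tuple(coords)


chunk_shape = [[3, 1, 3], [10, 5, 7, 3]]
print(locate_chunk((3, 12), chunk_shape))  # (1, 1)
print(locate_chunk((6, 24), chunk_shape))  # (2, 3)
```

For repeated lookups, precomputing the cumulative offsets once per dimension avoids
rebuilding them on every call.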

## Model Reference

:::{dropdown} RegularChunkGrid
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: RegularChunkGrid

----------

.. autopydantic_model:: RegularChunkShape
```

:::
:::{dropdown} RectilinearChunkGrid
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: RectilinearChunkGrid

----------

.. autopydantic_model:: RectilinearChunkShape
```

:::
100 changes: 100 additions & 0 deletions docs/data_models/compressors.md
@@ -0,0 +1,100 @@
```{eval-rst}
:tocdepth: 3
```

```{currentModule} mdio.schemas.compressors

```

# Compressors

```{article-info}
:author: Altay Sansal
:date: "{sub-ref}`today`"
:read-time: "{sub-ref}`wordcount-minutes` min read"
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

## Dataset Compression

MDIO relies on [numcodecs] for data compression. We provide sensible defaults for each
compressor, based on opinionated and limited heuristics for various energy datasets.
These data models let you customize the compression when the defaults do not fit.

[Numcodecs] is a project that provides a convenient interface to different compression
libraries. We selected the [Blosc] and [ZFP] compressors for lossless and lossy
compression of energy data.

## Blosc

Blosc is a high-performance compressor optimized for binary data. It combines fast
compression with a byte-shuffle filter for better efficiency and is particularly
effective on numerical arrays in multi-threaded environments.

For more details about compression modes, see [Blosc Documentation].

```{eval-rst}
.. autosummary::
   Blosc
```
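To illustrate why the byte-shuffle filter helps, the sketch below reimplements the
shuffle idea in pure Python and uses `zlib` as a stand-in for Blosc's internal codecs.
It illustrates the technique only; it is not how MDIO or numcodecs invoke Blosc, and
the helper names `shuffle` and `unshuffle` are hypothetical.

```python
import zlib
from array import array


def shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte j of every element into one contiguous plane."""
    return b"".join(data[j::itemsize] for j in range(itemsize))


def unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert shuffle() by interleaving the byte planes again."""
    n = len(data) // itemsize
    out = bytearray(len(data))
    for j in range(itemsize):
        out[j::itemsize] = data[j * n:(j + 1) * n]
    return bytes(out)


values = array("i", range(100_000))  # slowly varying integers
raw = values.tobytes()

plain = zlib.compress(raw)
shuffled = zlib.compress(shuffle(raw, values.itemsize))
print(len(raw), len(plain), len(shuffled))

# Decompress and unshuffle to recover the original bytes exactly (lossless).
restored = unshuffle(zlib.decompress(shuffled), values.itemsize)
assert restored == raw
```

For slowly varying values, most high-order bytes repeat, so grouping them into
contiguous planes gives the compressor long runs of identical bytes. In numcodecs,
this behavior is controlled by the `shuffle` argument of `numcodecs.Blosc`.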

## ZFP

ZFP is a compression algorithm tailored to floating-point and integer arrays. It offers
lossy and lossless compression with customizable precision, which suits large
scientific datasets where data fidelity must be balanced against compression ratio.

For more details about compression modes, see [ZFP Documentation].

```{eval-rst}
.. autosummary::
   ZFP
```

[numcodecs]: https://github.com/zarr-developers/numcodecs
[blosc]: https://github.com/Blosc/c-blosc
[blosc documentation]: https://www.blosc.org/python-blosc/python-blosc.html
[zfp]: https://github.com/LLNL/zfp
[zfp documentation]: https://computing.llnl.gov/projects/zfp

## Model Reference

:::{dropdown} Blosc
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: Blosc

----------

.. autoclass:: BloscAlgorithm()
   :members:
   :undoc-members:
   :member-order: bysource

----------

.. autoclass:: BloscShuffle()
   :members:
   :undoc-members:
   :member-order: bysource
```

:::

:::{dropdown} ZFP
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: ZFP

----------

.. autoclass:: ZFPMode()
   :members:
   :undoc-members:
   :member-order: bysource
```

:::