New reader for G4X datasets (Singular Genomics) #281

Open
wants to merge 14 commits into
base: main

ckmah (Contributor) commented Feb 19, 2025

👋 Hello @scverse/spatialdata and community,

I would like to contribute the initial version of a spatialdata-io reader for Singular Genomics G4X datasets. I recently developed it for internal use (I work at Singular) and am now sharing it with the spatial community. It is still experimental and not fully battle-tested, but I tried to keep the API as consistent with the other readers as possible. However, I made a few key additions to streamline use with our datasets:

Notable features

  • Incremental I/O of elements (images, tables, etc.). G4X datasets can get pretty large since they are multimodal. Therefore, the reader saves each element as soon as it is converted, to reduce memory consumption (see "Reduce readers' memory consumption", #229) and to mitigate data loss. This is controlled via the g4x(..., mode="append") parameter; the user can choose mode="overwrite" to turn it off. The constructed SpatialData object is also re-read from disk automatically to take full advantage of lazy loading.
  • Read one or more samples at once. This corresponds to our assay design and enables converting an entire experiment with a single function call. The reader returns a single SpatialData object or a list of them, depending on how many samples were read.
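
To make the two behaviours above concrete, here is a minimal, standard-library-only sketch of the pattern. It is not the code from this PR: `Sample`, `convert_sample`, and `read_samples` are hypothetical names, JSON files stand in for Zarr-backed elements, and the exact semantics of `mode` in the real reader may differ. It assumes that "append" means "write each element as soon as it is converted" and "overwrite" means "convert everything, then write once".

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, NamedTuple, Union


class Sample(NamedTuple):
    """Hypothetical stand-in for one G4X sample: a name plus element converters."""
    name: str
    converters: Dict[str, Callable[[], dict]]


def convert_sample(sample: Sample, out_root: Path, mode: str = "append") -> Dict[str, dict]:
    """Convert all elements of one sample and return what is on disk afterwards."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")
    out_dir = Path(out_root) / sample.name
    out_dir.mkdir(parents=True, exist_ok=True)

    pending: Dict[str, dict] = {}
    for name, convert in sample.converters.items():
        element = convert()
        if mode == "append":
            # Persist immediately: only one converted element is held at a time,
            # and a later failure cannot lose elements that were already written.
            (out_dir / f"{name}.json").write_text(json.dumps(element))
        else:
            # Assumed "overwrite" behaviour: keep everything in memory for now.
            pending[name] = element

    for name, element in pending.items():
        (out_dir / f"{name}.json").write_text(json.dumps(element))

    # Re-read from disk so the caller gets the stored representation
    # (the real reader does this to benefit from lazy loading).
    return {p.stem: json.loads(p.read_text()) for p in sorted(out_dir.glob("*.json"))}


def read_samples(
    samples: Union[Sample, List[Sample]], out_root: Path, mode: str = "append"
) -> Union[Dict[str, dict], List[Dict[str, dict]]]:
    """Return one result for a single sample, or a list of results for a list."""
    single = not isinstance(samples, list)
    batch = [samples] if single else samples
    results = [convert_sample(s, out_root, mode) for s in batch]
    return results[0] if single else results
```

In this sketch, if the second converter of a sample raises, the first element is already on disk under "append" but not under "overwrite", which is the data-loss mitigation described above. Whether the real reader returns a list for a one-item list is an assumption here.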

Additional Dependencies

  • Some of our images are encoded in the JPEG 2000 format (.jp2, .j2k) and require the glymur package to read.
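
For readers unfamiliar with glymur, the following is a rough sketch of how a JPEG 2000 image can be loaded with it. The helper names `is_jpeg2000` and `read_jpeg2000` are hypothetical and not taken from this PR; only `glymur.Jp2k` and its slicing interface come from the library. The import is deferred so the dependency stays optional.

```python
from pathlib import Path

# File suffixes mentioned in the PR description.
JP2_SUFFIXES = {".jp2", ".j2k"}


def is_jpeg2000(path) -> bool:
    """Return True if the file suffix indicates a JPEG 2000 image."""
    return Path(path).suffix.lower() in JP2_SUFFIXES


def read_jpeg2000(path, step: int = 1):
    """Read a JPEG 2000 image as a NumPy array.

    step > 1 requests a downsampled view; glymur expects a power of two here.
    """
    if not is_jpeg2000(path):
        raise ValueError(f"not a JPEG 2000 file: {path}")
    import glymur  # imported lazily: only needed when such images are present

    jp2 = glymur.Jp2k(str(path))
    if step == 1:
        return jp2[:]  # full-resolution read
    return jp2[::step, ::step]  # reduced-resolution read
```

A call such as `read_jpeg2000("image.jp2", step=4)` would read a lower-resolution version without decoding the full image, which can help with large G4X images; the file name is illustrative only.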

Misc.

Are there any other pieces I should have in this PR? Devs, please let me know; I'm happy to add them. Here are the relevant ones I can think of:

  • Documentation/tutorial notebook? (I'm also not sure if I used the @inject_docs decorator properly.)
  • Parse experimental metadata: sample names, positions, acquisition info etc.
  • spatialdata-io CLI compatibility

codecov-commenter commented Apr 24, 2025

Codecov Report

Attention: Patch coverage is 17.19298% with 236 lines in your changes missing coverage. Please review.

Project coverage is 46.45%. Comparing base (296d9a5) to head (f670424).
Report is 174 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/spatialdata_io/readers/g4x.py | 10.26% | 236 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #281      +/-   ##
==========================================
+ Coverage   39.16%   46.45%   +7.29%     
==========================================
  Files          26       27       +1     
  Lines        2663     2994     +331     
==========================================
+ Hits         1043     1391     +348     
+ Misses       1620     1603      -17     

| Files with missing lines | Coverage Δ |
|---|---|
| src/spatialdata_io/__init__.py | 100.00% <100.00%> (ø) |
| src/spatialdata_io/_constants/_constants.py | 100.00% <100.00%> (ø) |
| src/spatialdata_io/readers/g4x.py | 10.26% <10.26%> (ø) |

... and 10 files with indirect coverage changes
