IO Support for Blosc2
Closed, Resolved, Public

Description

The format is used in IMSY groups and in several MIC projects. Support for the format, allowing direct visualization (and annotation) of images stored in it, would be nice to have.

  1. Check implications/dependencies
  2. If OK, implement a reader for image data type

    We'll have a look at the format and check whether any implications are involved.

Source: https://github.com/Blosc/c-blosc2
Open questions:

  • Is a writer needed? If yes, it should be implemented with meta data support (Blosc2 allows that), so that images and annotations can also be serialized losslessly in Blosc2.

Event Timeline

floca triaged this task as Normal priority. Jun 3 2024, 10:00 AM
floca created this task.

c-blosc2 basically has 3 dependencies. We already provide 2 of them and can/should use them: zlib (or zlib-ng) and lz4. The third one is zstd, which is not an issue, since c-blosc2 comes with an internal version of it if it is not provided externally.

A first dependency test on Windows was successful. I talked to @isensee and @j562r regarding advanced features used in our divisions and it seems like the latest available version of c-blosc2 is fine for our purposes. I continued to add a skeleton for an IO module.

The next step is to implement the actual reading of Blosc2 files. There are multiple options according to the file format specification but I focus first on .b2nd files, as I was given an example file by @isensee.

If you need additional files to test, I could give you one :)

First breakthrough: I was able to load and display the b2nd example image from @isensee. The orientation is unexpected, though. It is probably rotated 180° (not mirrored?) around the axial axis. I am trying to figure out the culprit. I use the b2nd_to_cbuffer() function to copy the pixel data into the MITK image memory. If this orientation mismatch is systematic, we may be able to solve it by creating a geometry accordingly. At the moment the MITK image is just initialized via pixel type and image dimensions. As far as I am aware, the Blosc2 image does not contain any orientation meta data, right?

Screenshot 2024-07-05 224840.png (1×1 px, 421 KB)

Is there a defined anatomical coordinate system for Blosc2 or for the code that wrote the Blosc2 images, like LPS or RAS (https://www.slicer.org/wiki/Coordinate_systems)? Maybe it differs from the assumption we have in MITK, and therefore the memory layout of the pixels is different.
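
One possible explanation, offered only as a hypothesis: LPS and RAS differ in the sign of the first two axes, and flipping both in-plane axes is the same as a 180° rotation around the axial axis, whereas flipping only one of them gives a mirrored image. The following minimal sketch illustrates this on a toy 2D slice stored as nested lists. The helper names are made up for illustration and it does not touch any Blosc2 or MITK API.

```python
def flip_rows(img):
    """Mirror along the first in-plane axis (reverse the row order)."""
    return img[::-1]


def flip_cols(img):
    """Mirror along the second in-plane axis (reverse each row)."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate the slice by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def rot180(img):
    """Rotate the slice by 180 degrees via two successive 90 degree turns."""
    return rot90(rot90(img))


# Toy axial slice; the values only serve to track where pixels end up.
slice_ = [[1, 2, 3],
          [4, 5, 6]]

both_flipped = flip_cols(flip_rows(slice_))   # sign change on both axes
one_flipped = flip_cols(slice_)               # sign change on one axis only

print(both_flipped == rot180(slice_))  # flipping both axes is a rotation
print(one_flipped == rot180(slice_))   # a single flip is a mirror instead
```

If the observed image really is rotated and not mirrored, a geometry that negates both in-plane axis directions would be a candidate fix to try.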

Unfortunately, Blosc2 does not standardize any of this in its specification. There's the general concept of metalayers. A metalayer is a space that allows users to store custom information. The format differentiates between fixed-length metalayers, which can easily be overwritten in place and are therefore written before the image data, and variable-length metalayers, which are written after the image data, where they can easily be replaced without writing all of the image data again.
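
To illustrate why the position matters, here is a minimal sketch of a toy container. It is explicitly not the Blosc2 frame format; the magic bytes, the 16-byte slot size, and all function names are assumptions made up for this example. A fixed-size slot in front of the data can be overwritten in place, and a trailer behind the data can change its length, and in both cases the pixel data stays untouched.

```python
import struct

# Toy layout (NOT the Blosc2 frame format):
# [magic][fixed-size meta slot][data length][data][variable-length meta]
MAGIC = b"TOY1"
FIXED_META_SIZE = 16                      # hypothetical slot size
LENGTH_FIELD = struct.Struct("<Q")        # little-endian 64-bit data length
DATA_OFFSET = len(MAGIC) + FIXED_META_SIZE + LENGTH_FIELD.size


def _pad(meta):
    """Pad a fixed-length metalayer to its slot size, rejecting overflow."""
    if len(meta) > FIXED_META_SIZE:
        raise ValueError("fixed-length metalayer does not fit its slot")
    return meta.ljust(FIXED_META_SIZE, b"\0")


def build(fixed_meta, data, var_meta):
    """Assemble header, data and trailer into one mutable buffer."""
    header = MAGIC + _pad(fixed_meta) + LENGTH_FIELD.pack(len(data))
    return bytearray(header + data + var_meta)


def _data_length(buf):
    (length,) = LENGTH_FIELD.unpack_from(buf, len(MAGIC) + FIXED_META_SIZE)
    return length


def overwrite_fixed_meta(buf, new_meta):
    """Overwrite the slot in place; no byte after it moves."""
    start = len(MAGIC)
    buf[start:start + FIXED_META_SIZE] = _pad(new_meta)


def replace_var_meta(buf, new_var_meta):
    """Cut off the old trailer and append a new one of any length."""
    del buf[DATA_OFFSET + _data_length(buf):]
    buf += new_var_meta


def read_data(buf):
    return bytes(buf[DATA_OFFSET:DATA_OFFSET + _data_length(buf)])


container = build(b"spacing=1", b"PIXELDATA", b"note=draft")
overwrite_fixed_meta(container, b"spacing=2")
replace_var_meta(container, b"note=a much longer final annotation")
print(read_data(container))
```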

In that sense, Blosc2 is only a kind of meta format, and it is up to the community or an application to come up with "standardized" metalayers (preferably in msgpack format according to the standard, but this is not required). Metalayers have a name that basically functions as an ID, but there is no official registry for them to avoid name clashes.

I suggest we define a fixed-length metalayer that resembles a typical header of an image format like NRRD and publish it on GitHub, possibly asking the authors of Blosc2 to publicly recognize/reserve its name. We can utilize a separate variable-length metalayer for any additional meta data.
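
As a starting point for discussion, a fixed-length geometry header for the 3D case could look like the minimal sketch below. The field set (spacing, origin, direction matrix) is borrowed from what NRRD-like headers typically carry, but the binary layout and the function names are purely hypothetical. It uses the standard library's struct module only to stay self-contained; an actual metalayer would preferably be encoded with msgpack as noted above.

```python
import struct

# Hypothetical layout for illustration only: 3 spacing values, 3 origin
# values and a row-major 3x3 direction matrix, all little-endian doubles.
GEOMETRY_FORMAT = "<3d3d9d"
GEOMETRY_SIZE = struct.calcsize(GEOMETRY_FORMAT)  # 15 doubles


def pack_geometry(spacing, origin, direction):
    """Serialize a 3D geometry into a payload of constant size."""
    flat_direction = [value for row in direction for value in row]
    return struct.pack(GEOMETRY_FORMAT, *spacing, *origin, *flat_direction)


def unpack_geometry(payload):
    """Inverse of pack_geometry; returns (spacing, origin, direction)."""
    values = struct.unpack(GEOMETRY_FORMAT, payload)
    spacing = values[0:3]
    origin = values[3:6]
    direction = tuple(values[6 + 3 * i:9 + 3 * i] for i in range(3))
    return spacing, origin, direction


payload = pack_geometry(
    spacing=(0.5, 0.5, 2.0),
    origin=(-10.0, -20.0, 30.0),
    direction=((1, 0, 0), (0, 1, 0), (0, 0, 1)),
)
spacing, origin, direction = unpack_geometry(payload)
print(len(payload), spacing, origin, direction)
```

Because the payload size never changes for a given dimensionality, such a header fits the fixed-length metalayer concept; anything of varying size (annotations, free-form properties) would go into the separate variable-length metalayer.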

I think your proposal is worth trying! Great idea.

Furthermore (or as an interim solution), I think we should

  1. Define our default ACS.
  2. Ensure that our Blosc2 writer uses it.
  3. Educate MIC on how to do it, e.g. by simply offering the right tools: a new version of MITKFileConverter that writes "correct" domain Blosc2 files, or early access to our pyMITK package, so that people can use mitk::IOUtil directly in their Python scripts as soon as possible to write Blosc2 in a standardized way.

@isensee what do you think?

I am not knowledgeable enough to have a well informed opinion. Some thoughts to consider:

  • blosc2 is universally applicable for all nd arrays and the devs might not care about supporting niche use cases such as image geometries that are highly specific for medical images
  • my recommendation would be to draft a MIC internal metadata layer for geometries and provide python code for loading and storing images with that layer. Be aware that
    • images can be 2D (xy), 3D (cxy), 3D (xyz), 4D (txyz), 4D (cxyz), 5D (tcxyz), etc. (with c being the color channel, i.e. multiple modalities, and t being time), which may need special encoding of the geometries. We can also define a narrower scope and tackle only the use cases we really need (drop everything with a t in it)
    • block and chunk size are tremendously important for storing training data and must be configurable in that interface
  • we can offer that metadata layer as contribution to python-blosc2 but need to be prepared for the devs not wanting it in which case we can also publish that ourselves (with less visibility)
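
The points above could be sketched roughly as follows. Everything here is hypothetical: the axis-label strings follow the notation used in the list, and the StorageConfig class with its checks is only an assumption about what such an interface might validate (including the rule that a block must not exceed its chunk along any axis), not an existing API.

```python
from dataclasses import dataclass

SPATIAL_AXES = "xyz"
KNOWN_AXES = set("tcxyz")  # t = time, c = channel/modality


def spatial_axes(layout):
    """Map each spatial axis label to its position in the array layout."""
    if len(set(layout)) != len(layout) or not set(layout) <= KNOWN_AXES:
        raise ValueError(f"unsupported axis layout: {layout!r}")
    return {axis: layout.index(axis) for axis in SPATIAL_AXES if axis in layout}


def in_narrow_scope(layout):
    """Narrow scope as suggested above: drop everything with a t in it."""
    return "t" not in layout


@dataclass(frozen=True)
class StorageConfig:
    """User-configurable layout plus chunk and block shapes."""
    layout: str
    chunks: tuple
    blocks: tuple

    def __post_init__(self):
        spatial_axes(self.layout)  # validates the axis labels
        if not len(self.layout) == len(self.chunks) == len(self.blocks):
            raise ValueError("layout, chunks and blocks must have equal length")
        if any(b > c for b, c in zip(self.blocks, self.chunks)):
            raise ValueError("a block must not exceed its chunk along any axis")


config = StorageConfig(layout="cxyz", chunks=(1, 64, 64, 64), blocks=(1, 16, 16, 16))
print(spatial_axes(config.layout), in_narrow_scope(config.layout))
```

Knowing which array axes are spatial is what lets a geometry be attached to an array that also carries channel or time axes.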

If I were them, I also wouldn't like to have another metalayer under my belt. I think we can go directly with publishing it ourselves. We may consider suggesting something like a registry for third-party metalayers, though. :)

kislinsk added a project: Moved to git.dkfz.de.

This task was closed here on Phabricator since it was migrated to GitLab. Please continue on GitLab.