[Segmentation] Worklist specification, data structure and io functions
Closed, Resolved (Public)

Description

Feature: Ability to specify a list of work items. Each work item specifies (a) an image that should be segmented, (b) a pre-segmentation, if one exists, that should be reviewed, checked, or refined, (c) confirmation criteria, and (d) other relevant configuration settings (e.g. the preset label list or a custom label name list).

Key feature ideas:

  • Format: JSON or XML
  • CSS-like behavior: Properties/settings (such as the preset label list or confirmation criteria) can be specified for the whole list or per item. A definition in an item overrides the global definition.

TODO further specification

IMPORTANT: For the actual specification, please see comments further down: https://phabricator.mitk.org/T29160#239621. The example immediately below named "Proposal" has not been implemented as-is!
Proposal
{
  "Version": 1,
  "UID": "abcde", //unique id for the worklist
  "Name": "Study XY Worklist #1", //Display name / human readable name of work list
  "Segmentation.labelSetPreset": "./labels.preset",
  "Segmentation.customLabelNames": "./optionalLabelNames.json",
  "ConfirmationRules": [
    {
      "RuleID": "NoEmptyLabel",
      "IncludeOptionalLabels": true,
      "LabelScopeProperty": { //rule applies to labels that have properties that fit the following specification
        "Prop1": "Propvalue"
      }
    },
    {
      "RuleID": "NoUnlabeledPixels"
    }
  ],
  "Items": [
    {
      "UID": "item1", //unique id for the worklist item
      "Name": "Optional item name", //Display name / human readable name of work list item. optional
      "InputPath": "./item1/image.nrrd",
      "SegmentationPath": "./item1/preseg.nrrd",
      "SegmentationNodeName": "segmentation", //option to specify the name of the node that will contain the segmentation (either preloaded or generated)
      "ResultPath": "./output/item1/result.nrrd"
    },
    {
      "UID": "item2", //unique id for the worklist item
      "Name": "Optional item name", //Display name / human readable name of work list item. optional
      "InputPath": "./item2/image.nrrd",
      "SegmentationPath": "./item2/preseg.nrrd",
      "Segmentation.labelSetPreset": "./item2/labels.preset"
    },
    {
      "UID": "item3", //unique id for the worklist item
      "Name": "Optional item name", //Display name / human readable name of work list item. optional
      "ScenePath": "./item3/item3.mitkscene",
      "SegmentationNodeName": "segmentation", //option to specify the name of the node that will contain the segmentation (either preloaded or generated) also helps in context of scene loading to know which node should be used for the annotation.
      "ResultPath": "./output/item3/result.nrrd",
      "Segmentation.labelSetPreset": "./item3/labels.preset"
    }

  ]
}

Event Timeline

floca triaged this task as Normal priority. May 3 2022, 6:16 PM
floca created this task.
floca updated the task description.

MITK Segmentation Task Lists

The first version of the file format used in T29159 is as follows:

MITK Segmentation Task List files are JSON files containing a JSON object as root. It must contain the two mandatory properties FileFormat and Version:

{
  "FileFormat": "MITK Segmentation Task List",
  "Version": 1
}

We also recommend specifying an optional Name. If present, the application displays it instead of the plain filename of the JSON file:

{
  "FileFormat": "MITK Segmentation Task List",
  "Version": 1,
  "Name": "My First Task List"
}
Tasks

The root object must also contain a mandatory Tasks array, containing JSON objects that specify the individual tasks of the task list. A minimal task object must contain Image and Result file paths. Image refers to the patient image and Result refers to the path where the resulting segmentation is expected to be stored. Paths can be absolute or relative to the JSON file.

{
  "FileFormat": "MITK Segmentation Task List",
  "Version": 1,
  "Tasks": [
    {
      "Image": "images/Pic3D.nrrd",
      "Result": "results/liver.nrrd"
    }
  ]
}
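
The mandatory parts described so far can be checked with a few lines of code. The following is only a minimal sketch in Python, not the validation MITK actually performs: the function name `validate_task_list` and the wording of the messages are made up for illustration, and it deliberately ignores that Image may also be supplied through the Defaults object described further down.

```python
import json

REQUIRED_FILE_FORMAT = "MITK Segmentation Task List"


def validate_task_list(text):
    """Return a list of problems found in a task list JSON string (empty if none).

    Hypothetical helper for illustration only; it checks just the mandatory
    properties named in the text: FileFormat, Version, Tasks, Image, Result.
    """
    root = json.loads(text)
    if not isinstance(root, dict):
        return ["root must be a JSON object"]

    problems = []
    if root.get("FileFormat") != REQUIRED_FILE_FORMAT:
        problems.append("FileFormat must be 'MITK Segmentation Task List'")
    if root.get("Version") != 1:
        problems.append("Version must be 1")

    tasks = root.get("Tasks")
    if not isinstance(tasks, list):
        problems.append("Tasks must be an array")
        return problems

    for index, task in enumerate(tasks):
        for key in ("Image", "Result"):
            # Assumption: Defaults are not considered in this sketch.
            if not isinstance(task, dict) or key not in task:
                problems.append(f"task {index} is missing {key}")
    return problems
```

Calling `validate_task_list` on the example above should return an empty list, while a task without a Result path is reported by its index.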

In addition, tasks can contain a bunch of optional properties that mainly specify a segmentation a user starts with:

  • Name (string): A name for the task.
  • Description (string): A short description/definition of the task.
  • LabelName (string): The name of the first label in a new segmentation that is created for the task on the fly.
  • LabelNameSuggestions (file path): A Label Suggestions JSON file (example in next comment) specifying names and optional colors, that are suggested to the user for new labels in the segmentation.
  • Preset (file path): A Label Set Preset XML file in MITK's .lsetp file format. The preset is applied to a new segmentation that is created for the task on the fly. We recommend using the Segmentation plugin of the MITK Workbench to create such label set preset files as described in its F1 user guide.
  • Segmentation (file path): A pre-segmentation that a user can start with or has to refine for example.
  • Dynamic (boolean): In case Image refers to a dynamic (3d+t) image, specifies whether the segmentation should be static (false), i.e. equal for all time steps, or dynamic (true), i.e. individual for each time step.
Task defaults / common properties

If a task list contains multiple tasks with common properties, they do not have to be specified for each and every task again and again. Instead, the root object can contain an optional Defaults object that is identical in format to the tasks specified above. As the name indicates, default properties can still be overridden by individual tasks if they are specified explicitly.

There is one exception: A Defaults object must not contain a Result file path, since result files of tasks must be distinct by definition.
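
The override behavior can be pictured as a plain dictionary merge. This is a sketch of the idea in Python under the assumption that the task list has already been parsed into a dict; `effective_tasks` is a hypothetical helper name, not an MITK function.

```python
def effective_tasks(root):
    """Merge the optional Defaults object into every task.

    Properties given explicitly in a task win over the defaults.
    Hypothetical helper for illustration only.
    """
    defaults = root.get("Defaults", {})
    if "Result" in defaults:
        # The one exception named in the text: Result must be given per task.
        raise ValueError("Defaults must not contain a Result file path")
    # Later keys win in a dict merge, so task values override defaults.
    return [{**defaults, **task} for task in root.get("Tasks", [])]
```

For a Defaults object that sets Image, a task without its own Image inherits the default, and a task with its own Image keeps it.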
Example

The following example is a complete showcase of the properties and features listed above. It specifies four tasks. Three of them refer to the same patient image, so it is specified as the default.

Remember that the only task property required to be distinct is Result so you are pretty free in your task design. For simplicity, we chose to define tasks around organs for this example and named the tasks accordingly:

Example.json
{
  "FileFormat": "MITK Segmentation Task List",
  "Version": 1,
  "Name": "Example Segmentation Task List",
  "Defaults": {
    "Image": "images/Pic3D.nrrd"
  },
  "Tasks": [
    {
      "Name": "Liver",
      "LabelName": "Liver",
      "LabelNameSuggestions": "suggestions/label_suggestions.json",
      "Description": "This task provides an image and label name suggestions for new labels. The segmentation will start with an empty label named Liver.",
      "Result": "results/liver.nrrd"
    },
    {
      "Name": "Kidneys",
      "Description": "This task provides an image and a label set preset that is applied to the new segmentation.",
      "Preset": "presets/kidneys.lsetp",
      "Result": "results/kidneys.nrrd"
    },
    {
      "Name": "Spleen",
      "Description": "This task provides an image and an initial (pre-)segmentation.",
      "Segmentation": "segmentations/spleen.nrrd",
      "Result": "results/spleen.nrrd"
    },
    {
      "Name": "Surprise",
      "Description": "And now for something completely different. This task overrides the default Image and starts with an empty static segmentation for a dynamic image.",
      "Image": "images/US4DCyl.nrrd",
      "Result": "results/US4DCyl.nrrd",
      "Dynamic": false
    }
  ]
}
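
Since Result is the only property that has to be distinct, and paths may be relative to the JSON file, a tool reading such a file might resolve the paths first and then look for clashes. The following Python sketch shows one way to do that; `resolve_path` and `duplicate_results` are hypothetical names, and comparing normalized paths is an assumption about what "distinct" means here.

```python
import os.path


def resolve_path(path, task_list_file):
    """Resolve a task path relative to the directory of the task list JSON file.

    Absolute paths are returned unchanged apart from normalization.
    """
    base_dir = os.path.dirname(os.path.abspath(task_list_file))
    return os.path.normpath(os.path.join(base_dir, path))


def duplicate_results(tasks, task_list_file):
    """Return indices of tasks whose resolved Result path was used by an earlier task."""
    seen = set()
    duplicates = []
    for index, task in enumerate(tasks):
        resolved = resolve_path(task["Result"], task_list_file)
        if resolved in seen:
            duplicates.append(index)
        seen.add(resolved)
    return duplicates
```

For the four tasks of Example.json this should find no duplicates, because every task writes to its own result file.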
What's next?

One of the next iterations of the file format will allow specifying pre-defined rules that are automatically checked by the application to decide whether a segmentation is considered valid/complete or not. For example, such rules could define that all pixels of an image must be labelled or that a segmentation must contain certain labels.
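
To make the two example rules more concrete, here is a rough Python sketch of what such checks could compute. Everything in it is an assumption made for illustration: the rule feature is not specified yet, the segmentation is modeled as a nested list of integer label values, and the value 0 is taken to mean "unlabelled".

```python
def unlabeled_pixel_count(label_image, background=0):
    """Count pixels that still carry the (assumed) background value."""
    return sum(1 for row in label_image for value in row if value == background)


def missing_labels(label_image, required):
    """Return the required label values that do not occur in the image, sorted."""
    present = {value for row in label_image for value in row}
    return sorted(set(required) - present)


# A tiny 2x3 "segmentation" with labels 1 and 2 and one unlabelled pixel.
image = [
    [1, 1, 0],
    [2, 2, 1],
]
all_pixels_labelled = unlabeled_pixel_count(image) == 0
has_required_labels = not missing_labels(image, [1, 2, 3])
```

In this toy case both checks fail: one pixel is unlabelled and label 3 is missing.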

The Label Suggestions JSON file format mentioned above to specify a list of suggested names and optional colors for new labels is as follows:

Label Suggestions Example.json
[
  {
    "name": "Abdomen",
    "color": "red"
  },
  {
    "name": "Lung",
    "color": "#00ff00"
  },
  {
    "name": "Heart"
  },
  {
    "name": "Aortic Valve",
    "color": "CornflowerBlue"
  }
]
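
Reading such a file is straightforward. The sketch below, in Python, assumes only what the example shows: a JSON array of objects with a name and an optional color. The function name `load_label_suggestions` is hypothetical, and the color strings are passed through as-is rather than interpreted.

```python
import json


def load_label_suggestions(text):
    """Parse a Label Suggestions JSON string into (name, color or None) pairs."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("root must be a JSON array")

    suggestions = []
    for entry in entries:
        if not isinstance(entry, dict) or not isinstance(entry.get("name"), str):
            raise ValueError("every suggestion needs a string 'name'")
        # The color is optional, so a missing color becomes None.
        suggestions.append((entry["name"], entry.get("color")))
    return suggestions
```

For the example above, "Heart" would come back with a color of `None`, leaving the choice of color to the application.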
This comment was removed by kislinsk.
kislinsk claimed this task.

Hi spectators! 👋

I heard that the MITK Segmentation Task List file format received some attention at the NAMIC project week.

We are happy for any feedback and ideas.

We consider the file format to be still in an early, kind of experimental yet internally successful stage, as it served us pretty well already. We plan to make the GUI on top of it available in the MITK Workbench starting with our upcoming MITK release v2023.04. For now, you find it only in the Mitk Flowbench application (used by Kaapana for example). Please note that it is a versioned file format and it has a high potential to be improved and extended in particular regarding various DICOM-related topics.

Did you consider using JSON Schema to formalize the definition? It is a bit hard to grasp it right now, but admittedly I probably didn't spend enough time looking over the various resources provided.

> Did you consider using JSON Schema to formalize the definition? It is a bit hard to grasp it right now, but admittedly I probably didn't spend enough time looking over the various resources provided.

Writing a user guide and a file format specification in the documentation is up next indeed. I may consider writing a schema for that but I have to get familiar with it first. Until then, please ignore everything except this comment which completely describes the current format by example: https://phabricator.mitk.org/T29160#239621

Thanks. Here are some more comments to start the discussion:

  • I think it will be important to make it possible to work with the inputs defined by DICOM UIDs and available through DICOM endpoints, instead of relying purely on the file system. Would this be acceptable for your use cases?
  • In addition to segmentations, it would be helpful to include an option to perform the review of existing segmentations (e.g., those generated by AI that need to be QA'd by an expert, and where some kind of segmentation may or may not be generated as part of the QA process).
  • I think there are potentially common needs between this task and the "MetaDashboard" initiative in Kaapana (see https://projectweek.na-mic.org/PW38_2023_GranCanaria/Projects/MetaDashboard/) - should these two efforts be coordinated? In that regard, the definition of the worklist does not need to be limited to supporting the segmentation task (e.g., there are other tasks already considered in the context of the MetaDashboard, such as series type classification or presence of artifacts).

> • I think it will be important to make it possible to work with the inputs defined by DICOM UIDs and available through DICOM endpoints, instead of relying purely on the file system. Would this be acceptable for your use cases?

We focused on local file paths so far indeed, driven by our requirements and Kaapana’s way of providing general resource access through local files. URIs are not off the table, though, as long as the complexity stays within the intentional KISS spirit of the file format.

> • In addition to segmentations, it would be helpful to include an option to perform the review of existing segmentations (e.g., those generated by AI that need to be QA'd by an expert, and where some kind of segmentation may or may not be generated as part of the QA process)

That's already a crucial feature of the existing file format and it is used like this by us. See the "Spleen" task in the example above.

> • I think there are potentially common needs between this task and the "MetaDashboard" initiative in Kaapana (see https://projectweek.na-mic.org/PW38_2023_GranCanaria/Projects/MetaDashboard/) - should these two efforts be coordinated? In that regard, the definition of the worklist does not need to be limited to supporting the segmentation task (e.g., there are other tasks already considered in the context of the MetaDashboard, such as series type classification or presence of artifacts).

I'll ask @gaoh about MetaDashboard on Wednesday. It wasn't mentioned so far. Regarding the generalization of task types we agree but decided to stick with segmentations for now as driven by internal demands. For example there's potential for registration tasks on the horizon but we haven't decided how we would implement this as the simple format already sparks quite some complexity.

Here's a short list of the next major features driven by internal demands:

  • Rules: Tasks can have predefined rules like "This label must not be empty" or "All pixels of the image must be segmented" and these rules will be checked when confirming a task and warnings/errors will be spit out accordingly. The feature is scheduled for the upcoming release at the end of April/beginning of May. Details: T29225: Rule Checker for FlowBench
  • Detection: Tasks can have a set of bounding boxes to specify foci for segmentation. This is driven by multiple internal requests, one of them being related to nnDetection (GitHub). It is scheduled as the first task list feature to be implemented after the upcoming release.

> • I think it will be important to make it possible to work with the inputs defined by DICOM UIDs and available through DICOM endpoints, instead of relying purely on the file system. Would this be acceptable for your use cases?
>
> We focused on local file paths so far indeed, driven by our requirements and Kaapana’s way of providing general resource access through local files. URIs are not off the table, though, as long as the complexity stays within the intentional KISS spirit of the file format.

I agree (with both ;). If we extend it (which also makes sense for Kaapana in the mid term), I would go for URIs and avoid any custom specification for different data sources and protocols. I think that is the minimally invasive and well-standardized option. DICOM could then be covered via DICOMweb requests.
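
To illustrate why URIs would be a small change, here is a Python sketch of how a reader of the format could tell a URI from a local file path. This is purely an assumption about a possible future extension: the current format only supports file paths, the function name `classify_location` is made up, and restricting URIs to http/https is a choice made for this sketch so that Windows drive letters such as `C:` are not mistaken for a scheme.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only these schemes count as remote resources.
REMOTE_SCHEMES = {"http", "https"}


def classify_location(value):
    """Return 'uri' for http(s) URIs and 'file' for everything else."""
    scheme = urlsplit(value).scheme.lower()
    return "uri" if scheme in REMOTE_SCHEMES else "file"
```

Existing task lists would keep working unchanged under this scheme, since every value they contain is classified as a file path.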

> • I think there are potentially common needs between this task and the "MetaDashboard" initiative in Kaapana (see https://projectweek.na-mic.org/PW38_2023_GranCanaria/Projects/MetaDashboard/) - should these two efforts be coordinated? In that regard, the definition of the worklist does not need to be limited to supporting the segmentation task (e.g., there are other tasks already considered in the context of the MetaDashboard, such as series type classification or presence of artifacts).
>
> I'll ask @gaoh about MetaDashboard on Wednesday. It wasn't mentioned so far. Regarding the generalization of task types we agree but decided to stick with segmentations for now as driven by internal demands. For example there's potential for registration tasks on the horizon but we haven't decided how we would implement this as the simple format already sparks quite some complexity.

Yes, I think finding a Pareto optimum between "easy to understand and targeted" and "general applicability/reusability" is tricky, as much as I love deep dives into abstract and mighty designs. Currently we try to be as pragmatic as possible: first learn what works and what does not and, when in doubt, break things and introduce a new version.
There surely are multiple other task types that could exploit similarities. But right now I am not sure whether we should really try to cover them in one file format or may end up with a "family" of formats that share common principles (in order to reuse the code base where appropriate) but have "rights of their own".
E.g., sure, one could just introduce a property per task element indicating its type. But that would allow heterogeneous task lists. On the one hand, that is a nice feature (but maybe quite academic); on the other hand, the downsides would be (at least): (a) the business logic that supports it becomes far more complex compared to supporting a single task type; (b) the CSS-like "default" feature would have problems or become way more complex. And as we want to keep the format slim and, for now, at a level where it can also be written "by hand" by a human, I would not like to drop (b).

@kislinsk @floca thank you for the feedback!

The problem with KISS spirit is that what is KISS for one may not be considered such by someone else.

I understand that for you non-DICOM local files might be a generally applicable/reusable and pragmatic choice. For me, managing transformations and linkages between those orphan files and DICOM images is too much busy work, and I would like to explore the ability to interact directly with the DICOM resources via the standard DICOMweb interface for managing images, segmentations and other things in one place.

I think at this point it probably makes sense for us to explore some dynamic JSON representation for defining the task that is geared towards using DICOM, and if/when it makes sense, we can sync up to exchange thoughts and consider harmonization. The statement of the problem is very dynamic, and it is probably too early to do such coordination now.

> I think at this point it probably makes sense for us to explore some dynamic JSON representation for defining the task that is geared towards using DICOM, and if/when it makes sense, we can sync up to exchange thoughts and consider harmonization. The statement of the problem is very dynamic, and it is probably too early to do such coordination now.

Sure. Further discussion and thoughts on consolidation are welcome (and might be easier as soon as everybody, us included, knows their requirements and scope well enough).

> I understand that for you non-DICOM local files might be a generally applicable/reusable and pragmatic choice. For me, managing transformations and linkages between those orphan files and DICOM images is too much busy work, and I would like to explore the ability to interact directly with the DICOM resources via the standard DICOMweb interface for managing images, segmentations and other things in one place.

@fedorov I hope you can help me out here. Why is the proposal to use URIs (where we currently use file paths directly) not enough in your case? In my understanding, supporting URIs should be enough to incorporate support for DICOMweb. But maybe I am missing something. If so, it would be great to know what I am missing. Thanks.

@floca I think the issue is that it will probably be better to communicate the server endpoint separately and refer to the Study/Series items in the DICOM model hierarchy via UIDs. I think adding support for DICOM should be done as part of the initial design, rather than as an afterthought by replacing file paths with URIs. But that only makes sense if DICOM is important and primary for the use case driving the development. That is why I suggest you guys proceed with your original plan, and we will try to experiment with the DICOM-centric design. But of course I may be wrong. I just think it is more expedient to experiment rather than coordinate the design while the use cases and requirements are still evolving.
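
For readers comparing the two positions, the following Python sketch shows how a separately communicated endpoint plus Study/Series UIDs relates to a single URI. It is an illustration only, not a design proposed by anyone in this thread: the function name, the example endpoint, and the UIDs are made up, and the path layout follows the DICOMweb WADO-RS convention for retrieving a series.

```python
def wado_rs_series_url(endpoint, study_uid, series_uid):
    """Build a DICOMweb (WADO-RS) series URL from an endpoint and two UIDs.

    Hypothetical helper: the endpoint is given once, the UIDs per item.
    """
    return f"{endpoint.rstrip('/')}/studies/{study_uid}/series/{series_uid}"


# Made-up endpoint and UIDs, for illustration only.
url = wado_rs_series_url("https://example.org/dicomweb/", "1.2.3", "1.2.3.4")
```

Keeping the endpoint separate means a worklist can be pointed at a different server without touching the per-item UIDs, whereas a full URI per item bakes the server into every entry.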