
[Statistics] Median is approximated and completely off if sample size is small
Open, High, Public

Description

A client noticed that for an image with a tiny mask of only two pixels, the median does not equal the mean, as one would expect for two samples. Instead, it is completely off. For example, for the two pixel values 152 and 78, the mean is 115, but the median approximation is 78.37 instead of a value close to 115.

The calculation can be found in mitkHistogramStatisticsCalculator.cpp.

Switching to an exact calculation in all cases could be rather demanding, as sorting all pixels would require a lot of time and memory. However, we should probably introduce a threshold: below it, the calculation is exact, and we only switch to the approximation if there are enough samples. We have to figure out what a good threshold would be, and we should also indicate with a flag and in the GUI whether the median is exact or an approximation.
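The threshold idea could look roughly like the following sketch. All names here (`MedianResult`, `ComputeMedian`, `exactThreshold`, `approximate`) are hypothetical and not part of MITK; `approximate` merely stands in for the existing histogram-based estimate, and a good threshold value is still an open question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result type: the value plus a flag that a GUI could display.
struct MedianResult
{
  double value;
  bool isExact;
};

// Hypothetical helper, not MITK API. Assumes a non-empty sample.
// `approximate` stands in for the existing histogram-based estimate.
MedianResult ComputeMedian(std::vector<double> samples,
                           std::size_t exactThreshold,
                           const std::function<double(const std::vector<double>&)>& approximate)
{
  assert(!samples.empty());

  // Large samples: keep the approximation and mark the result accordingly.
  if (samples.size() > exactThreshold)
    return {approximate(samples), false};

  // Small samples: sorting is cheap, so compute the exact median.
  std::sort(samples.begin(), samples.end());
  const std::size_t mid = samples.size() / 2;
  const double value = (samples.size() % 2 == 1)
    ? samples[mid]
    : 0.5 * (samples[mid - 1] + samples[mid]);
  return {value, true};
}
```

For the two pixel values from the report, the exact branch returns 115 with `isExact` set, so the GUI could label the value accordingly.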

Event Timeline

kislinsk triaged this task as Unbreak Now! priority. Aug 2 2023, 4:33 PM
kislinsk created this task.
kislinsk moved this task from Backlog to MITK Meeting on the Request for Discussion board.

I had a look as well: two voxels is really an edge case. Given the current method, it would only give a correct result for number of bins == 1. But limiting the number of bins to at most (number of voxels - 1) doesn't really help; it fails, e.g., for three voxels. In general, after thinking about it a bit, I came to the conclusion that limiting the number of bins (currently it's _minimum_ 10) would still not yield a correct median.
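To illustrate why the bin count matters, here is a minimal sketch of one possible histogram-based median. It assumes equal-width bins between the minimum and maximum and reports the centre of the first bin whose cumulative count reaches half the sample count. This is an assumed scheme for illustration, not the code in mitkHistogramStatisticsCalculator.cpp, and `HistogramMedian` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: median estimated as the centre of the first bin whose
// cumulative count reaches half the sample count. Assumed scheme, not MITK code.
double HistogramMedian(const std::vector<double>& samples, std::size_t numberOfBins)
{
  assert(!samples.empty() && numberOfBins > 0);
  const auto [minIt, maxIt] = std::minmax_element(samples.begin(), samples.end());
  const double minValue = *minIt;
  const double binWidth = (*maxIt - minValue) / static_cast<double>(numberOfBins);
  if (binWidth == 0.0)
    return minValue; // all samples identical

  std::vector<std::size_t> counts(numberOfBins, 0);
  for (const double v : samples)
  {
    // The maximum value is clamped into the last bin.
    const auto bin = std::min(numberOfBins - 1,
                              static_cast<std::size_t>((v - minValue) / binWidth));
    ++counts[bin];
  }

  const double half = 0.5 * static_cast<double>(samples.size());
  std::size_t cumulative = 0;
  for (std::size_t bin = 0; bin < numberOfBins; ++bin)
  {
    cumulative += counts[bin];
    if (static_cast<double>(cumulative) >= half)
      return minValue + (static_cast<double>(bin) + 0.5) * binWidth;
  }
  return *maxIt; // not reached for non-empty input
}
```

Under these assumptions, the values 152 and 78 give 115 with a single bin, but with 100 bins the estimate lands at the centre of the lowest bin, 78.37, which coincides with the reported value (without confirming that the actual implementation works this way). For three voxels such as 0, 1 and 10 with two bins, the sketch returns 2.5 although the exact median is 1.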

However, regarding the client, I would argue that the _exact_ median in these cases is of no value. So you could clarify whether the client just reported a finding that was surprising (but it's not, if you explain that these are statistical approximations) or whether there is a real use case. In the former case, maybe a workbench warning explaining the statistical approach would be sufficient if the total number of voxels is very low.

kislinsk lowered the priority of this task from Unbreak Now! to High. Oct 11 2023, 11:20 AM

Discussion result: Should be fixed with an exact calculation instead of a histogram-based approximation.
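One way an exact calculation could avoid a full sort is a selection algorithm. The sketch below uses `std::nth_element`, which runs in linear time on average; it still needs a copy of the masked pixel values, so the memory concern from the description remains. `ExactMedian` is a hypothetical helper, not an existing MITK function, and it assumes a non-empty sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not MITK API. Takes the samples by value because
// std::nth_element reorders its input.
double ExactMedian(std::vector<double> samples)
{
  assert(!samples.empty());
  const auto midIt = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);

  // Places the element that would be at midIt in sorted order at midIt.
  std::nth_element(samples.begin(), midIt, samples.end());
  const double upper = *midIt;
  if (samples.size() % 2 == 1)
    return upper;

  // Even count: everything before midIt is <= *midIt, so the lower middle
  // value is the maximum of that left part.
  const double lower = *std::max_element(samples.begin(), midIt);
  return 0.5 * (lower + upper);
}
```

For an even number of samples, this averages the two middle values, so the two-pixel case from the report yields 115.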