Status: Currently we use the old multithreaded filter infrastructure of ITK 4.x. This is inefficient when we fit on an image using a mask, because most threads will sooner or later sit idle: their tiles contain no masked voxels, so they have nothing to do. The cause is that the whole image is tiled and distributed across the threads. When a mask is given, it makes more sense to tile only the region that is actually covered by the mask. The rest of the image can simply be filled with a default N/A value.
This should be tackled as soon as we have migrated to ITK 5 (T27437). Then we should also check how we could use the new threading functionality, because it is not necessarily bound to the old region tiling scheme anymore.
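
To make the idea of tiling only the masked region concrete, here is a minimal sketch in plain C++ without any ITK or MITK dependency. It assumes a 2D row-major mask, takes the axis-aligned bounding box of all non-zero mask pixels as "the region covered by the mask", and splits that box into slabs along y. The names `Region2D`, `MaskBoundingBox` and `SplitAlongY` are hypothetical and exist only for this illustration; they are not part of ITK or of our code base.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for this sketch; not an ITK or MITK class.
struct Region2D
{
  std::size_t x0, y0; // first index (inclusive)
  std::size_t nx, ny; // size in pixels
};

// Computes the smallest region that contains every non-zero mask pixel.
// The mask is stored row-major. Returns false if the mask is empty.
bool MaskBoundingBox(const std::vector<std::uint8_t>& mask,
                     std::size_t width, std::size_t height, Region2D& box)
{
  assert(mask.size() == width * height);
  std::size_t minX = width, minY = height, maxX = 0, maxY = 0;
  bool found = false;
  for (std::size_t y = 0; y < height; ++y)
  {
    for (std::size_t x = 0; x < width; ++x)
    {
      if (mask[y * width + x] != 0)
      {
        minX = std::min(minX, x);
        maxX = std::max(maxX, x);
        minY = std::min(minY, y);
        maxY = std::max(maxY, y);
        found = true;
      }
    }
  }
  if (!found)
  {
    return false;
  }
  box = Region2D{minX, minY, maxX - minX + 1, maxY - minY + 1};
  return true;
}

// Splits a region into at most `count` slabs along y. Slab heights differ
// by at most one row, so every work unit gets a similar share of the box.
std::vector<Region2D> SplitAlongY(const Region2D& region, std::size_t count)
{
  std::vector<Region2D> tiles;
  const std::size_t n = std::min(count, region.ny);
  if (n == 0)
  {
    return tiles;
  }
  const std::size_t base = region.ny / n;
  const std::size_t extra = region.ny % n;
  std::size_t y = region.y0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const std::size_t rows = base + (i < extra ? 1 : 0);
    tiles.push_back(Region2D{region.x0, y, region.nx, rows});
    y += rows;
  }
  return tiles;
}
```

Each resulting tile would be handed to one worker, and every pixel outside the bounding box would be set to the default N/A value without being fitted. A bounding box is only one possible choice: for sparse or strongly non-convex masks it can still contain many unmasked pixels, so balancing by the number of masked pixels per tile may be worth checking once the ITK 5 threading options are evaluated.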