
Machine-learning tractography fails on large datasets
Closed, Wontfix, Public

Description

As reported by Marc-Andre, the MitkDFTraining MiniApp cannot handle the 200k tractogram with the ISMRM Challenge Dataset. So far, the following observations have been made:

  1. The ML training is very memory-consuming, and there is no information about the (at least estimated) memory requirements of a given problem instance. Only on some systems is the user at least informed that the application was killed because the system ran out of memory.
  2. Even on big (cluster) systems with almost unlimited resources, the ML training stops processing without any meaningful message. The reported crash happened 10 minutes after the training was started:

    MitkDFTraining[8954]: segfault at 2b8c5a25d010 ip 00000000004f9179 sp 00002b985eecd9d0 error 4 in MitkDFTraining[400000+188000]

    According to the addr2line tool, this seems to point to a function in the Vigra library (the mangled symbol is an instantiation of vigra::detail::contains_nan for a two-dimensional MultiArrayView of double):

    _ZN5vigra6detail12contains_nanILj2EdNS_15StridedArrayTagEEEbRKNS_14MultiArrayViewIXT_ET0_T1_EE
    /home/coteharn/scratch/MITK-superbuild/ep/include/vigra/random_forest/rf_preprocessing.hxx:130
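For readers who want to re-check the log line, the following is a minimal sketch of how the kernel segfault message above can be decoded. It assumes the machine is x86-64 (the addresses suggest this, but the report does not say so), in which case the error code is the page-fault error code: bit 0 set means the page was present, bit 1 a write access, bit 2 a user-mode access. The function name `decode_segfault` and the returned dictionary keys are made up for illustration.

```python
import re

# Kernel segfault line copied verbatim from the report above.
LOG = ("MitkDFTraining[8954]: segfault at 2b8c5a25d010 ip 00000000004f9179 "
       "sp 00002b985eecd9d0 error 4 in MitkDFTraining[400000+188000]")

# All numeric fields in this message are printed in hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<image>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a kernel segfault line into its fields (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    return {
        "image": m["image"],
        "fault_address": int(m["addr"], 16),
        "ip": ip,
        # True if the faulting instruction lies inside the named mapping.
        "ip_in_image": base <= ip < base + size,
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        # x86 page-fault error code bits (assumption: x86-64 host).
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    for key, value in decode_segfault(LOG).items():
        print(key, hex(value) if isinstance(value, int) and not isinstance(value, bool) else value)
```

Under that assumption, error 4 describes a user-mode read from an unmapped page, which fits a function that only reads the array it is given. Since the mapping starts at 400000, the executable is presumably not position-independent, so the absolute instruction pointer can be handed to the tool directly, for example `addr2line -f -e MitkDFTraining 0x4f9179`.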

Event Timeline

Regarding 2: There is no obvious source of memory access errors; the filter runs successfully with a lower number of tracts (120k). I am trying to narrow down possible causes with a Valgrind memcheck run on a smaller test problem.
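Regarding 1, a rough lower bound on the memory demand can be sketched from the size of the feature matrix alone. The symbol in the backtrace indicates a two-dimensional array of double, so each entry takes 8 bytes. Everything else below is an assumption for illustration: the number of training samples drawn per tract and the number of features per sample are placeholders, not values taken from MitkDFTraining, and the random forest itself needs additional memory on top of this.

```python
BYTES_PER_DOUBLE = 8  # the array in the backtrace is instantiated for double


def feature_matrix_bytes(n_samples, n_features, bytes_per_value=BYTES_PER_DOUBLE):
    """Size in bytes of a dense n_samples x n_features matrix."""
    return n_samples * n_features * bytes_per_value


def estimate_gib(n_tracts, samples_per_tract, n_features):
    """Feature matrix size in GiB for a tractogram (hypothetical helper)."""
    n_samples = n_tracts * samples_per_tract
    return feature_matrix_bytes(n_samples, n_features) / 2**30


if __name__ == "__main__":
    # Placeholder values, NOT the real MitkDFTraining parameters.
    SAMPLES_PER_TRACT = 100
    N_FEATURES = 100
    for n_tracts in (120_000, 200_000):
        gib = estimate_gib(n_tracts, SAMPLES_PER_TRACT, N_FEATURES)
        print(f"{n_tracts} tracts: about {gib:.1f} GiB for the feature matrix alone")
```

Printing such an estimate before training starts would at least tell the user whether a given instance has a chance of fitting into memory.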

Deleted branch T19924-MLTraining-CrashOnLargeDatasets.

Deleted branch T19924-MLTraining-CrashOnLargeDatasets-debugging.