Maniphest T19924

Machine-learning tractography fails on large datasets
Closed, Wontfix (Public)
Assigned To

Authored By

	• hering
	Sep 2 2016, 3:12 PM

Description

As reported by Marc-Andre, the MitkDFTraining MiniApp cannot handle the 200k tractogram with the ISMRM Challenge Dataset. So far, the following observations have been made:

1. The ML training is very memory-intensive, and there is no information about the (at least estimated) memory requirements of a given problem instance. On some systems, the user is at least informed that the application was killed because the system ran out of memory.

2. Even on big (cluster) systems with almost unlimited resources, the ML training stops processing without any meaningful message. The reported crash happened 10 minutes after the training was started.

MitkDFTraining[8954]: segfault at 2b8c5a25d010 ip 00000000004f9179 sp 00002b985eecd9d0 error 4 in MitkDFTraining[400000+188000]

which, after applying the addr2line tool, seems to point to a function in the Vigra library:

_ZN5vigra6detail12contains_nanILj2EdNS_15StridedArrayTagEEEbRKNS_14MultiArrayViewIXT_ET0_T1_EE 
/home/coteharn/scratch/MITK-superbuild/ep/include/vigra/random_forest/rf_preprocessing.hxx:130
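
For anyone re-reading the crash line above, here is a minimal sketch of how its fields can be interpreted. It assumes an x86-64 Linux system (where the low three bits of the "error N" value follow the x86 page-fault error code layout) and a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`. The helper names `decode_fault`, `ip_in_mapping`, `offset_in_mapping` and `demangle` are made up for this illustration and are not part of MITK or Vigra.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Low bits of an x86 page-fault error code, as printed by the Linux kernel
// in "segfault at ... error N" messages (assumption: x86/x86-64 system).
struct FaultInfo {
    bool protection_violation;  // bit 0: 1 = protection fault, 0 = page not present
    bool write_access;          // bit 1: 1 = write access, 0 = read access
    bool user_mode;             // bit 2: 1 = fault happened in user mode
};

FaultInfo decode_fault(unsigned error_code) {
    return FaultInfo{(error_code & 1u) != 0,
                     (error_code & 2u) != 0,
                     (error_code & 4u) != 0};
}

// True if ip lies inside the mapping printed as "binary[base+size]".
bool ip_in_mapping(std::uint64_t ip, std::uint64_t base, std::uint64_t size) {
    return ip >= base && ip - base < size;
}

// Offset of ip from the start of the mapping; ip must lie inside it.
std::uint64_t offset_in_mapping(std::uint64_t ip, std::uint64_t base,
                                std::uint64_t size) {
    assert(ip_in_mapping(ip, base, size));
    return ip - base;
}

// Demangle an Itanium C++ ABI symbol name; returns the input unchanged
// if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);  // buffer was allocated with malloc by __cxa_demangle
    return result;
}
```

With the values from the report, `decode_fault(4)` describes a user-mode read of a page that is not mapped, and `ip_in_mapping(0x4f9179, 0x400000, 0x188000)` confirms that the instruction pointer lies inside the MitkDFTraining mapping at offset 0xf9179. Passing the mangled name above to `demangle` yields an instantiation of `vigra::detail::contains_nan` for a 2-dimensional `double` `MultiArrayView` with `StridedArrayTag`. This only restates what the log already says; it does not explain why the read went out of bounds.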

Related Objects

Mentioned In: T27033: Clean up stale remote branches
rMITK0d59ab77ce0c: Merge branch 'T19924-MLTraining-CrashOnLargeDatasets' into T19924-MLTraining…

Event Timeline

• hering created this task. Sep 2 2016, 3:12 PM

• hering updated the task description. Sep 2 2016, 3:20 PM

Regarding the second observation: there is no obvious source of possible memory-access errors; the filter runs successfully with a lower number of tracts (120k). Trying to figure out possible causes with a valgrind memcheck run (on a smaller test problem).

Pushed new branch T19924-MLTraining-CrashOnLargeDatasets.

Pushed new branch T19924-MLTraining-CrashOnLargeDatasets-debugging.

• hering mentioned this in rMITK0d59ab77ce0c: Merge branch 'T19924-MLTraining-CrashOnLargeDatasets' into T19924-MLTraining…. Sep 4 2016, 5:49 PM

kislinsk mentioned this in T27033: Clean up stale remote branches. Jan 25 2020, 9:01 PM

Deleted branch T19924-MLTraining-CrashOnLargeDatasets.

Deleted branch T19924-MLTraining-CrashOnLargeDatasets-debugging.

kislinsk closed this task as Wontfix. Sep 10 2020, 10:32 AM
