diff --git a/README.md b/README.md
index b4e35a0..b14f33e 100644
--- a/README.md
+++ b/README.md
@@ -1,127 +1,128 @@
[](https://join.slack.com/t/mdtoolkit/shared_invite/enQtNTQ3MjY2MzE0MDg2LWNjY2I2Njc5MTY0NmM0ZWIxNmQwZDRhYzk2MDdhM2QxYjliYTcwYzhkNTAxYmRkMDA0MjcyNDMyYjllNTZhY2M)
Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Computing (MIC). Please make sure that your usage of this code is in compliance with the code license.
## Overview
This is a comprehensive framework for object detection featuring:
- 2D + 3D implementations of prevalent object detectors: e.g. Mask R-CNN [1], Retina Net [2], Retina U-Net [3].
- Modular and light-weight structure ensuring sharing of all processing steps (incl. backbone architecture) for comparability of models.
- training with bounding box and/or pixel-wise annotations.
- dynamic patching and tiling of 2D + 3D images (for training and inference).
- weighted consolidation of box predictions across patch-overlaps, ensembles, and dimensions [3].
- monitoring + evaluation simultaneously on object and patient level.
- 2D + 3D output visualizations.
- integration of COCO mean average precision metric [5].
- integration of MIC-DKFZ batch generators for extensive data augmentation [6].
- easy modification to evaluation of instance segmentation and/or semantic segmentation.
[1] He, Kaiming, et al. "Mask R-CNN" ICCV, 2017
[2] Lin, Tsung-Yi, et al. "Focal Loss for Dense Object Detection" TPAMI, 2018.
[3] Jaeger, Paul et al. "Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection", 2018.
[5] https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py
[6] https://github.com/MIC-DKFZ/batchgenerators
## How to cite this code
Please cite the original publication [3].
## Installation
Set up the package in a virtual environment:
```
git clone https://github.com/MIC-DKFZ/medicaldetectiontoolkit.git
cd medicaldetectiontoolkit
virtualenv -p python3.7 mdt
source mdt/bin/activate
python setup.py install
```
This framework uses two custom mixed C++/CUDA extensions: Non-maximum suppression (NMS) and RoIAlign. Both are adapted from the original pytorch extensions (under torchvision.ops.boxes and ops.roialign).
-The extensions are automatically compiled from the provided source files under RegRCNN/custom_extensions with above setup.py.
+The extensions are automatically compiled from the provided source files under RegRCNN/custom_extensions with above setup.py.
+Your system is required to have a compatible CUDA compiler (nvcc).
Note: If you'd like to import the raw extensions (not the wrapper modules), be sure to import torch first.
Please note: if you install the framework via pip, you need to:
1. execute `pip install .` instead of the above `python setup.py install`,
2. manually install the custom extensions. This can be done from source by:
```
pip install ./custom_extensions/nms
pip install ./custom_extensions/roi_align/2D
pip install ./custom_extensions/roi_align/3D
```
## Prepare the Data
This framework is designed to let you train models on your own data sets.
Two example data loaders are provided in medicaldetectiontoolkit/experiments, including thorough documentation to ensure a quick start for your own project. The intended workflow is a preprocessing script that is run once and saves the data, whatever its original format, as numpy arrays; during training/testing, the data loader then loads these numpy arrays dynamically. (Please note that the data input side is meant to be customized to your own needs, and the provided data loaders are merely examples: LIDC has a powerful data loader that handles 2D/3D inputs and is optimized for patch-based training and inference; the toy experiments have a lightweight data loader that only handles 2D without patching. The latter makes sense if you want to get familiar with the framework.)
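The one-time preprocessing step described above can be sketched as follows. All file names, the dataframe layout, and the `preprocess` helper below are illustrative assumptions, not the framework's actual preprocessing script:

```python
# Hypothetical preprocessing sketch: convert each raw volume to numpy arrays
# and collect per-patient meta-info in a pandas dataframe (run once, before
# training). The data loader later mmaps the saved .npy files dynamically.
import os
import tempfile
import numpy as np
import pandas as pd

def preprocess(patients, out_dir):
    """Save one {pid}_img.npy / {pid}_rois.npy pair per patient plus an info df."""
    records = []
    for pid, (img, rois, targets) in patients.items():
        np.save(os.path.join(out_dir, "{}_img.npy".format(pid)), img)
        np.save(os.path.join(out_dir, "{}_rois.npy".format(pid)), rois)
        records.append({"pid": pid,
                        "class_target": targets,
                        # slice indices (last axis) that contain foreground
                        "fg_slices": np.unique(np.argwhere(rois > 0)[:, -1]).tolist()})
    df = pd.DataFrame(records)
    df.to_pickle(os.path.join(out_dir, "info_df.pickle"))
    return df

out_dir = tempfile.mkdtemp()
img = np.zeros((64, 64, 8), dtype=np.float32)   # (y, x, z) dummy volume
rois = np.zeros((64, 64, 8), dtype=np.uint8)
rois[10:14, 10:14, 3] = 1                       # one lesion on slice 3
info = preprocess({"pat_0": (img, rois, [1])}, out_dir)
```

The LIDC example loader expects exactly this kind of layout: per-patient `.npy` arrays next to a pickled dataframe with `pid`, `class_target`, and `fg_slices` columns.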
## Execute
1. Set I/O paths, model and training specifics in the configs file: medicaldetectiontoolkit/experiments/your_experiment/configs.py
2. Train the model:
```
python exec.py --mode train --exp_source experiments/my_experiment --exp_dir path/to/experiment/directory
```
This copies snapshots of the configs and model to the specified exp_dir, where all outputs will be saved. By default, the data is split into 60% training, 20% validation, and 20% testing data to perform a 5-fold cross validation (this can be changed to a hold-out test set in the configs), and all folds are trained iteratively. To train a single fold, specify it via the folds argument:
```
python exec.py --folds 0 1 2 .... # specify any combination of folds [0-4]
```
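The 60/20/20 split behind the 5-fold cross validation can be illustrated with a minimal sketch. The framework's actual fold generator lives in utils/dataloader_utils.py; the rolling-split logic below is a simplified assumption:

```python
# Simplified sketch of a 5-fold split: each fold reserves one 20% chunk for
# testing and the next chunk for validation, training on the remaining 60%.
import numpy as np

def make_folds(n_patients, n_splits=5, seed=0):
    rng = np.random.RandomState(seed)
    ixs = rng.permutation(n_patients)
    chunks = np.array_split(ixs, n_splits)
    folds = []
    for f in range(n_splits):
        test = chunks[f]
        val = chunks[(f + 1) % n_splits]          # roll the val chunk per fold
        train = np.concatenate([chunks[i] for i in range(n_splits)
                                if i not in (f, (f + 1) % n_splits)])
        folds.append((train, val, test))
    return folds

folds = make_folds(100)                            # 5 x (60 train, 20 val, 20 test)
```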
3. Run inference:
```
python exec.py --mode test --exp_dir path/to/experiment/directory
```
This runs the prediction pipeline and saves all results to exp_dir.
## Models
This framework features all models explored in [3] (implemented in 2D + 3D): the proposed Retina U-Net, a simple but effective architecture fusing state-of-the-art semantic segmentation with object detection,
as well as implementations of prevalent object detectors, such as Mask R-CNN, Faster R-CNN+ (Faster R-CNN w/ RoIAlign), Retina Net, U-Faster R-CNN+ (the two-stage counterpart of Retina U-Net: Faster R-CNN with auxiliary semantic segmentation), and DetU-Net (a U-Net-like segmentation architecture with heuristics for object detection).
## Training annotations
This framework supports training with pixel-wise and/or bounding-box annotations. To avoid having to transform box coordinates during
data augmentation, we feed the annotation masks through the augmentation pipeline (creating a pseudo-mask if only bounding-box annotations are provided) and derive the boxes afterwards.
The framework further handles two types of pixel-wise annotations:
1. A label map with individual ROIs identified by increasing label values, accompanied by a vector containing, at each position, the class target for the lesion with the corresponding label (for this mode, set get_rois_from_seg_flag = False when calling ConvertSegToBoundingBoxCoordinates in your data loader).
2. A binary label map. There is only one foreground class, and single lesions are not identified. All lesions have the same class target (foreground). In this case, the data loader runs a connected component labelling algorithm to create processable lesion/class-target pairs on the fly (for this mode, set get_rois_from_seg_flag = True when calling ConvertSegToBoundingBoxCoordinates in your data loader).
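The first annotation mode — deriving boxes from an augmented label map — can be sketched in a few lines. The `label_map_to_boxes` helper below is an illustrative assumption; the framework's actual conversion happens inside ConvertSegToBoundingBoxCoordinates:

```python
# Sketch of annotation mode 1: a label map with ROI ids 1..N plus a class-target
# vector. After augmentation, each ROI's bounding box is drawn from the mask.
import numpy as np

def label_map_to_boxes(seg, class_targets):
    """Return one (y1, x1, y2, x2, class) tuple per ROI id in `seg`."""
    boxes = []
    for roi_id in np.unique(seg)[1:]:          # skip background label 0
        coords = np.argwhere(seg == roi_id)
        y1, x1 = coords.min(axis=0)
        y2, x2 = coords.max(axis=0) + 1        # exclusive upper edge
        boxes.append((y1, x1, y2, x2, class_targets[roi_id - 1]))
    return boxes

seg = np.zeros((8, 8), dtype=np.uint8)
seg[1:3, 1:4] = 1                              # ROI 1
seg[5:7, 5:8] = 2                              # ROI 2
boxes = label_map_to_boxes(seg, class_targets=[0, 1])
```

Because the boxes are recomputed from the (augmented) mask, elastic deformations, rotations, and crops never have to be applied to box coordinates directly.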
## Prediction pipeline
This framework provides an inference module that automatically handles patching of inputs, as well as tiling, ensembling, and weighted consolidation of output predictions:
## Consolidation of predictions (Weighted Box Clustering)
Multiple predictions of the same image (from test-time augmentations, tested epochs, and overlapping patches) result in a large number of boxes (or cubes) that need to be consolidated. In semantic segmentation, the final output would typically be obtained by averaging every pixel over all predictions. As described in [3], **weighted box clustering** (WBC) does this for box predictions:
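A heavily simplified sketch of the idea: overlapping predictions of the same object are grouped by IoU and fused into one box via score-weighted averaging. The real WBC in [3] additionally weights each prediction by factors such as patch-overlap, so treat the function below as a conceptual assumption, not the toolkit's implementation:

```python
# Conceptual sketch of weighted consolidation of 2D box predictions.
import numpy as np

def iou(a, b):
    """IoU of two [y1, x1, y2, x2] boxes."""
    y1, x1 = np.maximum(a[:2], b[:2])
    y2, x2 = np.minimum(a[2:4], b[2:4])
    inter = max(0.0, y2 - y1) * max(0.0, x2 - x1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def consolidate(boxes, scores, thresh=0.5):
    """boxes: (n, 4) array. Cluster by IoU, fuse each cluster into one box."""
    order = np.argsort(scores)[::-1]           # visit highest-scoring first
    taken = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in order:
        if taken[i]:
            continue
        cluster = [j for j in order
                   if not taken[j] and iou(boxes[i], boxes[j]) >= thresh]
        w = scores[cluster]
        fused_box = (boxes[cluster] * w[:, None]).sum(0) / w.sum()
        fused.append((fused_box, w.mean()))
        taken[cluster] = True
    return fused

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
fused = consolidate(boxes, scores)             # two clusters remain
```

Unlike plain NMS, which discards all but the highest-scoring box of a cluster, this fuses the cluster's evidence into a single averaged box and score.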
## Visualization / Monitoring
By default, loss functions and performance metrics are monitored:
Histograms of matched output predictions for training/validation/testing are plotted per foreground class:
Input images + ground truth annotations + output predictions of a sampled validation batch are plotted after each epoch (here a 2D sampled slice with +-3 neighbouring context slices in channels):
Zoomed into the last two lines of the plot:
## License
This framework is published under the [Apache License Version 2.0](LICENSE).
diff --git a/exec.py b/exec.py
index ca5e249..932f9ff 100644
--- a/exec.py
+++ b/exec.py
@@ -1,254 +1,254 @@
#!/usr/bin/env python
# Copyright 2018 Division of Medical Image Computing, German Cancer Research Center (DKFZ).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""execution script."""
import argparse
import os
import time
import torch
import utils.exp_utils as utils
from evaluator import Evaluator
from predictor import Predictor
from plotting import plot_batch_prediction
def train(logger):
"""
perform the training routine for a given fold. saves plots and selected parameters to the experiment dir
specified in the configs.
"""
logger.info('performing training in {}D over fold {} on experiment {} with model {}'.format(
cf.dim, cf.fold, cf.exp_dir, cf.model))
net = model.net(cf, logger).cuda()
optimizer = torch.optim.Adam(net.parameters(), lr=cf.learning_rate[0], weight_decay=cf.weight_decay)
if cf.dynamic_lr_scheduling:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode=cf.scheduling_mode, factor=cf.lr_decay_factor,
patience=cf.scheduling_patience)
model_selector = utils.ModelSelector(cf, logger)
train_evaluator = Evaluator(cf, logger, mode='train')
val_evaluator = Evaluator(cf, logger, mode=cf.val_mode)
starting_epoch = 1
# prepare monitoring
monitor_metrics = utils.prepare_monitoring(cf)
if cf.resume_to_checkpoint:
starting_epoch, monitor_metrics = utils.load_checkpoint(cf.resume_to_checkpoint, net, optimizer)
logger.info('resumed to checkpoint {} at epoch {}'.format(cf.resume_to_checkpoint, starting_epoch))
logger.info('loading dataset and initializing batch generators...')
batch_gen = data_loader.get_train_generators(cf, logger)
for epoch in range(starting_epoch, cf.num_epochs + 1):
logger.info('starting training epoch {}'.format(epoch))
start_time = time.time()
net.train()
train_results_list = []
for bix in range(cf.num_train_batches):
batch = next(batch_gen['train'])
tic_fw = time.time()
results_dict = net.train_forward(batch)
tic_bw = time.time()
optimizer.zero_grad()
results_dict['torch_loss'].backward()
optimizer.step()
logger.info('tr. batch {0}/{1} (ep. {2}) fw {3:.2f}s / bw {4:.2f} s / total {5:.2f} s || '
.format(bix + 1, cf.num_train_batches, epoch, tic_bw - tic_fw,
time.time() - tic_bw, time.time() - tic_fw) + results_dict['logger_string'])
train_results_list.append([results_dict['boxes'], batch['pid']])
_, monitor_metrics['train'] = train_evaluator.evaluate_predictions(train_results_list, monitor_metrics['train'])
#import IPython; IPython.embed()
train_time = time.time() - start_time
logger.info('starting validation in mode {}.'.format(cf.val_mode))
with torch.no_grad():
net.eval()
if cf.do_validation:
val_results_list = []
val_predictor = Predictor(cf, net, logger, mode='val')
for _ in range(batch_gen['n_val']):
batch = next(batch_gen[cf.val_mode])
if cf.val_mode == 'val_patient':
results_dict = val_predictor.predict_patient(batch)
elif cf.val_mode == 'val_sampling':
results_dict = net.train_forward(batch, is_validation=True)
val_results_list.append([results_dict['boxes'], batch['pid']])
_, monitor_metrics['val'] = val_evaluator.evaluate_predictions(val_results_list, monitor_metrics['val'])
model_selector.run_model_selection(net, optimizer, monitor_metrics, epoch)
# update monitoring and prediction plots
monitor_metrics.update({"lr":
{str(g): group['lr'] for (g, group) in enumerate(optimizer.param_groups)}})
logger.metrics2tboard(monitor_metrics, global_step=epoch)
epoch_time = time.time() - start_time
logger.info('trained epoch {}: took {:.2f} s ({:.2f} s train / {:.2f} s val)'.format(
epoch, epoch_time, train_time, epoch_time-train_time))
batch = next(batch_gen['val_sampling'])
results_dict = net.train_forward(batch, is_validation=True)
logger.info('plotting predictions from validation sampling.')
plot_batch_prediction(batch, results_dict, cf)
# -------------- scheduling -----------------
if cf.dynamic_lr_scheduling:
scheduler.step(monitor_metrics["val"][cf.scheduling_criterion][-1])
else:
for param_group in optimizer.param_groups:
param_group['lr'] = cf.learning_rate[epoch-1]
def test(logger):
"""
perform testing for a given fold (or hold out set). save stats in evaluator.
"""
logger.info('starting testing model of fold {} in exp {}'.format(cf.fold, cf.exp_dir))
net = model.net(cf, logger).cuda()
test_predictor = Predictor(cf, net, logger, mode='test')
test_evaluator = Evaluator(cf, logger, mode='test')
batch_gen = data_loader.get_test_generator(cf, logger)
test_results_list = test_predictor.predict_test_set(batch_gen, return_results=True)
test_evaluator.evaluate_predictions(test_results_list)
test_evaluator.score_test_df()
if __name__ == '__main__':
stime = time.time()
parser = argparse.ArgumentParser()
parser.add_argument('-m', '--mode', type=str, default='train_test',
help='one out of: train / test / train_test / analysis / create_exp')
parser.add_argument('-f','--folds', nargs='+', type=int, default=None,
help='None runs over all folds in CV. otherwise specify list of folds.')
parser.add_argument('--exp_dir', type=str, default='/path/to/experiment/directory',
help='path to experiment dir. will be created if non existent.')
parser.add_argument('--server_env', default=False, action='store_true',
help='change IO settings to deploy models on a cluster.')
- parser.add_argument('--slurm_job_id', type=str, default=None, help='job scheduler info')
+ parser.add_argument('--data_dest', type=str, default=None, help="path to final data folder if different from config.")
parser.add_argument('--use_stored_settings', default=False, action='store_true',
help='load configs from existing exp_dir instead of source dir. always done for testing, '
'but can be set to true to do the same for training. useful in job scheduler environment, '
'where source code might change before the job actually runs.')
parser.add_argument('--resume_to_checkpoint', type=str, default=None,
help='if resuming to checkpoint, the desired fold still needs to be parsed via --folds.')
parser.add_argument('--exp_source', type=str, default='experiments/toy_exp',
help='specifies, from which source experiment to load configs and data_loader.')
parser.add_argument('-d', '--dev', default=False, action='store_true', help="development mode: shorten everything")
args = parser.parse_args()
folds = args.folds
resume_to_checkpoint = args.resume_to_checkpoint
if args.mode == 'train' or args.mode == 'train_test':
cf = utils.prep_exp(args.exp_source, args.exp_dir, args.server_env, args.use_stored_settings)
if args.dev:
folds = [0,1]
cf.batch_size, cf.num_epochs, cf.min_save_thresh, cf.save_n_models = 3 if cf.dim==2 else 1, 1, 0, 1
cf.num_train_batches, cf.num_val_batches, cf.max_val_patients = 5, 1, 1
cf.test_n_epochs = cf.save_n_models
cf.max_test_patients = 1
- cf.slurm_job_id = args.slurm_job_id
+ cf.data_dest = args.data_dest
logger = utils.get_logger(cf.exp_dir, cf.server_env)
data_loader = utils.import_module('dl', os.path.join(args.exp_source, 'data_loader.py'))
model = utils.import_module('model', cf.model_path)
logger.info("loaded model from {}".format(cf.model_path))
if folds is None:
folds = range(cf.n_cv_splits)
for fold in folds:
cf.fold_dir = os.path.join(cf.exp_dir, 'fold_{}'.format(fold))
cf.fold = fold
cf.resume_to_checkpoint = resume_to_checkpoint
if not os.path.exists(cf.fold_dir):
os.mkdir(cf.fold_dir)
logger.set_logfile(fold=fold)
train(logger)
cf.resume_to_checkpoint = None
if args.mode == 'train_test':
test(logger)
elif args.mode == 'test':
cf = utils.prep_exp(args.exp_source, args.exp_dir, args.server_env, is_training=False, use_stored_settings=True)
if args.dev:
folds = [0,1]
cf.test_n_epochs = 1; cf.max_test_patients = 1
- cf.slurm_job_id = args.slurm_job_id
+ cf.data_dest = args.data_dest
logger = utils.get_logger(cf.exp_dir, cf.server_env)
data_loader = utils.import_module('dl', os.path.join(args.exp_source, 'data_loader.py'))
model = utils.import_module('model', cf.model_path)
logger.info("loaded model from {}".format(cf.model_path))
if folds is None:
folds = range(cf.n_cv_splits)
for fold in folds:
cf.fold_dir = os.path.join(cf.exp_dir, 'fold_{}'.format(fold))
cf.fold = fold
logger.set_logfile(fold=fold)
test(logger)
# load raw predictions saved by predictor during testing, run aggregation algorithms and evaluation.
elif args.mode == 'analysis':
cf = utils.prep_exp(args.exp_source, args.exp_dir, args.server_env, is_training=False, use_stored_settings=True)
logger = utils.get_logger(cf.exp_dir, cf.server_env)
if cf.hold_out_test_set:
cf.folds = args.folds
predictor = Predictor(cf, net=None, logger=logger, mode='analysis')
results_list = predictor.load_saved_predictions(apply_wbc=True)
utils.create_csv_output(results_list, cf, logger)
else:
if folds is None:
folds = range(cf.n_cv_splits)
for fold in folds:
cf.fold_dir = os.path.join(cf.exp_dir, 'fold_{}'.format(fold))
cf.fold = fold
logger.set_logfile(fold=fold)
predictor = Predictor(cf, net=None, logger=logger, mode='analysis')
results_list = predictor.load_saved_predictions(apply_wbc=True)
logger.info('starting evaluation...')
evaluator = Evaluator(cf, logger, mode='test')
evaluator.evaluate_predictions(results_list)
evaluator.score_test_df()
# create experiment folder and copy scripts without starting job.
# useful for cloud deployment where configs might change before job actually runs.
elif args.mode == 'create_exp':
- cf = utils.prep_exp(args.exp_source, args.exp_dir, args.server_env, use_stored_settings=True)
+ cf = utils.prep_exp(args.exp_source, args.exp_dir, args.server_env, use_stored_settings=False)
logger = utils.get_logger(cf.exp_dir)
- logger.info('created experiment directory at {}'.format(args.exp_dir))
+ logger.info('created experiment directory at {}'.format(cf.exp_dir))
else:
raise RuntimeError('mode specified in args is not implemented...')
mins, secs = divmod((time.time() - stime), 60)
h, mins = divmod(mins, 60)
t = "{:d}h:{:02d}m:{:02d}s".format(int(h), int(mins), int(secs))
logger.info("{} total runtime: {}".format(os.path.split(__file__)[1], t))
del logger
\ No newline at end of file
diff --git a/experiments/lidc_exp/data_loader.py b/experiments/lidc_exp/data_loader.py
index 0c8de96..87aad82 100644
--- a/experiments/lidc_exp/data_loader.py
+++ b/experiments/lidc_exp/data_loader.py
@@ -1,485 +1,485 @@
#!/usr/bin/env python
# Copyright 2018 Division of Medical Image Computing, German Cancer Research Center (DKFZ).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
'''
Example Data Loader for the LIDC data set. This dataloader expects preprocessed data in .npy or .npz files per patient and
a pandas dataframe in the same directory containing the meta-info, e.g. file paths, labels, foreground slice-ids.
'''
import numpy as np
import os
from collections import OrderedDict
import pandas as pd
import pickle
import time
import subprocess
import utils.dataloader_utils as dutils
# batch generator tools from https://github.com/MIC-DKFZ/batchgenerators
from batchgenerators.dataloading.data_loader import SlimDataLoaderBase
from batchgenerators.transforms.spatial_transforms import MirrorTransform as Mirror
from batchgenerators.transforms.abstract_transforms import Compose
from batchgenerators.dataloading.multi_threaded_augmenter import MultiThreadedAugmenter
from batchgenerators.dataloading import SingleThreadedAugmenter
from batchgenerators.transforms.spatial_transforms import SpatialTransform
from batchgenerators.transforms.crop_and_pad_transforms import CenterCropTransform
from batchgenerators.transforms.utility_transforms import ConvertSegToBoundingBoxCoordinates
def get_train_generators(cf, logger):
"""
wrapper function for creating the training batch generator pipeline. returns the train/val generators.
selects patients according to cv folds (generated by first run/fold of experiment):
splits the data into n-folds, where 1 split is used for val, 1 split for testing and the rest for training. (inner loop test set)
If cf.hold_out_test_set is True, adds the test split to the training data.
"""
all_data = load_dataset(cf, logger)
all_pids_list = np.unique([v['pid'] for (k, v) in all_data.items()])
if not cf.created_fold_id_pickle:
fg = dutils.fold_generator(seed=cf.seed, n_splits=cf.n_cv_splits, len_data=len(all_pids_list)).get_fold_names()
with open(os.path.join(cf.exp_dir, 'fold_ids.pickle'), 'wb') as handle:
pickle.dump(fg, handle)
cf.created_fold_id_pickle = True
else:
with open(os.path.join(cf.exp_dir, 'fold_ids.pickle'), 'rb') as handle:
fg = pickle.load(handle)
train_ix, val_ix, test_ix, _ = fg[cf.fold]
train_pids = [all_pids_list[ix] for ix in train_ix]
val_pids = [all_pids_list[ix] for ix in val_ix]
if cf.hold_out_test_set:
train_pids += [all_pids_list[ix] for ix in test_ix]
train_data = {k: v for (k, v) in all_data.items() if any(p == v['pid'] for p in train_pids)}
val_data = {k: v for (k, v) in all_data.items() if any(p == v['pid'] for p in val_pids)}
logger.info("data set loaded with: {} train / {} val / {} test patients".format(len(train_ix), len(val_ix), len(test_ix)))
batch_gen = {}
batch_gen['train'] = create_data_gen_pipeline(train_data, cf=cf, is_training=True)
batch_gen['val_sampling'] = create_data_gen_pipeline(val_data, cf=cf, is_training=False)
if cf.val_mode == 'val_patient':
batch_gen['val_patient'] = PatientBatchIterator(val_data, cf=cf)
batch_gen['n_val'] = len(val_ix) if cf.max_val_patients is None else min(len(val_ix), cf.max_val_patients)
else:
batch_gen['n_val'] = cf.num_val_batches
return batch_gen
def get_test_generator(cf, logger):
"""
wrapper function for creating the test batch generator pipeline.
selects patients according to cv folds (generated by first run/fold of experiment)
If cf.hold_out_test_set is True, gets the data from an external folder instead.
"""
if cf.hold_out_test_set:
pp_name = cf.pp_test_name
test_ix = None
else:
pp_name = None
with open(os.path.join(cf.exp_dir, 'fold_ids.pickle'), 'rb') as handle:
fold_list = pickle.load(handle)
_, _, test_ix, _ = fold_list[cf.fold]
# warnings.warn('WARNING: using validation set for testing!!!')
test_data = load_dataset(cf, logger, test_ix, pp_data_path=cf.pp_test_data_path, pp_name=pp_name)
logger.info("data set loaded with: {} test patients".format(len(test_ix)))
batch_gen = {}
batch_gen['test'] = PatientBatchIterator(test_data, cf=cf)
batch_gen['n_test'] = len(test_ix) if cf.max_test_patients=="all" else \
min(cf.max_test_patients, len(test_ix))
return batch_gen
def load_dataset(cf, logger, subset_ixs=None, pp_data_path=None, pp_name=None):
"""
loads the dataset. if deployed in cloud also copies and unpacks the data to the working directory.
:param subset_ixs: subset indices to be loaded from the dataset. used e.g. for testing to only load the test folds.
:return: data: dictionary with one entry per patient (in this case per patient-breast, since they are treated as
individual images for training) each entry is a dictionary containing respective meta-info as well as paths to the preprocessed
numpy arrays to be loaded during batch-generation
"""
if pp_data_path is None:
pp_data_path = cf.pp_data_path
if pp_name is None:
pp_name = cf.pp_name
if cf.server_env:
copy_data = True
- target_dir = os.path.join('/ssd', cf.slurm_job_id, pp_name, cf.crop_name)
+ target_dir = os.path.join(cf.data_dest, pp_name, cf.crop_name)
if not os.path.exists(target_dir):
cf.data_source_dir = pp_data_path
os.makedirs(target_dir)
subprocess.call('rsync -av {} {}'.format(
os.path.join(cf.data_source_dir, cf.input_df_name), os.path.join(target_dir, cf.input_df_name)), shell=True)
logger.info('created target dir and info df at {}'.format(os.path.join(target_dir, cf.input_df_name)))
elif subset_ixs is None:
copy_data = False
pp_data_path = target_dir
p_df = pd.read_pickle(os.path.join(pp_data_path, cf.input_df_name))
if cf.select_prototype_subset is not None:
prototype_pids = p_df.pid.tolist()[:cf.select_prototype_subset]
p_df = p_df[p_df.pid.isin(prototype_pids)]
logger.warning('WARNING: using prototyping data subset!!!')
if subset_ixs is not None:
subset_pids = [np.unique(p_df.pid.tolist())[ix] for ix in subset_ixs]
p_df = p_df[p_df.pid.isin(subset_pids)]
logger.info('subset: selected {} instances from df'.format(len(p_df)))
if cf.server_env:
if copy_data:
copy_and_unpack_data(logger, p_df.pid.tolist(), cf.fold_dir, cf.data_source_dir, target_dir)
class_targets = p_df['class_target'].tolist()
pids = p_df.pid.tolist()
imgs = [os.path.join(pp_data_path, '{}_img.npy'.format(pid)) for pid in pids]
segs = [os.path.join(pp_data_path,'{}_rois.npy'.format(pid)) for pid in pids]
data = OrderedDict()
for ix, pid in enumerate(pids):
# for the experiment conducted here, malignancy scores are binarized: (benign: 1-2, malignant: 3-5)
targets = [1 if ii >= 3 else 0 for ii in class_targets[ix]]
data[pid] = {'data': imgs[ix], 'seg': segs[ix], 'pid': pid, 'class_target': targets}
data[pid]['fg_slices'] = p_df['fg_slices'].tolist()[ix]
return data
def create_data_gen_pipeline(patient_data, cf, is_training=True):
"""
create multi-threaded train/val/test batch generation and augmentation pipeline.
:param patient_data: dictionary containing one dictionary per patient in the train/test subset.
:param is_training: (optional) whether to perform data augmentation (training) or not (validation/testing)
:return: multithreaded_generator
"""
# create instance of batch generator as first element in pipeline.
data_gen = BatchGenerator(patient_data, batch_size=cf.batch_size, cf=cf)
# add transformations to pipeline.
my_transforms = []
if is_training:
mirror_transform = Mirror(axes=np.arange(cf.dim))
my_transforms.append(mirror_transform)
spatial_transform = SpatialTransform(patch_size=cf.patch_size[:cf.dim],
patch_center_dist_from_border=cf.da_kwargs['rand_crop_dist'],
do_elastic_deform=cf.da_kwargs['do_elastic_deform'],
alpha=cf.da_kwargs['alpha'], sigma=cf.da_kwargs['sigma'],
do_rotation=cf.da_kwargs['do_rotation'], angle_x=cf.da_kwargs['angle_x'],
angle_y=cf.da_kwargs['angle_y'], angle_z=cf.da_kwargs['angle_z'],
do_scale=cf.da_kwargs['do_scale'], scale=cf.da_kwargs['scale'],
random_crop=cf.da_kwargs['random_crop'])
my_transforms.append(spatial_transform)
else:
my_transforms.append(CenterCropTransform(crop_size=cf.patch_size[:cf.dim]))
my_transforms.append(ConvertSegToBoundingBoxCoordinates(cf.dim, get_rois_from_seg_flag=False, class_specific_seg_flag=cf.class_specific_seg_flag))
all_transforms = Compose(my_transforms)
# multithreaded_generator = SingleThreadedAugmenter(data_gen, all_transforms)
multithreaded_generator = MultiThreadedAugmenter(data_gen, all_transforms, num_processes=cf.n_workers, seeds=range(cf.n_workers))
return multithreaded_generator
class BatchGenerator(SlimDataLoaderBase):
"""
creates the training/validation batch generator. Samples n_batch_size patients (draws a slice from each patient if 2D)
from the data set while maintaining foreground-class balance. Returned patches are cropped/padded to pre_crop_size.
Actual patch_size is obtained after data augmentation.
:param data: data dictionary as provided by 'load_dataset'.
:param batch_size: number of patients to sample for the batch
:return dictionary containing the batch data (b, c, x, y, (z)) / seg (b, 1, x, y, (z)) / pids / class_target
"""
def __init__(self, data, batch_size, cf):
super(BatchGenerator, self).__init__(data, batch_size)
self.cf = cf
self.crop_margin = np.array(self.cf.patch_size)/8. #min distance of ROI center to edge of cropped_patch.
self.p_fg = 0.5
def generate_train_batch(self):
batch_data, batch_segs, batch_pids, batch_targets, batch_patient_labels = [], [], [], [], []
class_targets_list = [v['class_target'] for (k, v) in self._data.items()]
if self.cf.head_classes > 2:
# samples patients towards equilibrium of foreground classes on a roi-level (after randomly sampling the ratio "batch_sample_slack").
batch_ixs = dutils.get_class_balanced_patients(
class_targets_list, self.batch_size, self.cf.head_classes - 1, slack_factor=self.cf.batch_sample_slack)
else:
batch_ixs = np.random.choice(len(class_targets_list), self.batch_size)
patients = list(self._data.items())
for b in batch_ixs:
patient = patients[b][1]
data = np.transpose(np.load(patient['data'], mmap_mode='r'), axes=(1, 2, 0))[np.newaxis] # (c, y, x, z)
seg = np.transpose(np.load(patient['seg'], mmap_mode='r'), axes=(1, 2, 0))
batch_pids.append(patient['pid'])
batch_targets.append(patient['class_target'])
if self.cf.dim == 2:
# draw random slice from patient while oversampling slices containing foreground objects with p_fg.
if len(patient['fg_slices']) > 0:
fg_prob = self.p_fg / len(patient['fg_slices'])
bg_prob = (1 - self.p_fg) / (data.shape[3] - len(patient['fg_slices']))
slices_prob = [fg_prob if ix in patient['fg_slices'] else bg_prob for ix in range(data.shape[3])]
slice_id = np.random.choice(data.shape[3], p=slices_prob)
else:
slice_id = np.random.choice(data.shape[3])
# if set to not None, add neighbouring slices to each selected slice in channel dimension.
if self.cf.n_3D_context is not None:
padded_data = dutils.pad_nd_image(data[0], [(data.shape[-1] + (self.cf.n_3D_context*2))], mode='constant')
padded_slice_id = slice_id + self.cf.n_3D_context
data = (np.concatenate([padded_data[..., ii][np.newaxis] for ii in range(
padded_slice_id - self.cf.n_3D_context, padded_slice_id + self.cf.n_3D_context + 1)], axis=0))
else:
data = data[..., slice_id]
seg = seg[..., slice_id]
# pad data if smaller than pre_crop_size.
if np.any([data.shape[dim + 1] < ps for dim, ps in enumerate(self.cf.pre_crop_size)]):
new_shape = [np.max([data.shape[dim + 1], ps]) for dim, ps in enumerate(self.cf.pre_crop_size)]
data = dutils.pad_nd_image(data, new_shape, mode='constant')
seg = dutils.pad_nd_image(seg, new_shape, mode='constant')
# crop patches of size pre_crop_size, while sampling patches containing foreground with p_fg.
crop_dims = [dim for dim, ps in enumerate(self.cf.pre_crop_size) if data.shape[dim + 1] > ps]
if len(crop_dims) > 0:
fg_prob_sample = np.random.rand(1)
# with p_fg: sample random pixel from random ROI and shift center by random value.
if fg_prob_sample < self.p_fg and np.sum(seg) > 0:
seg_ixs = np.argwhere(seg == np.random.choice(np.unique(seg)[1:], 1))
roi_anchor_pixel = seg_ixs[np.random.choice(seg_ixs.shape[0], 1)][0]
assert seg[tuple(roi_anchor_pixel)] > 0
# sample the patch center coords. constrained by edges of images - pre_crop_size /2. And by
# distance to the desired ROI < patch_size /2.
# (here final patch size to account for center_crop after data augmentation).
sample_seg_center = {}
for ii in crop_dims:
low = np.max((self.cf.pre_crop_size[ii]//2, roi_anchor_pixel[ii] - (self.cf.patch_size[ii]//2 - self.crop_margin[ii])))
high = np.min((data.shape[ii + 1] - self.cf.pre_crop_size[ii]//2,
roi_anchor_pixel[ii] + (self.cf.patch_size[ii]//2 - self.crop_margin[ii])))
# happens if lesion on the edge of the image. dont care about roi anymore,
# just make sure pre-crop is inside image.
if low >= high:
low = data.shape[ii + 1] // 2 - (data.shape[ii + 1] // 2 - self.cf.pre_crop_size[ii] // 2)
high = data.shape[ii + 1] // 2 + (data.shape[ii + 1] // 2 - self.cf.pre_crop_size[ii] // 2)
sample_seg_center[ii] = np.random.randint(low=low, high=high)
else:
# not guaranteed to be empty. probability of emptiness depends on the data.
sample_seg_center = {ii: np.random.randint(low=self.cf.pre_crop_size[ii]//2,
high=data.shape[ii + 1] - self.cf.pre_crop_size[ii]//2) for ii in crop_dims}
for ii in crop_dims:
min_crop = int(sample_seg_center[ii] - self.cf.pre_crop_size[ii] // 2)
max_crop = int(sample_seg_center[ii] + self.cf.pre_crop_size[ii] // 2)
data = np.take(data, indices=range(min_crop, max_crop), axis=ii + 1)
seg = np.take(seg, indices=range(min_crop, max_crop), axis=ii)
batch_data.append(data)
batch_segs.append(seg[np.newaxis])
data = np.array(batch_data)
seg = np.array(batch_segs).astype(np.uint8)
class_target = np.array(batch_targets)
return {'data': data, 'seg': seg, 'pid': batch_pids, 'class_target': class_target}
class PatientBatchIterator(SlimDataLoaderBase):
"""
creates a test generator that iterates over entire given dataset returning 1 patient per batch.
Can be used for monitoring if cf.val_mode = 'val_patient', for a monitoring closer to the actual evaluation (done in 3D),
if one is willing to accept the speed loss during training.
:return: out_batch: dictionary containing one patient with batch_size = n_3D_patches in 3D or
batch_size = n_2D_patches in 2D.
"""
def __init__(self, data, cf): #threads in augmenter
super(PatientBatchIterator, self).__init__(data, 0)
self.cf = cf
self.patient_ix = 0
self.dataset_pids = [v['pid'] for (k, v) in data.items()]
self.patch_size = cf.patch_size
if len(self.patch_size) == 2:
self.patch_size = self.patch_size + [1]
def generate_train_batch(self):
pid = self.dataset_pids[self.patient_ix]
patient = self._data[pid]
data = np.transpose(np.load(patient['data'], mmap_mode='r'), axes=(1, 2, 0))[np.newaxis] # (c, y, x, z)
seg = np.transpose(np.load(patient['seg'], mmap_mode='r'), axes=(1, 2, 0))
batch_class_targets = np.array([patient['class_target']])
# pad data if smaller than patch_size seen during training.
if np.any([data.shape[dim + 1] < ps for dim, ps in enumerate(self.patch_size)]):
new_shape = [data.shape[0]] + [np.max([data.shape[dim + 1], self.patch_size[dim]]) for dim, ps in enumerate(self.patch_size)]
data = dutils.pad_nd_image(data, new_shape) # use 'return_slicer' to crop image back to original shape.
seg = dutils.pad_nd_image(seg, new_shape)
# get 3D targets for evaluation, even if network operates in 2D. 2D predictions will be merged to 3D in predictor.
if self.cf.dim == 3 or self.cf.merge_2D_to_3D_preds:
out_data = data[np.newaxis]
out_seg = seg[np.newaxis, np.newaxis]
out_targets = batch_class_targets
batch_3D = {'data': out_data, 'seg': out_seg, 'class_target': out_targets, 'pid': pid}
converter = ConvertSegToBoundingBoxCoordinates(dim=3, get_rois_from_seg_flag=False, class_specific_seg_flag=self.cf.class_specific_seg_flag)
batch_3D = converter(**batch_3D)
batch_3D.update({'patient_bb_target': batch_3D['bb_target'],
'patient_roi_labels': batch_3D['roi_labels'],
'original_img_shape': out_data.shape})
if self.cf.dim == 2:
out_data = np.transpose(data, axes=(3, 0, 1, 2)) # (z, c, x, y )
out_seg = np.transpose(seg, axes=(2, 0, 1))[:, np.newaxis]
out_targets = np.array(np.repeat(batch_class_targets, out_data.shape[0], axis=0))
# if set to not None, add neighbouring slices to each selected slice in channel dimension.
if self.cf.n_3D_context is not None:
slice_range = range(self.cf.n_3D_context, out_data.shape[0] + self.cf.n_3D_context)
out_data = np.pad(out_data, ((self.cf.n_3D_context, self.cf.n_3D_context), (0, 0), (0, 0), (0, 0)), 'constant', constant_values=0)
out_data = np.array(
[np.concatenate([out_data[ii] for ii in range(
slice_id - self.cf.n_3D_context, slice_id + self.cf.n_3D_context + 1)], axis=0) for slice_id in
slice_range])
batch_2D = {'data': out_data, 'seg': out_seg, 'class_target': out_targets, 'pid': pid}
converter = ConvertSegToBoundingBoxCoordinates(dim=2, get_rois_from_seg_flag=False, class_specific_seg_flag=self.cf.class_specific_seg_flag)
batch_2D = converter(**batch_2D)
if self.cf.merge_2D_to_3D_preds:
batch_2D.update({'patient_bb_target': batch_3D['patient_bb_target'],
'patient_roi_labels': batch_3D['patient_roi_labels'],
'original_img_shape': out_data.shape})
else:
batch_2D.update({'patient_bb_target': batch_2D['bb_target'],
'patient_roi_labels': batch_2D['roi_labels'],
'original_img_shape': out_data.shape})
out_batch = batch_3D if self.cf.dim == 3 else batch_2D
patient_batch = out_batch
# crop patient-volume to patches of patch_size used during training. stack patches up in batch dimension.
# in this case, 2D is treated as a special case of 3D with patch_size[z] = 1.
if np.any([data.shape[dim + 1] > self.patch_size[dim] for dim in range(3)]):
patch_crop_coords_list = dutils.get_patch_crop_coords(data[0], self.patch_size)
new_img_batch, new_seg_batch, new_class_targets_batch = [], [], []
for cix, c in enumerate(patch_crop_coords_list):
seg_patch = seg[c[0]:c[1], c[2]: c[3], c[4]:c[5]]
new_seg_batch.append(seg_patch)
# if set to not None, add neighbouring slices to each selected slice in channel dimension.
# correct patch_crop coordinates by added slices of 3D context.
if self.cf.dim == 2 and self.cf.n_3D_context is not None:
tmp_c_5 = c[5] + (self.cf.n_3D_context * 2)
if cix == 0:
data = np.pad(data, ((0, 0), (0, 0), (0, 0), (self.cf.n_3D_context, self.cf.n_3D_context)), 'constant', constant_values=0)
else:
tmp_c_5 = c[5]
new_img_batch.append(data[:, c[0]:c[1], c[2]:c[3], c[4]:tmp_c_5])
data = np.array(new_img_batch) # (n_patches, c, x, y, z)
seg = np.array(new_seg_batch)[:, np.newaxis] # (n_patches, 1, x, y, z)
batch_class_targets = np.repeat(batch_class_targets, len(patch_crop_coords_list), axis=0)
if self.cf.dim == 2:
if self.cf.n_3D_context is not None:
data = np.transpose(data[:, 0], axes=(0, 3, 1, 2))
else:
# all patches have z dimension 1 (slices). discard dimension
data = data[..., 0]
seg = seg[..., 0]
patch_batch = {'data': data, 'seg': seg, 'class_target': batch_class_targets, 'pid': pid}
patch_batch['patch_crop_coords'] = np.array(patch_crop_coords_list)
patch_batch['patient_bb_target'] = patient_batch['patient_bb_target']
patch_batch['patient_roi_labels'] = patient_batch['patient_roi_labels']
patch_batch['original_img_shape'] = patient_batch['original_img_shape']
converter = ConvertSegToBoundingBoxCoordinates(self.cf.dim, get_rois_from_seg_flag=False, class_specific_seg_flag=self.cf.class_specific_seg_flag)
patch_batch = converter(**patch_batch)
out_batch = patch_batch
self.patient_ix += 1
if self.patient_ix == len(self.dataset_pids):
self.patient_ix = 0
return out_batch
def copy_and_unpack_data(logger, pids, fold_dir, source_dir, target_dir):
start_time = time.time()
with open(os.path.join(fold_dir, 'file_list.txt'), 'w') as handle:
for pid in pids:
handle.write('{}_img.npz\n'.format(pid))
handle.write('{}_rois.npz\n'.format(pid))
subprocess.call('rsync -av --files-from {} {} {}'.format(os.path.join(fold_dir, 'file_list.txt'),
source_dir, target_dir), shell=True)
dutils.unpack_dataset(target_dir)
copied_files = os.listdir(target_dir)
logger.info("copying and unpacking data set finished: {} files in target dir: {}. took {} sec".format(
len(copied_files), target_dir, np.round(time.time() - start_time, 0)))
if __name__ == "__main__":
import utils.exp_utils as utils
from configs import configs
total_stime = time.time()
cf = configs()
cf.created_fold_id_pickle = False
cf.exp_dir = "experiments/dev/"
cf.plot_dir = cf.exp_dir + "plots"
os.makedirs(cf.exp_dir, exist_ok=True)
cf.fold = 0
logger = utils.get_logger(cf.exp_dir)
batch_gen = get_train_generators(cf, logger)
train_batch = next(batch_gen["train"])
mins, secs = divmod((time.time() - total_stime), 60)
h, mins = divmod(mins, 60)
t = "{:d}h:{:02d}m:{:02d}s".format(int(h), int(mins), int(secs))
print("{} total runtime: {}".format(os.path.split(__file__)[1], t))
\ No newline at end of file
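The foreground-biased cropping in `generate_train_batch` above constrains the sampled patch center per axis by the image edges and by the distance to a randomly chosen ROI pixel, and falls back to an unconstrained in-image center when the ROI sits at the border. A minimal per-axis sketch of that logic; the helper `sample_crop_center` and its parameter names are illustrative, not part of the toolkit:

```python
import numpy as np

def sample_crop_center(img_len, pre_crop, patch, margin, anchor):
    """Sample a crop center along one axis near the ROI pixel `anchor`,
    keeping the pre-crop inside the image and the ROI inside the final patch."""
    # constrain by the image edges (pre_crop/2) and by the ROI distance (patch/2 - margin).
    low = max(pre_crop // 2, anchor - (patch // 2 - margin))
    high = min(img_len - pre_crop // 2, anchor + (patch // 2 - margin))
    if low >= high:
        # ROI lies at the image border: drop the ROI constraint and
        # only keep the pre-crop inside the image.
        low, high = pre_crop // 2, img_len - pre_crop // 2
    return int(np.random.randint(low=low, high=high))
```

For example, with an image length of 100, a pre-crop of 40, a patch of 32, and a margin of 4, an anchor at 50 yields a center in [38, 62), while an anchor at 2 triggers the border fallback and yields a center anywhere in [20, 80).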
diff --git a/experiments/toy_exp/configs.py b/experiments/toy_exp/configs.py
index 00e9b95..6278a11 100644
--- a/experiments/toy_exp/configs.py
+++ b/experiments/toy_exp/configs.py
@@ -1,350 +1,350 @@
#!/usr/bin/env python
# Copyright 2018 Division of Medical Image Computing, German Cancer Research Center (DKFZ).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import sys
import os
sys.path.append(os.path.dirname(os.path.realpath(__file__)))
import numpy as np
from default_configs import DefaultConfigs
class configs(DefaultConfigs):
def __init__(self, server_env=None):
#########################
# Preprocessing #
#########################
self.root_dir = '/home/gregor/datasets/toy_mdt'
#########################
# I/O #
#########################
# one out of [2, 3]. dimension the model operates in.
self.dim = 2
# one out of ['mrcnn', 'retina_net', 'retina_unet', 'detection_unet', 'ufrcnn'].
self.model = 'detection_unet'
DefaultConfigs.__init__(self, self.model, server_env, self.dim)
# int [0 < dataset_size]. select n patients from dataset for prototyping.
self.select_prototype_subset = None
self.hold_out_test_set = True
self.n_train_data = 2500
# choose one of the 3 toy experiments described in https://arxiv.org/pdf/1811.08661.pdf
# one of ['donuts_shape', 'donuts_pattern', 'circles_scale'].
toy_mode = 'donuts_shape_noise'
# path to preprocessed data.
self.input_df_name = 'info_df.pickle'
self.pp_name = os.path.join(toy_mode, 'train')
self.pp_data_path = os.path.join(self.root_dir, self.pp_name)
self.pp_test_name = os.path.join(toy_mode, 'test')
self.pp_test_data_path = os.path.join(self.root_dir, self.pp_test_name)
# settings for deployment in cloud.
if server_env:
# path to preprocessed data.
- pp_root_dir = '/path/to/data'
+ pp_root_dir = '/datasets/datasets_ramien/toy_mdt'
self.pp_name = os.path.join(toy_mode, 'train')
self.pp_data_path = os.path.join(pp_root_dir, self.pp_name)
self.pp_test_name = os.path.join(toy_mode, 'test')
self.pp_test_data_path = os.path.join(pp_root_dir, self.pp_test_name)
self.select_prototype_subset = None
#########################
# Data Loader #
#########################
# select modalities from preprocessed data
self.channels = [0]
self.n_channels = len(self.channels)
# patch_size to be used for training. pre_crop_size is the patch_size before data augmentation.
self.pre_crop_size_2D = [320, 320]
self.patch_size_2D = [320, 320]
self.patch_size = self.patch_size_2D if self.dim == 2 else self.patch_size_3D
self.pre_crop_size = self.pre_crop_size_2D if self.dim == 2 else self.pre_crop_size_3D
# ratio of freely sampled batch elements before class balancing is triggered
# (>0 to include "empty"/background patches.)
self.batch_sample_slack = 0.2
# set 2D network to operate in 3D images.
self.merge_2D_to_3D_preds = False
# feed +/- n neighbouring slices into channel dimension. set to None for no context.
self.n_3D_context = None
if self.n_3D_context is not None and self.dim == 2:
self.n_channels *= (self.n_3D_context * 2 + 1)
#########################
# Architecture #
#########################
self.start_filts = 48 if self.dim == 2 else 18
self.end_filts = self.start_filts * 4 if self.dim == 2 else self.start_filts * 2
self.res_architecture = 'resnet50' # 'resnet101' , 'resnet50'
self.norm = None # one of None, 'instance_norm', 'batch_norm'
self.weight_decay = 1e-4
# one of 'xavier_uniform', 'xavier_normal', or 'kaiming_normal', None (=default = 'kaiming_uniform')
self.weight_init = None
#########################
# Schedule / Selection #
#########################
self.num_epochs = 11
self.num_train_batches = 100 if self.dim == 2 else 200
self.batch_size = 20 if self.dim == 2 else 8
self.do_validation = True
# decide whether to validate on entire patient volumes (like testing) or sampled patches (like training)
# the former is more accurate, while the latter is faster (depending on volume size)
self.val_mode = 'val_patient' # one of 'val_sampling' , 'val_patient'
if self.val_mode == 'val_patient':
self.max_val_patients = None # if 'None' iterates over entire val_set once.
if self.val_mode == 'val_sampling':
self.num_val_batches = 50
# set dynamic_lr_scheduling to True to apply LR scheduling with below settings.
self.dynamic_lr_scheduling = True
self.lr_decay_factor = 0.5
self.scheduling_patience = int(self.num_train_batches * self.batch_size / 2400)
self.scheduling_criterion = 'malignant_ap'
self.scheduling_mode = 'min' if "loss" in self.scheduling_criterion else 'max'
#########################
# Testing / Plotting #
#########################
# set the top-n-epochs to be saved for temporal averaging in testing.
self.save_n_models = 2
self.test_n_epochs = 2
# set a minimum epoch number for saving in case of instabilities in the first phase of training.
self.min_save_thresh = 0 if self.dim == 2 else 0
self.report_score_level = ['patient', 'rois'] # choose list from 'patient', 'rois'
self.class_dict = {1: 'benign', 2: 'malignant'} # 0 is background.
self.patient_class_of_interest = 2 # patient metrics are only plotted for one class.
self.ap_match_ious = [0.1] # list of ious to be evaluated for ap-scoring.
self.model_selection_criteria = ['benign_ap', 'malignant_ap'] # criteria to average over for saving epochs.
self.min_det_thresh = 0.1 # minimum confidence value to select predictions for evaluation.
# threshold for clustering predictions together (wcs = weighted cluster scoring).
# needs to be >= the expected overlap of predictions coming from one model (typically NMS threshold).
# if too high, preds of the same object are separate clusters.
self.wcs_iou = 1e-5
self.plot_prediction_histograms = True
self.plot_stat_curves = False
#########################
# Data Augmentation #
#########################
self.da_kwargs={
'do_elastic_deform': True,
'alpha':(0., 1500.),
'sigma':(30., 50.),
'do_rotation':True,
'angle_x': (0., 2 * np.pi),
'angle_y': (0., 0),
'angle_z': (0., 0),
'do_scale': True,
'scale':(0.8, 1.1),
'random_crop':False,
'rand_crop_dist': (self.patch_size[0] / 2. - 3, self.patch_size[1] / 2. - 3),
'border_mode_data': 'constant',
'border_cval_data': 0,
'order_data': 1
}
if self.dim == 3:
self.da_kwargs['do_elastic_deform'] = False
self.da_kwargs['angle_x'] = (0, 0.0)
self.da_kwargs['angle_y'] = (0, 0.0) #must be 0!!
self.da_kwargs['angle_z'] = (0., 2 * np.pi)
#########################
# Add model specifics #
#########################
{'detection_unet': self.add_det_unet_configs,
'mrcnn': self.add_mrcnn_configs,
'ufrcnn': self.add_mrcnn_configs,
'ufrcnn_surrounding': self.add_mrcnn_configs,
'retina_net': self.add_mrcnn_configs,
'retina_unet': self.add_mrcnn_configs,
'prob_detector': self.add_mrcnn_configs,
}[self.model]()
def add_det_unet_configs(self):
self.learning_rate = [1e-4] * self.num_epochs
# aggregation from pixel prediction to object scores (connected component). One of ['max', 'median']
self.aggregation_operation = 'max'
# max number of roi candidates to identify per image (slice in 2D, volume in 3D)
self.n_roi_candidates = 3 if self.dim == 2 else 8
# loss mode: either weighted cross entropy ('wce'), batch-wise dice loss ('dice'), or the sum of both ('dice_wce')
self.seg_loss_mode = 'dice_wce'
# if <1, false positive predictions in foreground are penalized less.
self.fp_dice_weight = 1 if self.dim == 2 else 1
self.wce_weights = [1, 1, 1]
self.detection_min_confidence = self.min_det_thresh
# if 'True', loss distinguishes all classes, else only foreground vs. background (class agnostic).
self.class_specific_seg_flag = True
self.num_seg_classes = 3 if self.class_specific_seg_flag else 2
self.head_classes = self.num_seg_classes
def add_mrcnn_configs(self):
# learning rate is a list with one entry per epoch.
self.learning_rate = [1e-4] * self.num_epochs
# disable mask head loss. (e.g. if no pixelwise annotations available)
self.frcnn_mode = False
# disable the re-sampling of mask proposals to original size for speed-up.
# since evaluation is detection-driven (box-matching) and not instance segmentation-driven (iou-matching),
# mask-outputs are optional.
self.return_masks_in_val = True
self.return_masks_in_test = False
# set number of proposal boxes to plot after each epoch.
self.n_plot_rpn_props = 0 if self.dim == 2 else 0
# number of classes for head networks: n_foreground_classes + 1 (background)
self.head_classes = 3
# seg_classes here refers to the first stage classifier (RPN)
self.num_seg_classes = 2 # foreground vs. background
# feature map strides per pyramid level are inferred from architecture.
self.backbone_strides = {'xy': [4, 8, 16, 32], 'z': [1, 2, 4, 8]}
# anchor scales are chosen according to expected object sizes in data set. Default uses only one anchor scale
# per pyramid level. (outer list are pyramid levels (corresponding to BACKBONE_STRIDES), inner list are scales per level.)
self.rpn_anchor_scales = {'xy': [[8], [16], [32], [64]], 'z': [[2], [4], [8], [16]]}
# choose which pyramid levels to extract features from: P2: 0, P3: 1, P4: 2, P5: 3.
self.pyramid_levels = [0, 1, 2, 3]
# number of feature maps in rpn. typically lowered in 3D to save gpu-memory.
self.n_rpn_features = 512 if self.dim == 2 else 128
# anchor ratios and strides per position in feature maps.
self.rpn_anchor_ratios = [0.5, 1., 2.]
self.rpn_anchor_stride = 1
# Threshold for first stage (RPN) non-maximum suppression (NMS): LOWER == HARDER SELECTION
self.rpn_nms_threshold = 0.7 if self.dim == 2 else 0.7
# loss sampling settings.
self.rpn_train_anchors_per_image = 2 #per batch element
self.train_rois_per_image = 2 #per batch element
self.roi_positive_ratio = 0.5
self.anchor_matching_iou = 0.7
# factor of top-k candidates to draw from per negative sample (stochastic-hard-example-mining).
# poolsize to draw top-k candidates from will be shem_poolsize * n_negative_samples.
self.shem_poolsize = 10
self.pool_size = (7, 7) if self.dim == 2 else (7, 7, 3)
self.mask_pool_size = (14, 14) if self.dim == 2 else (14, 14, 5)
self.mask_shape = (28, 28) if self.dim == 2 else (28, 28, 10)
self.rpn_bbox_std_dev = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.2])
self.bbox_std_dev = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.2])
self.window = np.array([0, 0, self.patch_size[0], self.patch_size[1]])
self.scale = np.array([self.patch_size[0], self.patch_size[1], self.patch_size[0], self.patch_size[1]])
if self.dim == 2:
self.rpn_bbox_std_dev = self.rpn_bbox_std_dev[:4]
self.bbox_std_dev = self.bbox_std_dev[:4]
self.window = self.window[:4]
self.scale = self.scale[:4]
# pre-selection in proposal-layer (stage 1) for NMS-speedup. applied per batch element.
self.pre_nms_limit = 3000 if self.dim == 2 else 6000
# n_proposals to be selected after NMS per batch element. too high numbers blow up memory if "detect_while_training" is True,
# since proposals of the entire batch are forwarded through the second stage as one "batch".
self.roi_chunk_size = 800 if self.dim == 2 else 600
self.post_nms_rois_training = 500 if self.dim == 2 else 75
self.post_nms_rois_inference = 500
# Final selection of detections (refine_detections)
self.model_max_instances_per_batch_element = 10 if self.dim == 2 else 30 # per batch element and class.
self.detection_nms_threshold = 1e-5 # needs to be > 0, otherwise all predictions are one cluster.
self.model_min_confidence = 0.1
if self.dim == 2:
self.backbone_shapes = np.array(
[[int(np.ceil(self.patch_size[0] / stride)),
int(np.ceil(self.patch_size[1] / stride))]
for stride in self.backbone_strides['xy']])
else:
self.backbone_shapes = np.array(
[[int(np.ceil(self.patch_size[0] / stride)),
int(np.ceil(self.patch_size[1] / stride)),
int(np.ceil(self.patch_size[2] / stride_z))]
for stride, stride_z in zip(self.backbone_strides['xy'], self.backbone_strides['z']
)])
if self.model == 'ufrcnn':
self.operate_stride1 = True
self.class_specific_seg_flag = True
self.num_seg_classes = 3 if self.class_specific_seg_flag else 2
self.frcnn_mode = True
if self.model == 'retina_net' or self.model == 'retina_unet' or self.model == 'prob_detector':
# implement extra anchor-scales according to retina-net publication.
self.rpn_anchor_scales['xy'] = [[ii[0], ii[0] * (2 ** (1 / 3)), ii[0] * (2 ** (2 / 3))] for ii in
self.rpn_anchor_scales['xy']]
self.rpn_anchor_scales['z'] = [[ii[0], ii[0] * (2 ** (1 / 3)), ii[0] * (2 ** (2 / 3))] for ii in
self.rpn_anchor_scales['z']]
self.n_anchors_per_pos = len(self.rpn_anchor_ratios) * 3
self.n_rpn_features = 256 if self.dim == 2 else 64
# pre-selection of detections for NMS-speedup. per entire batch.
self.pre_nms_limit = 10000 if self.dim == 2 else 50000
# anchor matching iou is lower than in Mask R-CNN according to https://arxiv.org/abs/1708.02002
self.anchor_matching_iou = 0.5
# if 'True', seg loss distinguishes all classes, else only foreground vs. background (class agnostic).
self.num_seg_classes = 3 if self.class_specific_seg_flag else 2
if self.model == 'retina_unet':
self.operate_stride1 = True
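The retina_net/retina_unet branch above expands each pyramid level's single base anchor scale into three scales spaced by factors of 2^(1/3), following the Focal Loss paper. A minimal standalone sketch of that expansion; the helper name `expand_anchor_scales` is illustrative, not part of the toolkit:

```python
def expand_anchor_scales(base_scales):
    """Turn one base scale per pyramid level into three scales per level,
    spaced by 2**(1/3) as in the retina_net config branch."""
    return [[s[0], s[0] * 2 ** (1 / 3), s[0] * 2 ** (2 / 3)] for s in base_scales]

# e.g. the default 'xy' scales [[8], [16], [32], [64]] become three scales per level.
```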
diff --git a/experiments/toy_exp/data_loader.py b/experiments/toy_exp/data_loader.py
index bbf18cc..e1c84b4 100644
--- a/experiments/toy_exp/data_loader.py
+++ b/experiments/toy_exp/data_loader.py
@@ -1,311 +1,309 @@
#!/usr/bin/env python
# Copyright 2018 Division of Medical Image Computing, German Cancer Research Center (DKFZ).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import numpy as np
import os
from collections import OrderedDict
import pandas as pd
import pickle
import time
import subprocess
import utils.dataloader_utils as dutils
# batch generator tools from https://github.com/MIC-DKFZ/batchgenerators
from batchgenerators.dataloading.data_loader import SlimDataLoaderBase
from batchgenerators.transforms.spatial_transforms import MirrorTransform as Mirror
from batchgenerators.transforms.abstract_transforms import Compose
from batchgenerators.dataloading.multi_threaded_augmenter import MultiThreadedAugmenter
from batchgenerators.dataloading import SingleThreadedAugmenter
from batchgenerators.transforms.spatial_transforms import SpatialTransform
from batchgenerators.transforms.crop_and_pad_transforms import CenterCropTransform
from batchgenerators.transforms.utility_transforms import ConvertSegToBoundingBoxCoordinates
def get_train_generators(cf, logger):
"""
wrapper function for creating the training batch generator pipeline. returns the train/val generators.
selects patients according to cv folds (generated by first run/fold of experiment):
splits the data into n-folds, where 1 split is used for val, 1 split for testing and the rest for training. (inner loop test set)
If cf.hold_out_test_set is True, adds the test split to the training data.
"""
all_data = load_dataset(cf, logger)
all_pids_list = np.unique([v['pid'] for (k, v) in all_data.items()])
train_pids = all_pids_list[:cf.n_train_data]
val_pids = all_pids_list[1000:1500]
train_data = {k: v for (k, v) in all_data.items() if v['pid'] in train_pids}
val_data = {k: v for (k, v) in all_data.items() if v['pid'] in val_pids}
logger.info("data set loaded with: {} train / {} val patients".format(len(train_pids), len(val_pids)))
batch_gen = {}
batch_gen['train'] = create_data_gen_pipeline(train_data, cf=cf, do_aug=False)
batch_gen['val_sampling'] = create_data_gen_pipeline(val_data, cf=cf, do_aug=False)
if cf.val_mode == 'val_patient':
batch_gen['val_patient'] = PatientBatchIterator(val_data, cf=cf)
batch_gen['n_val'] = len(val_pids) if cf.max_val_patients is None else min(len(val_pids), cf.max_val_patients)
else:
batch_gen['n_val'] = cf.num_val_batches
return batch_gen
def get_test_generator(cf, logger):
"""
wrapper function for creating the test batch generator pipeline.
selects patients according to cv folds (generated by first run/fold of experiment)
If cf.hold_out_test_set is True, gets the data from an external folder instead.
"""
if cf.hold_out_test_set:
pp_name = cf.pp_test_name
test_ix = None
else:
pp_name = None
with open(os.path.join(cf.exp_dir, 'fold_ids.pickle'), 'rb') as handle:
fold_list = pickle.load(handle)
_, _, test_ix, _ = fold_list[cf.fold]
# warnings.warn('WARNING: using validation set for testing!!!')
test_data = load_dataset(cf, logger, test_ix, pp_data_path=cf.pp_test_data_path, pp_name=pp_name)
logger.info("data set loaded with: {} test patients from {}".format(len(test_data.keys()), cf.pp_test_data_path))
batch_gen = {}
batch_gen['test'] = PatientBatchIterator(test_data, cf=cf)
batch_gen['n_test'] = len(test_data.keys()) if cf.max_test_patients == "all" else \
min(cf.max_test_patients, len(test_data.keys()))
return batch_gen
def load_dataset(cf, logger, subset_ixs=None, pp_data_path=None, pp_name=None):
"""
loads the dataset. if deployed in cloud also copies and unpacks the data to the working directory.
:param subset_ixs: subset indices to be loaded from the dataset. used e.g. for testing to only load the test folds.
:return: data: dictionary with one entry per patient (in this case per patient-breast, since they are treated as
individual images for training) each entry is a dictionary containing respective meta-info as well as paths to the preprocessed
numpy arrays to be loaded during batch-generation
"""
if pp_data_path is None:
pp_data_path = cf.pp_data_path
if pp_name is None:
pp_name = cf.pp_name
if cf.server_env:
copy_data = True
- target_dir = os.path.join('/ssd', cf.slurm_job_id, pp_name)
+ target_dir = os.path.join(cf.data_dest, pp_name)
if not os.path.exists(target_dir):
cf.data_source_dir = pp_data_path
os.makedirs(target_dir)
subprocess.call('rsync -av {} {}'.format(
os.path.join(cf.data_source_dir, cf.input_df_name), os.path.join(target_dir, cf.input_df_name)), shell=True)
logger.info('created target dir and info df at {}'.format(os.path.join(target_dir, cf.input_df_name)))
elif subset_ixs is None:
copy_data = False
pp_data_path = target_dir
p_df = pd.read_pickle(os.path.join(pp_data_path, cf.input_df_name))
if subset_ixs is not None:
subset_pids = [np.unique(p_df.pid.tolist())[ix] for ix in subset_ixs]
p_df = p_df[p_df.pid.isin(subset_pids)]
logger.info('subset: selected {} instances from df'.format(len(p_df)))
if cf.server_env:
if copy_data:
copy_and_unpack_data(logger, p_df.pid.tolist(), cf.fold_dir, cf.data_source_dir, target_dir)
class_targets = p_df['class_id'].tolist()
pids = p_df.pid.tolist()
imgs = [os.path.join(pp_data_path, '{}.npy'.format(pid)) for pid in pids]
segs = [os.path.join(pp_data_path,'{}.npy'.format(pid)) for pid in pids]
data = OrderedDict()
for ix, pid in enumerate(pids):
data[pid] = {'data': imgs[ix], 'seg': segs[ix], 'pid': pid, 'class_target': [class_targets[ix]]}
return data
def create_data_gen_pipeline(patient_data, cf, do_aug=True):
"""
create multi-threaded train/val/test batch generation and augmentation pipeline.
:param patient_data: dictionary containing one dictionary per patient in the train/test subset.
:param is_training: (optional) whether to perform data augmentation (training) or not (validation/testing)
:return: multithreaded_generator
"""
# create instance of batch generator as first element in pipeline.
data_gen = BatchGenerator(patient_data, batch_size=cf.batch_size, cf=cf)
# add transformations to pipeline.
my_transforms = []
if do_aug:
mirror_transform = Mirror(axes=np.arange(2, cf.dim+2, 1))
my_transforms.append(mirror_transform)
spatial_transform = SpatialTransform(patch_size=cf.patch_size[:cf.dim],
patch_center_dist_from_border=cf.da_kwargs['rand_crop_dist'],
do_elastic_deform=cf.da_kwargs['do_elastic_deform'],
alpha=cf.da_kwargs['alpha'], sigma=cf.da_kwargs['sigma'],
do_rotation=cf.da_kwargs['do_rotation'], angle_x=cf.da_kwargs['angle_x'],
angle_y=cf.da_kwargs['angle_y'], angle_z=cf.da_kwargs['angle_z'],
do_scale=cf.da_kwargs['do_scale'], scale=cf.da_kwargs['scale'],
random_crop=cf.da_kwargs['random_crop'])
my_transforms.append(spatial_transform)
else:
my_transforms.append(CenterCropTransform(crop_size=cf.patch_size[:cf.dim]))
my_transforms.append(ConvertSegToBoundingBoxCoordinates(cf.dim, get_rois_from_seg_flag=False, class_specific_seg_flag=cf.class_specific_seg_flag))
all_transforms = Compose(my_transforms)
# multithreaded_generator = SingleThreadedAugmenter(data_gen, all_transforms)
multithreaded_generator = MultiThreadedAugmenter(data_gen, all_transforms, num_processes=cf.n_workers, seeds=range(cf.n_workers))
return multithreaded_generator
class BatchGenerator(SlimDataLoaderBase):
"""
creates the training/validation batch generator. Samples n_batch_size patients (draws a slice from each patient if 2D)
from the data set while maintaining foreground-class balance. Returned patches are cropped/padded to pre_crop_size.
Actual patch_size is obtained after data augmentation.
:param data: data dictionary as provided by 'load_dataset'.
:param batch_size: number of patients to sample for the batch
:return dictionary containing the batch data (b, c, x, y, (z)) / seg (b, 1, x, y, (z)) / pids / class_target
"""
def __init__(self, data, batch_size, cf):
super(BatchGenerator, self).__init__(data, batch_size)
self.cf = cf
def generate_train_batch(self):
batch_data, batch_segs, batch_pids, batch_targets = [], [], [], []
class_targets_list = [v['class_target'] for (k, v) in self._data.items()]
#samples patients towards equilibrium of foreground classes on a roi-level (after randomly sampling the ratio "batch_sample_slack").
batch_ixs = dutils.get_class_balanced_patients(
class_targets_list, self.batch_size, self.cf.head_classes - 1, slack_factor=self.cf.batch_sample_slack)
patients = list(self._data.items())
for b in batch_ixs:
patient = patients[b][1]
all_data = np.load(patient['data'], mmap_mode='r')
data = all_data[0]
seg = all_data[1].astype('uint8')
batch_pids.append(patient['pid'])
batch_targets.append(patient['class_target'])
batch_data.append(data[np.newaxis])
batch_segs.append(seg[np.newaxis])
data = np.array(batch_data)
seg = np.array(batch_segs).astype(np.uint8)
class_target = np.array(batch_targets)
return {'data': data, 'seg': seg, 'pid': batch_pids, 'class_target': class_target}
class PatientBatchIterator(SlimDataLoaderBase):
"""
creates a test generator that iterates over entire given dataset returning 1 patient per batch.
Can be used for monitoring if cf.val_mode = 'val_patient', for a monitoring closer to the actual evaluation (done in 3D),
if one is willing to accept the speed loss during training.
:return: out_batch: dictionary containing one patient with batch_size = n_3D_patches in 3D or
batch_size = n_2D_patches in 2D.
"""
def __init__(self, data, cf): #threads in augmenter
super(PatientBatchIterator, self).__init__(data, 0)
self.cf = cf
self.patient_ix = 0
self.dataset_pids = [v['pid'] for (k, v) in data.items()]
self.patch_size = cf.patch_size
if len(self.patch_size) == 2:
self.patch_size = self.patch_size + [1]
def generate_train_batch(self):
pid = self.dataset_pids[self.patient_ix]
patient = self._data[pid]
all_data = np.load(patient['data'], mmap_mode='r')
data = all_data[0]
seg = all_data[1].astype('uint8')
batch_class_targets = np.array([patient['class_target']])
out_data = data[None, None]
out_seg = seg[None, None]
print('check patient data loader', out_data.shape, out_seg.shape)
batch_2D = {'data': out_data, 'seg': out_seg, 'class_target': batch_class_targets, 'pid': pid}
converter = ConvertSegToBoundingBoxCoordinates(dim=2, get_rois_from_seg_flag=False, class_specific_seg_flag=self.cf.class_specific_seg_flag)
batch_2D = converter(**batch_2D)
batch_2D.update({'patient_bb_target': batch_2D['bb_target'],
'patient_roi_labels': batch_2D['roi_labels'],
'original_img_shape': out_data.shape})
self.patient_ix += 1
if self.patient_ix == len(self.dataset_pids):
self.patient_ix = 0
return batch_2D
-
-
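The wrap-around of `patient_ix` in `generate_train_batch` lets the iterator cycle over the dataset indefinitely. A minimal stand-alone sketch of that cyclic index, with hypothetical toy PIDs:

```python
# Sketch of the cyclic patient index used by PatientBatchIterator:
# after the last patient, the index wraps back to 0 so iteration repeats.
dataset_pids = ["pat_a", "pat_b", "pat_c"]  # assumed toy PIDs

def next_pid(patient_ix):
    pid = dataset_pids[patient_ix]
    patient_ix += 1
    if patient_ix == len(dataset_pids):
        patient_ix = 0
    return pid, patient_ix

ix = 0
seen = []
for _ in range(4):
    pid, ix = next_pid(ix)
    seen.append(pid)
print(seen)  # → ['pat_a', 'pat_b', 'pat_c', 'pat_a']
```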
def copy_and_unpack_data(logger, pids, fold_dir, source_dir, target_dir):
start_time = time.time()
with open(os.path.join(fold_dir, 'file_list.txt'), 'w') as handle:
for pid in pids:
handle.write('{}.npy\n'.format(pid))
- subprocess.call('rsync -av --files-from {} {} {}'.format(os.path.join(fold_dir, 'file_list.txt'),
+ subprocess.call('rsync -ahv --files-from {} {} {}'.format(os.path.join(fold_dir, 'file_list.txt'),
source_dir, target_dir), shell=True)
# dutils.unpack_dataset(target_dir)
copied_files = os.listdir(target_dir)
- logger.info("copying and unpacking data set finsihed : {} files in target dir: {}. took {} sec".format(
+ logger.info("copying and unpacking data set finished : {} files in target dir: {}. took {} sec".format(
len(copied_files), target_dir, np.round(time.time() - start_time, 0)))
if __name__=="__main__":
import utils.exp_utils as utils
total_stime = time.time()
cf_file = utils.import_module("cf", "configs.py")
cf = cf_file.configs()
logger = utils.get_logger("dev")
batch_gen = get_train_generators(cf, logger)
train_batch = next(batch_gen["train"])
pids = []
total = 100
for i in range(total):
print("\r producing batch {}/{}.".format(i, total), end="", flush=True)
train_batch = next(batch_gen["train"])
pids.append(train_batch["pid"])
print()
mins, secs = divmod((time.time() - total_stime), 60)
h, mins = divmod(mins, 60)
t = "{:d}h:{:02d}m:{:02d}s".format(int(h), int(mins), int(secs))
print("{} total runtime: {}".format(os.path.split(__file__)[1], t))
\ No newline at end of file
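The runtime report at the bottom of the `__main__` block uses nested `divmod` calls to split elapsed seconds into hours, minutes, and seconds. A small sketch with an assumed elapsed time:

```python
# divmod-based runtime formatting, as in the script's __main__ block.
elapsed = 3723.5  # seconds; assumed example value

mins, secs = divmod(elapsed, 60)   # split off seconds
h, mins = divmod(mins, 60)         # split minutes into hours + minutes
t = "{:d}h:{:02d}m:{:02d}s".format(int(h), int(mins), int(secs))
print(t)  # → 1h:02m:03s
```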
diff --git a/utils/exp_utils.py b/utils/exp_utils.py
index 58c8c3e..3aab283 100644
--- a/utils/exp_utils.py
+++ b/utils/exp_utils.py
@@ -1,418 +1,419 @@
#!/usr/bin/env python
# Copyright 2018 Division of Medical Image Computing, German Cancer Research Center (DKFZ).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
import sys
import subprocess
import os
import plotting
import importlib.util
import pickle
import logging
from torch.utils.tensorboard import SummaryWriter
from collections import OrderedDict
import numpy as np
import torch
import pandas as pd
class CombinedLogger(object):
"""Combine console and tensorboard logger and record system metrics.
"""
def __init__(self, name, log_dir, server_env=True, fold="all"):
self.pylogger = logging.getLogger(name)
self.tboard = SummaryWriter(log_dir=os.path.join(log_dir, "tboard"))
self.log_dir = log_dir
self.fold = str(fold)
self.server_env = server_env
self.pylogger.setLevel(logging.DEBUG)
self.log_file = os.path.join(log_dir, "fold_"+self.fold, 'exec.log')
os.makedirs(os.path.dirname(self.log_file), exist_ok=True)
self.pylogger.addHandler(logging.FileHandler(self.log_file))
if not server_env:
self.pylogger.addHandler(ColorHandler())
else:
self.pylogger.addHandler(logging.StreamHandler())
self.pylogger.propagate = False
def __getattr__(self, attr):
delegate all undefined attribute requests to the member objects of
this class, in order: pylogger, tboard (first come, first served).
E.g., combinedlogger.add_scalars(...) triggers self.tboard.add_scalars(...).
"""
for obj in [self.pylogger, self.tboard]:
if attr in dir(obj):
return getattr(obj, attr)
raise AttributeError("attribute {} found neither in pylogger nor tboard".format(attr))
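The `__getattr__` delegation above is a general facade pattern: unknown attributes are resolved against wrapped member objects in order. A self-contained sketch with hypothetical classes:

```python
# Minimal sketch of the attribute-delegation pattern used by CombinedLogger.
class Wrapped:
    def greet(self):
        return "hello"

class Facade:
    def __init__(self):
        self._members = [Wrapped()]  # delegation targets, searched in order

    def __getattr__(self, attr):
        # __getattr__ is only called when normal attribute lookup fails
        for obj in self._members:
            if hasattr(obj, attr):
                return getattr(obj, attr)
        raise AttributeError(attr)

f = Facade()
print(f.greet())  # → hello
```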
def set_logfile(self, fold=None, log_file=None):
if fold is not None:
self.fold = str(fold)
if log_file is None:
self.log_file = os.path.join(self.log_dir, "fold_"+self.fold, 'exec.log')
else:
self.log_file = log_file
os.makedirs(os.path.dirname(self.log_file), exist_ok=True)
for hdlr in self.pylogger.handlers:
hdlr.close()
self.pylogger.handlers = []
self.pylogger.addHandler(logging.FileHandler(self.log_file))
if not self.server_env:
self.pylogger.addHandler(ColorHandler())
else:
self.pylogger.addHandler(logging.StreamHandler())
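`set_logfile` swaps the log destination by closing and clearing all handlers before attaching a fresh `FileHandler`. A stdlib-only sketch of that handler swap, using a temporary directory instead of the fold dir:

```python
import logging
import os
import tempfile

# Sketch of the handler swap in set_logfile: close old handlers,
# then attach a new FileHandler pointing at the new log file.
logger = logging.getLogger("swap_demo")  # assumed demo logger name
logger.setLevel(logging.DEBUG)
log_file = os.path.join(tempfile.mkdtemp(), "exec.log")

for hdlr in logger.handlers:
    hdlr.close()
logger.handlers = []
logger.addHandler(logging.FileHandler(log_file))

logger.info("switched log file")
logger.handlers[0].flush()
with open(log_file) as f:
    content = f.read()
print(content.strip())  # → switched log file
```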
def metrics2tboard(self, metrics, global_step=None, suptitle=None):
"""
:param metrics: {'train': dataframe, 'val':df}, df as produced in
evaluator.py.evaluate_predictions
"""
# print("metrics", metrics)
if global_step is None:
global_step = len(metrics['train'][list(metrics['train'].keys())[0]]) - 1
if suptitle is not None:
suptitle = str(suptitle)
else:
suptitle = "Fold_" + str(self.fold)
for key in ['train', 'val']:
# series = {k:np.array(v[-1]) for (k,v) in metrics[key].items() if not np.isnan(v[-1]) and not 'Bin_Stats' in k}
loss_series = {}
unc_series = {}
bin_stat_series = {}
mon_met_series = {}
for tag, val in metrics[key].items():
val = val[-1] # maybe remove list wrapping, recording in evaluator?
if 'loss' in tag.lower() and not np.isnan(val):
loss_series["{}".format(tag)] = val
elif not np.isnan(val):
mon_met_series["{}".format(tag)] = val
self.tboard.add_scalars(suptitle + "/Losses/{}".format(key), loss_series, global_step)
self.tboard.add_scalars(suptitle + "/Monitor_Metrics/{}".format(key), mon_met_series, global_step)
self.tboard.add_scalars(suptitle + "/Learning_Rate", metrics["lr"], global_step)
return
def __del__(self): # otherwise might produce multiple prints e.g. in ipython console
for hdlr in self.pylogger.handlers:
hdlr.close()
self.pylogger.handlers = []
del self.pylogger
self.tboard.flush()
# close somehow prevents main script from exiting
# maybe revise this issue in a later pytorch version
#self.tboard.close()
def get_logger(exp_dir, server_env=False):
"""
Creates a logger instance that writes info to file, to terminal, and to tensorboard.
:param exp_dir: experiment directory, where exec.log file is stored.
:param server_env: True if operating in server environment (e.g., gpu cluster)
:return: custom CombinedLogger instance.
"""
log_dir = os.path.join(exp_dir, "logs")
logger = CombinedLogger('medicaldetectiontoolkit', log_dir, server_env=server_env)
print("Logging to {}".format(logger.log_file))
return logger
def prep_exp(dataset_path, exp_path, server_env, use_stored_settings=True, is_training=True):
"""
I/O handling and creation of the experiment folder structure. Also creates a snapshot of the configs/model scripts and copies them to the exp_dir.
This way the exp_dir contains all info needed to conduct the experiment, independent of changes to the actual source code, so training/inference of this experiment can be started at any time. For this, the model script is copied back to the source code dir as tmp_model (tmp_backbone).
Provides a robust structure for cloud deployment.
:param dataset_path: path to source code for specific data set. (e.g. medicaldetectiontoolkit/lidc_exp)
:param exp_path: path to experiment directory.
:param server_env: boolean flag. pass to configs script for cloud deployment.
:param use_stored_settings: boolean flag. When starting training: If True, starts training from snapshot in existing experiment directory, else creates experiment directory on the fly using configs/model scripts from source code.
:param is_training: boolean flag. distinguishes train vs. inference mode.
:return:
"""
if is_training:
if use_stored_settings:
cf_file = import_module('cf_file', os.path.join(exp_path, 'configs.py'))
cf = cf_file.configs(server_env)
# in this mode, previously saved model and backbone need to be found in exp dir.
if not os.path.isfile(os.path.join(exp_path, 'model.py')) or \
not os.path.isfile(os.path.join(exp_path, 'backbone.py')):
raise Exception(
"Selected use_stored_settings option but no model and/or backbone source files exist in exp dir.")
cf.model_path = os.path.join(exp_path, 'model.py')
cf.backbone_path = os.path.join(exp_path, 'backbone.py')
else:
# this case overwrites settings files in exp dir, i.e., default_configs, configs, backbone, model
+ print("exp path", exp_path)
os.makedirs(exp_path, exist_ok=True)
# run training with source code info and copy snapshot of model to exp_dir for later testing (overwrite scripts if exp_dir already exists.)
subprocess.call('cp {} {}'.format('default_configs.py', os.path.join(exp_path, 'default_configs.py')),
shell=True)
subprocess.call(
'cp {} {}'.format(os.path.join(dataset_path, 'configs.py'), os.path.join(exp_path, 'configs.py')),
shell=True)
cf_file = import_module('cf_file', os.path.join(dataset_path, 'configs.py'))
cf = cf_file.configs(server_env)
subprocess.call('cp {} {}'.format(cf.model_path, os.path.join(exp_path, 'model.py')), shell=True)
subprocess.call('cp {} {}'.format(cf.backbone_path, os.path.join(exp_path, 'backbone.py')), shell=True)
if os.path.isfile(os.path.join(exp_path, "fold_ids.pickle")):
subprocess.call('rm {}'.format(os.path.join(exp_path, "fold_ids.pickle")), shell=True)
else:
# testing, use model and backbone stored in exp dir.
cf_file = import_module('cf_file', os.path.join(exp_path, 'configs.py'))
cf = cf_file.configs(server_env)
cf.model_path = os.path.join(exp_path, 'model.py')
cf.backbone_path = os.path.join(exp_path, 'backbone.py')
cf.exp_dir = exp_path
cf.test_dir = os.path.join(cf.exp_dir, 'test')
cf.plot_dir = os.path.join(cf.exp_dir, 'plots')
if not os.path.exists(cf.test_dir):
os.mkdir(cf.test_dir)
if not os.path.exists(cf.plot_dir):
os.mkdir(cf.plot_dir)
cf.experiment_name = exp_path.split("/")[-1]
cf.server_env = server_env
cf.created_fold_id_pickle = False
return cf
def import_module(name, path):
"""
Correct way of importing a module dynamically in Python 3.
:param name: name given to module instance.
:param path: path to module.
:return: module: returned module instance.
"""
spec = importlib.util.spec_from_file_location(name, path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
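The `import_module` helper is a thin wrapper around the standard `importlib.util` recipe. A usage sketch that loads a hypothetical config module from a temp file:

```python
import importlib.util
import os
import tempfile

# Usage sketch of import_module-style dynamic loading (importlib.util, Python 3).
# The module file and its contents are made up for this demo.
path = os.path.join(tempfile.mkdtemp(), "configs_demo.py")
with open(path, "w") as f:
    f.write("ANSWER = 42\n")

spec = importlib.util.spec_from_file_location("cf_demo", path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)  # executes the file in the module's namespace
print(module.ANSWER)  # → 42
```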
class ModelSelector:
'''
saves a checkpoint after each epoch as 'last_state' (can be loaded to continue interrupted training).
saves the top-k (k=cf.save_n_models) ranked epochs. In inference, predictions of multiple epochs can be ensembled to improve performance.
'''
def __init__(self, cf, logger):
self.cf = cf
self.saved_epochs = [-1] * cf.save_n_models
self.logger = logger
def run_model_selection(self, net, optimizer, monitor_metrics, epoch):
# take the mean over all selection criteria in each epoch
non_nan_scores = np.mean(np.array([[0 if (ii is None or np.isnan(ii)) else ii for ii in monitor_metrics['val'][sc]] for sc in self.cf.model_selection_criteria]), 0)
epochs_scores = [ii for ii in non_nan_scores[1:]]
# ranking of epochs according to model_selection_criterion
epoch_ranking = np.argsort(epochs_scores, kind="stable")[::-1] + 1 #epochs start at 1
# if set in configs, epochs < min_save_thresh are discarded from saving process.
epoch_ranking = epoch_ranking[epoch_ranking >= self.cf.min_save_thresh]
# check if current epoch is among the top-k epochs.
if epoch in epoch_ranking[:self.cf.save_n_models]:
save_dir = os.path.join(self.cf.fold_dir, '{}_best_checkpoint'.format(epoch))
if not os.path.exists(save_dir):
os.mkdir(save_dir)
torch.save(net.state_dict(), os.path.join(save_dir, 'params.pth'))
with open(os.path.join(save_dir, 'monitor_metrics.pickle'), 'wb') as handle:
pickle.dump(monitor_metrics, handle)
# save epoch_ranking to keep info for inference.
np.save(os.path.join(self.cf.fold_dir, 'epoch_ranking'), epoch_ranking[:self.cf.save_n_models])
np.save(os.path.join(save_dir, 'epoch_ranking'), epoch_ranking[:self.cf.save_n_models])
self.logger.info(
"saving current epoch {} at rank {}".format(epoch, np.argwhere(epoch_ranking == epoch)))
# delete params of the epoch that just fell out of the top-k epochs.
for se in [int(ii.split('_')[0]) for ii in os.listdir(self.cf.fold_dir) if 'best_checkpoint' in ii]:
if se in epoch_ranking[self.cf.save_n_models:]:
subprocess.call('rm -rf {}'.format(os.path.join(self.cf.fold_dir, '{}_best_checkpoint'.format(se))), shell=True)
self.logger.info('deleting epoch {} at rank {}'.format(se, np.argwhere(epoch_ranking == se)))
state = {
'epoch': epoch,
'state_dict': net.state_dict(),
'optimizer': optimizer.state_dict(),
}
# save checkpoint of current epoch.
save_dir = os.path.join(self.cf.fold_dir, 'last_checkpoint')
if not os.path.exists(save_dir):
os.mkdir(save_dir)
torch.save(state, os.path.join(save_dir, 'params.pth'))
np.save(os.path.join(save_dir, 'epoch_ranking'), epoch_ranking[:self.cf.save_n_models])
with open(os.path.join(save_dir, 'monitor_metrics.pickle'), 'wb') as handle:
pickle.dump(monitor_metrics, handle)
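The core of `run_model_selection` is a stable descending sort of per-epoch validation scores, keeping the top-k epochs. A numpy-free sketch of that ranking with assumed toy scores (note the numpy `argsort(...)[::-1]` in the real code reverses tie order, whereas `sorted(..., reverse=True)` keeps ties in original order):

```python
# Sketch of the top-k epoch ranking in run_model_selection, without numpy.
# Epoch numbering starts at 1; scores are per-epoch validation criteria.
scores = [0.2, 0.5, 0.5, 0.1]  # assumed toy scores for epochs 1..4

# stable descending sort of epoch numbers by score
epoch_ranking = sorted(range(1, len(scores) + 1),
                       key=lambda e: scores[e - 1], reverse=True)

save_n_models = 2  # stands in for cf.save_n_models
top_k = epoch_ranking[:save_n_models]
print(epoch_ranking, top_k)
```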
def load_checkpoint(checkpoint_path, net, optimizer):
checkpoint_params = torch.load(os.path.join(checkpoint_path, 'params.pth'))
net.load_state_dict(checkpoint_params['state_dict'])
optimizer.load_state_dict(checkpoint_params['optimizer'])
with open(os.path.join(checkpoint_path, 'monitor_metrics.pickle'), 'rb') as handle:
monitor_metrics = pickle.load(handle)
starting_epoch = checkpoint_params['epoch'] + 1
return starting_epoch, monitor_metrics
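`load_checkpoint` resumes training at `epoch + 1` from the stored state. A torch-free sketch of that round trip using `pickle` (the real code serializes torch state_dicts via `torch.save`/`torch.load`):

```python
import os
import pickle
import tempfile

# Torch-free sketch of the checkpoint round trip:
# save epoch + states, reload, resume at epoch + 1.
checkpoint_path = tempfile.mkdtemp()
state = {'epoch': 7, 'state_dict': {'w': 1.0}, 'optimizer': {'lr': 1e-4}}
with open(os.path.join(checkpoint_path, 'params.pkl'), 'wb') as handle:
    pickle.dump(state, handle)

with open(os.path.join(checkpoint_path, 'params.pkl'), 'rb') as handle:
    loaded = pickle.load(handle)
starting_epoch = loaded['epoch'] + 1
print(starting_epoch)  # → 8
```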
def prepare_monitoring(cf):
"""
Creates the dictionaries in which train/val metrics are stored.
"""
metrics = {}
# first entry for loss dict accounts for epoch starting at 1.
metrics['train'] = OrderedDict()
metrics['val'] = OrderedDict()
metric_classes = []
if 'rois' in cf.report_score_level:
metric_classes.extend([v for k, v in cf.class_dict.items()])
if 'patient' in cf.report_score_level:
metric_classes.extend(['patient'])
for cl in metric_classes:
metrics['train'][cl + '_ap'] = [np.nan]
metrics['val'][cl + '_ap'] = [np.nan]
if cl == 'patient':
metrics['train'][cl + '_auc'] = [np.nan]
metrics['val'][cl + '_auc'] = [np.nan]
return metrics
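The layout `prepare_monitoring` produces is an `OrderedDict` per split with one NaN-padded list per metric. A stdlib sketch for an assumed `class_dict` of `{1: 'nodule'}` with patient-level reporting enabled:

```python
import math
from collections import OrderedDict

# Sketch of the monitoring-dict layout produced by prepare_monitoring,
# assuming class_dict = {1: 'nodule'} and patient-level reporting.
metrics = {'train': OrderedDict(), 'val': OrderedDict()}
for cl in ['nodule', 'patient']:
    metrics['train'][cl + '_ap'] = [math.nan]  # first entry pads epoch 0
    metrics['val'][cl + '_ap'] = [math.nan]
    if cl == 'patient':
        metrics['train'][cl + '_auc'] = [math.nan]
        metrics['val'][cl + '_auc'] = [math.nan]

print(list(metrics['val'].keys()))  # → ['nodule_ap', 'patient_ap', 'patient_auc']
```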
def create_csv_output(results_list, cf, logger):
"""
Write out test-set predictions to a .csv file. Output format is one line per prediction:
PatientID | PredictionID | [y1 x1 y2 x2 (z1) (z2)] | score | pred_classID
Note that prediction coordinates correspond to images as loaded for training/testing and need to be adapted when
plotted over the raw data (before preprocessing/resampling).
:param results_list: [[patient_results, patient_id], [patient_results, patient_id], ...]
"""
logger.info('creating csv output file at {}'.format(os.path.join(cf.exp_dir, 'results.csv')))
predictions_df = pd.DataFrame(columns = ['patientID', 'predictionID', 'coords', 'score', 'pred_classID'])
for r in results_list:
pid = r[1]
#optionally load resampling info from preprocessing to match output predictions with raw data.
#with open(os.path.join(cf.exp_dir, 'test_resampling_info', pid), 'rb') as handle:
# resampling_info = pickle.load(handle)
for bix, box in enumerate(r[0][0]):
assert box['box_type'] == 'det', box['box_type']
coords = box['box_coords']
score = box['box_score']
pred_class_id = box['box_pred_class_id']
out_coords = []
if score >= cf.min_det_thresh:
out_coords.append(coords[0]) #* resampling_info['scale'][0])
out_coords.append(coords[1]) #* resampling_info['scale'][1])
out_coords.append(coords[2]) #* resampling_info['scale'][0])
out_coords.append(coords[3]) #* resampling_info['scale'][1])
if len(coords) > 4:
out_coords.append(coords[4]) #* resampling_info['scale'][2] + resampling_info['z_crop'])
out_coords.append(coords[5]) #* resampling_info['scale'][2] + resampling_info['z_crop'])
predictions_df.loc[len(predictions_df)] = [pid, bix, out_coords, score, pred_class_id]
try:
fold = cf.fold
except AttributeError:
fold = 'hold_out'
predictions_df.to_csv(os.path.join(cf.exp_dir, 'results_{}.csv'.format(fold)), index=False)
class _AnsiColorizer(object):
"""
A colorizer is an object that loosely wraps around a stream, allowing
callers to write text to the stream in a particular color.
Colorizer classes must implement C{supported()} and C{write(text, color)}.
"""
_colors = dict(black=30, red=31, green=32, yellow=33,
blue=34, magenta=35, cyan=36, white=37, default=39)
def __init__(self, stream):
self.stream = stream
@classmethod
def supported(cls, stream=sys.stdout):
"""
A class method that returns True if the current platform supports
coloring terminal output using this method. Returns False otherwise.
"""
if not stream.isatty():
return False # auto color only on TTYs
try:
import curses
except ImportError:
return False
else:
try:
try:
return curses.tigetnum("colors") > 2
except curses.error:
curses.setupterm()
return curses.tigetnum("colors") > 2
except curses.error:
# guess false if the terminfo lookup still fails
return False
def write(self, text, color):
"""
Write the given text to the stream in the given color.
@param text: Text to be written to the stream.
@param color: A string label for a color. e.g. 'red', 'white'.
"""
color = self._colors[color]
self.stream.write('\x1b[%sm%s\x1b[0m' % (color, text))
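`_AnsiColorizer.write` wraps text in an ANSI SGR escape sequence: `\x1b[<code>m` switches the foreground color and `\x1b[0m` resets it. A stand-alone sketch of that formatting:

```python
# Sketch of the ANSI SGR escape used by _AnsiColorizer.write.
_colors = dict(black=30, red=31, green=32, yellow=33,
               blue=34, magenta=35, cyan=36, white=37, default=39)

def colorize(text, color):
    # \x1b[<code>m sets the foreground color, \x1b[0m resets all attributes
    return '\x1b[%sm%s\x1b[0m' % (_colors[color], text)

print(repr(colorize("error", "red")))  # → '\x1b[31merror\x1b[0m'
```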
class ColorHandler(logging.StreamHandler):
def __init__(self, stream=sys.stdout):
super(ColorHandler, self).__init__(_AnsiColorizer(stream))
def emit(self, record):
msg_colors = {
logging.DEBUG: "green",
logging.INFO: "default",
logging.WARNING: "red",
logging.ERROR: "red"
}
color = msg_colors.get(record.levelno, "blue")
self.stream.write(record.msg + "\n", color)