diff --git a/README.md b/README.md
index cc48cbd..fca0fdf 100644
--- a/README.md
+++ b/README.md
@@ -1,126 +1,136 @@
[](https://join.slack.com/t/mdtoolkit/shared_invite/enQtNTQ3MjY2MzE0MDg2LWNjY2I2Njc5MTY0NmM0ZWIxNmQwZDRhYzk2MDdhM2QxYjliYTcwYzhkNTAxYmRkMDA0MjcyNDMyYjllNTZhY2M)


Copyright © German Cancer Research Center (DKFZ), Division of Medical Image Computing (MIC). Please make sure that your usage of this code is in compliance with the code license.
+## Release Notes
+**v0.1.0**: Updates to Python 3.7, torch 1.4.0, and torchvision 0.5.0, entailing a change in the custom extensions NMS and RoIAlign
+ (now in C++ and CUDA). Scalar monitoring is switched to the torch-included tensorboard. Added qualitative example
+ plots for validation and testing. The default optimizer is changed from Adam to AdamW to account for a
+ fix in weight-decay handling; norms and biases can optionally be excluded from weight decay. Introduced
+ optional dynamic learning-rate scheduling. A specific CUDA device can be selected via script argument.\
+**v0.0.2**: Small fixes, mainly regarding server-env settings (cluster deployment).\
+**v0.0.1**: Original framework as used for the corresponding paper, with Python 3.6 and torch 0.4.1 dependencies,
+ custom extensions NMS and RoIAlign in C and CUDA, and scalar monitoring via plot files.
+
## Overview
This is a comprehensive framework for object detection featuring:
- 2D + 3D implementations of prevalent object detectors: e.g. Mask R-CNN [1], Retina Net [2], Retina U-Net [3].
- Modular and light-weight structure ensuring sharing of all processing steps (incl. backbone architecture) for comparability of models.
- training with bounding box and/or pixel-wise annotations.
- dynamic patching and tiling of 2D + 3D images (for training and inference).
- weighted consolidation of box predictions across patch-overlaps, ensembles, and dimensions [3].
- monitoring + evaluation simultaneously on object and patient level.
- 2D + 3D output visualizations.
- integration of COCO mean average precision metric [5].
- integration of MIC-DKFZ batch generators for extensive data augmentation [6].
- easy modification to evaluation of instance segmentation and/or semantic segmentation.
[1] He, Kaiming, et al. "Mask R-CNN" ICCV, 2017
[2] Lin, Tsung-Yi, et al. "Focal Loss for Dense Object Detection" TPAMI, 2018.
[3] Jaeger, Paul et al. "Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection", 2018
[5] https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocotools/cocoeval.py
[6] https://github.com/MIC-DKFZ/batchgenerators
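
The release notes above mention that AdamW is now the default optimizer and that norms and biases can optionally be excluded from weight decay. The following is a minimal sketch of that parameter-grouping idea in plain PyTorch; it is not the repository's own helper, whose naming and behaviour may differ.
```python
# Sketch: build two parameter groups so that norm/bias parameters get no weight decay.
import torch

def make_adamw(model, lr=1e-4, weight_decay=3e-5, exclude=("norm", "bias")):
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # route parameters whose name matches an excluded keyword into the no-decay group
        (no_decay if any(k in name.lower() for k in exclude) else decay).append(param)
    return torch.optim.AdamW(
        [{"params": decay, "weight_decay": weight_decay},
         {"params": no_decay, "weight_decay": 0.0}], lr=lr)
```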

## How to cite this code
Please cite the original publication [3].

## Installation
Set up the package in a virtual environment:
```
git clone https://github.com/MIC-DKFZ/medicaldetectiontoolkit.git
cd medicaldetectiontoolkit
virtualenv -p python3.7 mdt
source mdt/bin/activate
python setup.py install
```
##### Custom Extensions
This framework uses two custom mixed C++/CUDA extensions: Non-maximum suppression (NMS) and RoIAlign. Both are adapted from the original pytorch extensions (under torchvision.ops.boxes and ops.roialign).
The extensions are compiled automatically from the provided source files under medicaldetectiontoolkit/custom_extensions by the above setup.py. However, the extensions need to be compiled for the specific GPU architectures you intend to use. Hence, please ensure that the architectures you need are included in your shell's environment variable ```TORCH_CUDA_ARCH_LIST``` before compilation.

Example: You want to use the modules with a TITAN RTX GPU, which has Compute Capability 7.5 (Turing architecture), but sometimes you also want to use a TITAN Xp (6.1, Pascal). Before installation you need to ```export TORCH_CUDA_ARCH_LIST="6.1;7.5"```. A list mapping GPU model names to Compute Capabilities can be found here: https://developer.nvidia.com/cuda-gpus.

Note: If you'd like to import the raw extensions (not the wrapper modules), be sure to import torch first.

## Prepare the Data
This framework is meant for you to be able to train models on your own data sets. Two example data loaders are provided in medicaldetectiontoolkit/experiments, including thorough documentation to ensure a quick start for your own project. The intended workflow is a preprocessing script that converts your data, whatever its original format, into numpy arrays (this is run only once). During training / testing, the data loader then loads these numpy arrays dynamically. (Please note that the data-input side is meant to be customized to your own needs and the provided data loaders are merely examples: LIDC has a powerful data loader that handles 2D/3D inputs and is optimized for patch-based training and inference. The toy experiments have a lightweight data loader that only handles 2D without patching. The latter makes sense if you want to get familiar with the framework.) A minimal preprocessing sketch is shown after the Execute section below.

## Execute
1. Set I/O paths, model and training specifics in the configs file: medicaldetectiontoolkit/experiments/your_experiment/configs.py
2. Train the model:
```
python exec.py --mode train --exp_source experiments/my_experiment --exp_dir path/to/experiment/directory
```
This copies snapshots of the configs and model to the specified exp_dir, where all outputs will be saved. By default, the data is split into 60% training, 20% validation, and 20% testing data to perform a 5-fold cross validation (this can be changed to a hold-out test set in the configs), and all folds are trained iteratively. To train only selected folds, specify them via the folds argument:
```
python exec.py --folds 0 1 2 .... # specify any combination of folds [0-4]
```
3. Run inference:
```
python exec.py --mode test --exp_dir path/to/experiment/directory
```
This runs the prediction pipeline and saves all results to exp_dir.
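Regarding the Prepare the Data section above, here is a minimal sketch of such a one-time preprocessing script. The file layout and dataframe column names are illustrative assumptions, not a required API; match whatever your own data_loader.py expects.
```python
# Sketch: convert each raw sample into .npy arrays and collect light-weight
# meta information into a dataframe that the data loader can read later.
import os
import numpy as np
import pandas as pd

def preprocess(raw_samples, out_dir):
    """raw_samples: dict mapping pid -> (image, segmentation, list_of_class_targets)."""
    os.makedirs(out_dir, exist_ok=True)
    records = []
    for pid, (img, seg, class_targets) in raw_samples.items():
        # img: (y, x) or (y, x, z) array; seg: same spatial shape, integer RoI labels.
        img_path = os.path.join(out_dir, '{}_img.npy'.format(pid))
        seg_path = os.path.join(out_dir, '{}_seg.npy'.format(pid))
        np.save(img_path, img.astype(np.float32))
        np.save(seg_path, seg.astype(np.uint8))
        records.append({'pid': pid, 'path': img_path, 'seg_path': seg_path,
                        'class_target': class_targets})
    # light-weight info goes into one dataframe, loaded dynamically during training.
    pd.DataFrame(records).to_pickle(os.path.join(out_dir, 'info_df.pickle'))
```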


## Models
This framework features all models explored in [3] (implemented in 2D + 3D): the proposed Retina U-Net, a simple but effective architecture fusing state-of-the-art semantic segmentation with object detection, as well as implementations of prevalent object detectors, such as Mask R-CNN, Faster R-CNN+ (Faster R-CNN w/ RoIAlign), Retina Net, U-Faster R-CNN+ (the two-stage counterpart of Retina U-Net: Faster R-CNN with auxiliary semantic segmentation), and DetU-Net (a U-Net-like segmentation architecture with heuristics for object detection).



## Training annotations
This framework features training with pixel-wise and/or bounding box annotations. To avoid having to transform box coordinates through the augmentation operations, we instead feed the annotation masks through data augmentation (creating a pseudo mask if only bounding box annotations are provided) and draw the boxes from the augmented masks afterwards, as sketched below.
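A simplified numpy illustration of this idea for a single 2D box (illustrative only; inside the framework this is handled by the batchgenerators transform ConvertSegToBoundingBoxCoordinates in the data loaders):
```python
# Turn a box into a pseudo mask, augment the mask together with the image,
# then read the (now transformed) box back off the mask.
import numpy as np

def box_to_pseudo_mask(shape, box, roi_id=1):
    """box = (y1, x1, y2, x2); paints the box area with its roi id."""
    mask = np.zeros(shape, dtype=np.uint8)
    y1, x1, y2, x2 = box
    mask[y1:y2, x1:x2] = roi_id
    return mask

def mask_to_box(mask, roi_id=1):
    """Recover the axis-aligned bounding box of one roi after augmentation."""
    ys, xs = np.where(mask == roi_id)
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

mask = box_to_pseudo_mask((320, 320), (40, 60, 120, 180))
# ... apply the same spatial augmentations to image and mask here ...
print(mask_to_box(mask))  # -> (40, 60, 120, 180) before any augmentation
```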


The framework further handles two types of pixel-wise annotations:

1. A label map with individual ROIs identified by increasing label values, accompanied by a vector containing, at each position, the class target for the lesion with the corresponding label (for this mode set get_rois_from_seg_flag = False when calling ConvertSegToBoundingBoxCoordinates in your data loader).
2. A binary label map. There is only one foreground class and individual lesions are not identified. All lesions have the same class target (foreground). In this case the data loader runs a connected-component labelling algorithm to create processable lesion / class-target pairs on the fly (for this mode set get_rois_from_seg_flag = True when calling ConvertSegToBoundingBoxCoordinates in your data loader).

## Prediction pipeline
This framework provides an inference module, which automatically handles patching of inputs, and tiling, ensembling, and weighted consolidation of output predictions:
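As a rough illustration of the patching/tiling step mentioned above (a simplified sketch, not the framework's actual patch-creation code):
```python
# Compute overlapping patch origins so that patches of a fixed size cover the image.
import numpy as np

def patch_origins(image_len, patch_len, overlap):
    """1D start coordinates so that patches of patch_len cover [0, image_len)."""
    if image_len <= patch_len:
        return [0]
    stride = patch_len - overlap
    starts = list(range(0, image_len - patch_len, stride))
    starts.append(image_len - patch_len)  # make sure the image end is covered
    return starts

origins = [(y, x) for y in patch_origins(512, 320, 64)
                  for x in patch_origins(512, 320, 64)]
print(origins)  # [(0, 0), (0, 192), (192, 0), (192, 192)]
```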




## Consolidation of predictions (Weighted Box Clustering)
Multiple predictions for the same image (from test-time augmentations, tested epochs, and overlapping patches) result in a large number of boxes (or cubes), which need to be consolidated. In semantic segmentation, the final output would typically be obtained by averaging every pixel over all predictions. As described in [3], **weighted box clustering** (WBC) does the analogue for box predictions:
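A highly simplified sketch of the consolidation idea: boxes assigned to the same cluster are merged into one box by score-weighted averaging of their coordinates. The actual WBC of [3] additionally weights by overlap, patch location, and the number of expected predictions per box, so treat this only as an illustration.
```python
import numpy as np

def merge_cluster(boxes, scores):
    """boxes: (n, 4) array of (y1, x1, y2, x2); scores: (n,) confidences."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    w = scores / scores.sum()
    merged_box = (w[:, None] * boxes).sum(axis=0)   # confidence-weighted coordinates
    merged_score = scores.mean()                    # consolidated confidence
    return merged_box, merged_score

box, score = merge_cluster([[10, 10, 50, 50], [12, 8, 54, 52]], [0.9, 0.6])
print(box, score)  # weighted-average box and its consolidated score
```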





## Visualization / Monitoring
By default, loss functions and performance metrics are monitored:




Histograms of matched output predictions for training/validation/testing are plotted per foreground class:



Input images + ground truth annotations + output predictions of a sampled validation batch are plotted after each epoch (here: a 2D sampled slice with +/-3 neighbouring context slices in channels):



Zoomed into the last two lines of the plot:


## License
This framework is published under the [Apache License Version 2.0](LICENSE).

diff --git a/experiments/toy_exp/configs.py b/experiments/toy_exp/configs.py
index 3bfa745..0376e0c 100644
--- a/experiments/toy_exp/configs.py
+++ b/experiments/toy_exp/configs.py
@@ -1,351 +1,351 @@
#!/usr/bin/env python
# Copyright 2018 Division of Medical Image Computing, German Cancer Research Center (DKFZ).
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

import sys
import os
sys.path.append(os.path.dirname(os.path.realpath(__file__)))
import numpy as np
from default_configs import DefaultConfigs

class configs(DefaultConfigs):

    def __init__(self, server_env=None):

        #########################
        #     Preprocessing     #
        #########################

        self.root_dir = '/home/gregor/datasets/toy_mdt'

        #########################
        #          I/O          #
        #########################

        # one out of [2, 3]. dimension the model operates in.
        self.dim = 2

        # one out of ['mrcnn', 'retina_net', 'retina_unet', 'detection_unet', 'ufrcnn'].
-        self.model = 'mrcnn'
+        self.model = 'ufrcnn'

        DefaultConfigs.__init__(self, self.model, server_env, self.dim)

        # int [0 < dataset_size]. select n patients from dataset for prototyping.
        self.select_prototype_subset = None
        self.hold_out_test_set = True
        # including val set. will be 3/4 train, 1/4 val.
        self.n_train_val_data = 2500

        # choose one of the 3 toy experiments described in https://arxiv.org/pdf/1811.08661.pdf
        # one of ['donuts_shape', 'donuts_pattern', 'circles_scale'].
        toy_mode = 'donuts_shape_noise'

        # path to preprocessed data.
        self.input_df_name = 'info_df.pickle'
        self.pp_name = os.path.join(toy_mode, 'train')
        self.pp_data_path = os.path.join(self.root_dir, self.pp_name)
        self.pp_test_name = os.path.join(toy_mode, 'test')
        self.pp_test_data_path = os.path.join(self.root_dir, self.pp_test_name)

        # settings for deployment in cloud.
        if server_env:
            # path to preprocessed data.
            pp_root_dir = '/datasets/datasets_ramien/toy_exp/data'
            self.pp_name = os.path.join(toy_mode, 'train')
            self.pp_data_path = os.path.join(pp_root_dir, self.pp_name)
            self.pp_test_name = os.path.join(toy_mode, 'test')
            self.pp_test_data_path = os.path.join(pp_root_dir, self.pp_test_name)
            self.select_prototype_subset = None

        #########################
        #      Data Loader      #
        #########################

        # select modalities from preprocessed data
        self.channels = [0]
        self.n_channels = len(self.channels)

        # patch_size to be used for training. pre_crop_size is the patch_size before data augmentation.
        self.pre_crop_size_2D = [320, 320]
        self.patch_size_2D = [320, 320]

        self.patch_size = self.patch_size_2D if self.dim == 2 else self.patch_size_3D
        self.pre_crop_size = self.pre_crop_size_2D if self.dim == 2 else self.pre_crop_size_3D

        # ratio of free sampled batch elements before class balancing is triggered
        # (>0 to include "empty"/background patches.)
        self.batch_sample_slack = 0.2

        # set 2D network to operate in 3D images.
        self.merge_2D_to_3D_preds = False

        # feed +/- n neighbouring slices into channel dimension.
        # set to None for no context.
        self.n_3D_context = None
        if self.n_3D_context is not None and self.dim == 2:
            self.n_channels *= (self.n_3D_context * 2 + 1)

        #########################
        #      Architecture     #
        #########################

        self.start_filts = 48 if self.dim == 2 else 18
        self.end_filts = self.start_filts * 4 if self.dim == 2 else self.start_filts * 2
        self.res_architecture = 'resnet50'  # 'resnet101' , 'resnet50'
-        self.norm = "batch_norm" # one of None, 'instance_norm', 'batch_norm'
-        self.weight_decay = 3e-5
+        self.norm = None # one of None, 'instance_norm', 'batch_norm'
+        self.weight_decay = 1e-6
        self.exclude_from_wd = ("norm",)
        # one of 'xavier_uniform', 'xavier_normal', or 'kaiming_normal', None (=default = 'kaiming_uniform')
        self.weight_init = None

        #########################
        #  Schedule / Selection #
        #########################

-        self.num_epochs = 24
+        self.num_epochs = 28
        self.num_train_batches = 100 if self.dim == 2 else 200
        self.batch_size = 16 if self.dim == 2 else 8

        self.do_validation = True
        # decide whether to validate on entire patient volumes (like testing) or sampled patches (like training)
        # the former is more accurate, while the latter is faster (depending on volume size)
        self.val_mode = 'val_patient'  # one of 'val_sampling' , 'val_patient'
        if self.val_mode == 'val_patient':
            self.max_val_patients = None  # if 'None' iterates over entire val_set once.
        if self.val_mode == 'val_sampling':
            self.num_val_batches = 50

        # set dynamic_lr_scheduling to True to apply LR scheduling with below settings.
        self.dynamic_lr_scheduling = True
        self.lr_decay_factor = 0.5
-        self.scheduling_patience = np.ceil(3600 / (self.num_train_batches * self.batch_size))
+        self.scheduling_patience = np.ceil(7200 / (self.num_train_batches * self.batch_size))
        self.scheduling_criterion = 'malignant_ap'
        self.scheduling_mode = 'min' if "loss" in self.scheduling_criterion else 'max'

        #########################
        #   Testing / Plotting  #
        #########################

        # set the top-n-epochs to be saved for temporal averaging in testing.
        self.save_n_models = 5
        self.test_n_epochs = 5

        # set a minimum epoch number for saving in case of instabilities in the first phase of training.
        self.min_save_thresh = 0 if self.dim == 2 else 0

        self.report_score_level = ['patient', 'rois']  # choose list from 'patient', 'rois'
        self.class_dict = {1: 'benign', 2: 'malignant'}  # 0 is background.
        self.patient_class_of_interest = 2  # patient metrics are only plotted for one class.
        self.ap_match_ious = [0.1]  # list of ious to be evaluated for ap-scoring.

        self.model_selection_criteria = ['benign_ap', 'malignant_ap']  # criteria to average over for saving epochs.
        self.min_det_thresh = 0.1  # minimum confidence value to select predictions for evaluation.

        # threshold for clustering predictions together (wcs = weighted cluster scoring).
        # needs to be >= the expected overlap of predictions coming from one model (typically NMS threshold).
        # if too high, preds of the same object are separate clusters.
        self.wcs_iou = 1e-5

        self.plot_prediction_histograms = True
        self.plot_stat_curves = False

        #########################
        #   Data Augmentation   #
        #########################

        self.da_kwargs = {
            'do_elastic_deform': True,
            'alpha': (0., 1500.),
            'sigma': (30., 50.),
            'do_rotation': True,
            'angle_x': (0., 2 * np.pi),
            'angle_y': (0., 0),
            'angle_z': (0., 0),
            'do_scale': True,
            'scale': (0.8, 1.1),
            'random_crop': False,
            'rand_crop_dist': (self.patch_size[0] / 2. - 3, self.patch_size[1] / 2. - 3),
            'border_mode_data': 'constant',
            'border_cval_data': 0,
            'order_data': 1
        }

        if self.dim == 3:
            self.da_kwargs['do_elastic_deform'] = False
            self.da_kwargs['angle_x'] = (0, 0.0)
            self.da_kwargs['angle_y'] = (0, 0.0)  # must be 0!!
            self.da_kwargs['angle_z'] = (0., 2 * np.pi)

        #########################
        #  Add model specifics  #
        #########################

        {'detection_unet': self.add_det_unet_configs,
         'mrcnn': self.add_mrcnn_configs,
         'ufrcnn': self.add_mrcnn_configs,
         'ufrcnn_surrounding': self.add_mrcnn_configs,
         'retina_net': self.add_mrcnn_configs,
         'retina_unet': self.add_mrcnn_configs,
         'prob_detector': self.add_mrcnn_configs,
         }[self.model]()

    def add_det_unet_configs(self):

        self.learning_rate = [1e-4] * self.num_epochs

        # aggregation from pixel prediction to object scores (connected component). One of ['max', 'median']
        self.aggregation_operation = 'max'

        # max number of roi candidates to identify per image (slice in 2D, volume in 3D)
        self.n_roi_candidates = 3 if self.dim == 2 else 8

        # loss mode: either weighted cross entropy ('wce'), batch-wise dice loss ('dice'), or the sum of both ('dice_wce')
        self.seg_loss_mode = 'dice_wce'

        # if <1, false positive predictions in foreground are penalized less.
        self.fp_dice_weight = 1 if self.dim == 2 else 1

        self.wce_weights = [0.3, 1, 1]
        self.detection_min_confidence = self.min_det_thresh

        # if 'True', loss distinguishes all classes, else only foreground vs. background (class agnostic).
        self.class_specific_seg_flag = True
        self.num_seg_classes = 3 if self.class_specific_seg_flag else 2
        self.head_classes = self.num_seg_classes

    def add_mrcnn_configs(self):

        # learning rate is a list with one entry per epoch.
        self.learning_rate = [3e-4] * self.num_epochs

        # disable mask head loss. (e.g. if no pixelwise annotations available)
        self.frcnn_mode = False

        # disable the re-sampling of mask proposals to original size for speed-up.
        # since evaluation is detection-driven (box-matching) and not instance segmentation-driven (iou-matching),
        # mask-outputs are optional.
        self.return_masks_in_val = True
        self.return_masks_in_test = False

        # set number of proposal boxes to plot after each epoch.
        self.n_plot_rpn_props = 0 if self.dim == 2 else 0

        # number of classes for head networks: n_foreground_classes + 1 (background)
        self.head_classes = 3

        # seg_classes here refers to the first stage classifier (RPN)
        self.num_seg_classes = 2  # foreground vs. background

        # feature map strides per pyramid level are inferred from architecture.
        self.backbone_strides = {'xy': [4, 8, 16, 32], 'z': [1, 2, 4, 8]}

        # anchor scales are chosen according to expected object sizes in data set. Default uses only one anchor scale
        # per pyramid level. (outer list are pyramid levels (corresponding to BACKBONE_STRIDES), inner list are scales per level.)
        self.rpn_anchor_scales = {'xy': [[8], [16], [32], [64]], 'z': [[2], [4], [8], [16]]}

        # choose which pyramid levels to extract features from: P2: 0, P3: 1, P4: 2, P5: 3.
        self.pyramid_levels = [0, 1, 2, 3]

        # number of feature maps in rpn. typically lowered in 3D to save gpu-memory.
        self.n_rpn_features = 512 if self.dim == 2 else 128

        # anchor ratios and strides per position in feature maps.
        self.rpn_anchor_ratios = [0.5, 1., 2.]
        self.rpn_anchor_stride = 1

        # Threshold for first stage (RPN) non-maximum suppression (NMS): LOWER == HARDER SELECTION
        self.rpn_nms_threshold = 0.7 if self.dim == 2 else 0.7

        # loss sampling settings.
        self.rpn_train_anchors_per_image = 64  # per batch element
        self.train_rois_per_image = 2  # per batch element
        self.roi_positive_ratio = 0.5
        self.anchor_matching_iou = 0.7

        # factor of top-k candidates to draw from per negative sample (stochastic-hard-example-mining).
        # poolsize to draw top-k candidates from will be shem_poolsize * n_negative_samples.
        self.shem_poolsize = 10

        self.pool_size = (7, 7) if self.dim == 2 else (7, 7, 3)
        self.mask_pool_size = (14, 14) if self.dim == 2 else (14, 14, 5)
        self.mask_shape = (28, 28) if self.dim == 2 else (28, 28, 10)

        self.rpn_bbox_std_dev = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.2])
        self.bbox_std_dev = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.2])
        self.window = np.array([0, 0, self.patch_size[0], self.patch_size[1]])
        self.scale = np.array([self.patch_size[0], self.patch_size[1], self.patch_size[0], self.patch_size[1]])
        if self.dim == 2:
            self.rpn_bbox_std_dev = self.rpn_bbox_std_dev[:4]
            self.bbox_std_dev = self.bbox_std_dev[:4]
            self.window = self.window[:4]
            self.scale = self.scale[:4]

        # pre-selection in proposal-layer (stage 1) for NMS-speedup. applied per batch element.
        self.pre_nms_limit = 3000 if self.dim == 2 else 6000

        # n_proposals to be selected after NMS per batch element. too high numbers blow up memory if "detect_while_training" is True,
        # since proposals of the entire batch are forwarded through the second stage as one "batch".
        self.roi_chunk_size = 800 if self.dim == 2 else 600
        self.post_nms_rois_training = 500 if self.dim == 2 else 75
        self.post_nms_rois_inference = 500

        # Final selection of detections (refine_detections)
        self.model_max_instances_per_batch_element = 10 if self.dim == 2 else 30  # per batch element and class.
        self.detection_nms_threshold = 1e-5  # needs to be > 0, otherwise all predictions are one cluster.
        self.model_min_confidence = 0.1

        if self.dim == 2:
            self.backbone_shapes = np.array(
                [[int(np.ceil(self.patch_size[0] / stride)),
                  int(np.ceil(self.patch_size[1] / stride))]
                 for stride in self.backbone_strides['xy']])
        else:
            self.backbone_shapes = np.array(
                [[int(np.ceil(self.patch_size[0] / stride)),
                  int(np.ceil(self.patch_size[1] / stride)),
                  int(np.ceil(self.patch_size[2] / stride_z))]
                 for stride, stride_z in zip(self.backbone_strides['xy'], self.backbone_strides['z'])])

        if self.model == 'ufrcnn':
            self.operate_stride1 = True
            self.class_specific_seg_flag = True
            self.num_seg_classes = 3 if self.class_specific_seg_flag else 2
            self.frcnn_mode = True

        if self.model == 'retina_net' or self.model == 'retina_unet' or self.model == 'prob_detector':
            # implement extra anchor-scales according to retina-net publication.
            self.rpn_anchor_scales['xy'] = [[ii[0], ii[0] * (2 ** (1 / 3)), ii[0] * (2 ** (2 / 3))]
                                            for ii in self.rpn_anchor_scales['xy']]
            self.rpn_anchor_scales['z'] = [[ii[0], ii[0] * (2 ** (1 / 3)), ii[0] * (2 ** (2 / 3))]
                                           for ii in self.rpn_anchor_scales['z']]
            self.n_anchors_per_pos = len(self.rpn_anchor_ratios) * 3

            self.n_rpn_features = 256 if self.dim == 2 else 64

            # pre-selection of detections for NMS-speedup. per entire batch.
            self.pre_nms_limit = 10000 if self.dim == 2 else 50000

            # anchor matching iou is lower than in Mask R-CNN according to https://arxiv.org/abs/1708.02002
            self.anchor_matching_iou = 0.5

            # if 'True', seg loss distinguishes all classes, else only foreground vs. background (class agnostic).
            self.num_seg_classes = 3 if self.class_specific_seg_flag else 2

        if self.model == 'retina_unet':
            self.operate_stride1 = True
diff --git a/experiments/tutorial.md b/experiments/tutorial.md
new file mode 100644
index 0000000..4bac708
--- /dev/null
+++ b/experiments/tutorial.md
@@ -0,0 +1,100 @@
+# Tutorial
+##### for Including a Dataset into the Framework
+
+## Introduction
+This tutorial provides a template routine for including a new dataset into the framework in order to
+use the included models and algorithms with it.\
+The tutorial and toy dataset (under `toy_exp`) are in 2D, yet the switch to 3D is simply made by providing 3D data and proceeding
+analogously, as can be seen from the provided LIDC scripts (under `lidc_exp`).
+
+Datasets in the framework are set up under `medicaldetectiontoolkit/experiments/` and
+require three fundamental scripts:
+1. A **preprocessing** script that performs one-time routines on your raw data, bringing it into a suitable, easily usable
+format.
+2. A **data-loading** script (required name `data_loader.py`) that efficiently assembles the preprocessed data into
+network-processable batches.
+3. A **configs** file (`configs.py`) which specifies all settings, from data loading to network architecture.
+This file is automatically complemented by `default_configs.py`, which holds default and dataset-independent settings.
+
+## Preprocessing
+This script (`generate_toys.py` in case of the provided toy dataset, `preprocessing.py` in case of LIDC) is required
+to bring your raw data into an easily usable format. We recommend you put all one-time processes (like normalization,
+resampling, cropping, type conversions) into this script in order to avoid repetitive actions during
+data loading.\
+Throughout the framework, we follow a simple workload-separation scheme: network computations
+are performed on the GPU, while data loading and augmentations are performed on the CPU. The framework requires
+numpy arrays (`.npy`) as input to the networks, so your preprocessed data (images and segmentations) should
+already be in that format. In terms of data dimensions, we follow the scheme (y, x (,z)), meaning coronal, sagittal,
+and axial dimensions, respectively.
+
+Class labels for the Regions of Interest (RoIs) need to be provided as lists per data sample.
+If you have segmentation data, you may use the [batchgenerators](https://github.com/MIC-DKFZ/batchgenerators) transform
+ConvertSegToBoundingBoxCoordinates to generate bounding boxes from your segmentations. In that case, the order of the
+class labels in the list needs to correspond to the RoI labels in the segmentation.\
+Example: An image (2D or 3D) has two RoIs, one of class 1, the
+other of class 0. In your segmentation, every pixel is 0 (bg), except for the area marking class 1, which has value 1,
+and the area of class 0, which has value 2. Your list of class labels for this sample should be `[1, 0]`. I.e.,
+the index of an RoI's class label in the sample's label list corresponds to its marking in the segmentation shifted
+by -1.\
+If you do not have segmentations (in which case only the models Faster R-CNN and RetinaNet can be used), you can directly provide bounding
+boxes. In that case, RoIs are simply identified by their indices in the lists: class label list `[cl_of_roi_a, cl_of_roi_b]`
+corresponds to bbox list `[coords_of_roi_a, coords_of_roi_b]`.
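+
+A minimal numpy sketch of the above example (illustrative only; the variable names are not part of the framework's API):
+```python
+import numpy as np
+
+# Segmentation: 0 = background, RoI markings start at 1.
+seg = np.zeros((320, 320), dtype=np.uint8)
+seg[40:80, 40:80] = 1      # first RoI  -> marked with 1
+seg[200:260, 180:240] = 2  # second RoI -> marked with 2
+
+# Class labels per RoI, ordered by the RoI marking shifted by -1:
+# index 0 -> RoI marked 1 (class 1), index 1 -> RoI marked 2 (class 0).
+class_target = [1, 0]
+```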
+
+Please store all your light-weight information (patient id, class targets, (relative) paths or identifiers for data and seg) about the
+preprocessed data set in a pandas dataframe, say `info_df.pkl`.
+
+## Data Loading
+The goal of `data_loader.py` is to sample or iterate, load into CPU RAM, assemble, and finally augment the preprocessed data.\
+The framework requires the data loader to provide at least a function `get_train_generators`, which yields a dict
+holding a train-data loader under key `"train"` and a validation loader under `"val_sampling"` or `"val_patient"`;
+analogously, `get_test_generator` provides the test loader under `"test"`.\
+We recommend you closely follow the structure of the provided datasets, which includes a data loader suitable for
+sampling single patches or parts of the whole patient data with a focus on class equilibrium (BatchGenerator,
+used in training and optionally validation) and a PatientIterator, which is intended for testing and optionally validation and
+iterates through all patients one by one, not discarding
+any parts of the patient image. In detail, the structure is as follows.
+
+Data loading is performed with the help of the batchgenerators package. Going from farthest to closest to the
+preprocessed data, the data loader contains:
+1. Method `get_train_generators`, which is called by the execution script and in the end provides the train and val data loaders.
+   The same goes for `get_test_generator` and the test loader.
+2. Method `load_dataset`, which reads the `info_df.pkl` and provides a dictionary holding, per patient id, paths
+   to images and segmentations, and light-weight info like class targets.
+3. Method `create_data_gen_pipeline`, which instantiates the train data loader (an instance of class BatchGenerator),
+   assembles the chosen data-augmentation procedures, and passes the BatchGenerator into a MultiThreadedAugmenter (MTA). The MTA
+   is a wrapper that manages multi-threaded loading (and augmentation).
+4. Class BatchGenerator. This data loader is used for sampling, e.g., according to the scheme described in
+   `utils/dataloader_utils.get_class_balanced_patients`. It needs to implement a `__next__` method providing the batch;
+   the batch is a dictionary with (at least) the keys `"data"`, `"pid"`, `"class_target"` (as well as `"seg"` if using segmentations).
+    - `"data"` needs to hold your image (2D or 3D) as a numpy array with dimensions (b, c, y, x(, z)), where b is the
+      batch dimension (b = batch size), c the channel dimension (if you have multi-modal data, c > 1), and y, x, z are
+      the spatial dimensions; z is omitted in case of 2D data.
+    - `"seg"` has the same format as `"data"`, except that its channel dimension always has size c = 1.
+    - `"pid"` is a list of patient or sample identifiers, one per sample, i.e., of shape (b,).
+    - `"class_target"` holds, as mentioned in Preprocessing, the class labels for the RoIs. It is a list of length b, itself holding
+      lists of varying lengths n_rois(sample).
+   **Note**: the above description only applies if you use ConvertSegToBoundingBoxCoordinates. Class targets after batch
+   generation need to make room for a background class (the network heads need to be able to predict class 0 = bg). Since,
+   in preprocessing, we started classes at id 0, we now need to shift them by +1. This is done automatically inside
+   ConvertSegToBoundingBoxCoordinates. That transform also renames `"class_target"` to `"roi_labels"`, which is the label
+   required by the rest of the framework.
+   In case you do not use that transform, please shift and rename the labels
+   in your BatchGenerator.
+5. Class PatientIterator. This data loader is intended for testing and validation. It needs to provide the same output as
+   the above BatchGenerator; however, the initial batch size is always limited to one (one patient). The output batch size may vary
+   if patching is applied. Please refer to the LIDC PatientIterator
+   to see how to include patching. Note that this iterator is not supposed to go through the MTA; transforms (mainly
+   ConvertSegToBoundingBoxCoordinates) therefore need to be applied within this class directly.
+
+## Configs
+The current workflow is intended for running multiple experiments with the same dataset but different configs. This is
+done by setting the desired values in `configs.py` in the data set's source directory, then creating an experiment
+via the execution script (`exec.py`, modes "create_exp" or "train" or "train_test"), which copies a snapshot of configs,
+data loader, default configs, and selected model to the provided experiment directory.
+
+`configs.py` introduces class `configs`, which, when instantiated, inherits the settings in `default_configs.py` and adds
+model-specific settings to itself. Aside from setting all the right input/output paths, you can tune almost anything, from
+network architecture to data-loading settings to train and test routine settings.\
+Furthermore, throughout the whole framework, you have the option to include server-environment-specific settings by passing
+the argument `--server_env` to the exec script. E.g., in the configs, we use this flag to overwrite local paths with the
+paths we use on our GPU cluster.
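+
+To make the batch format from the Data Loading section concrete, here is a stripped-down sketch in plain Python/numpy. It is not the framework's actual implementation: the real `data_loader.py` builds on the batchgenerators loaders and wraps them in a MultiThreadedAugmenter, and the dataframe columns (`path`, `seg_path`, `class_target`) are illustrative names matching the preprocessing sketch in the README.
+```python
+import os
+import numpy as np
+import pandas as pd
+
+def load_dataset(cf):
+    """Read the light-weight info dataframe into a dict keyed by patient id."""
+    df = pd.read_pickle(os.path.join(cf.pp_data_path, cf.input_df_name))
+    return {row['pid']: {'path': row['path'], 'seg_path': row['seg_path'],
+                         'class_target': row['class_target']}
+            for _, row in df.iterrows()}
+
+class BatchGenerator:
+    """Minimal stand-in: samples b patients and assembles the batch dict."""
+    def __init__(self, data, batch_size=8):
+        self.data, self.batch_size = data, batch_size
+
+    def __next__(self):
+        pids = np.random.choice(list(self.data.keys()), self.batch_size)
+        imgs = [np.load(self.data[p]['path'])[None] for p in pids]      # add channel dim -> (c, y, x)
+        segs = [np.load(self.data[p]['seg_path'])[None] for p in pids]  # (1, y, x)
+        return {'data': np.stack(imgs),   # (b, c, y, x)
+                'seg': np.stack(segs),    # (b, 1, y, x)
+                'pid': list(pids),        # (b,)
+                'class_target': [self.data[p]['class_target'] for p in pids]}
+
+def get_train_generators(cf, logger=None):
+    # In the framework, the train loader is additionally wrapped in a MultiThreadedAugmenter
+    # together with the augmentation transforms (incl. ConvertSegToBoundingBoxCoordinates).
+    dataset = load_dataset(cf)
+    return {'train': BatchGenerator(dataset, cf.batch_size),
+            'val_sampling': BatchGenerator(dataset, cf.batch_size)}
+```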