diff --git a/tutorial/Overview.Rmd b/tutorial/Overview.Rmd
index 5246a88..8aae7da 100644
--- a/tutorial/Overview.Rmd
+++ b/tutorial/Overview.Rmd
@@ -1,306 +1,362 @@
---
title: Overview of the used methods
output:
  github_document:
    toc: yes
    toc_depth: 1
  pdf_document:
    toc: yes
    toc_depth: '3'
editor_options:
  chunk_output_type: console
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # fig.path = "README-",
  fig.width = 9,
  fig.height = 5,
  width=160
)
```

# Introduction

-This document is meant to be an overview guide of the methods and different steps used in the tutorial scripts, and aims to achieve a deeper understanding of the analysis and visualization toolkit. The overview is divided in sections, following the usage.
+This document is meant to be an overview guide of the classes, methods and different steps used in the tutorial scripts, and aims to provide a deeper understanding of the analysis and visualization toolkit. The overview is divided into sections, following the order of usage.

# Ranking configuration

Once the data has been loaded (either manually or using a .csv file), the first thing to do is to create a challenge object. Then, the ranking method will be chosen and configured.

## Define challenge object

+Challenges can be single- or multi-task. A single-task challenge consists of only one task, so all algorithms are ranked on this task alone. A multi-task challenge comprises several tasks; rankings are computed per task (identified via the "by" parameter below), and the per-task rankings can later be combined into a consensus ranking. The two cases are handled slightly differently throughout the toolkit, which is why they are distinguished here.
+
A challenge object will be created using the "challengeR.R" class, which will now be analysed. The following code refers to the constructor:

```{r, eval=F, echo=T}
as.challenge=function(object,
                      value,
                      algorithm,
                      case=NULL,
                      by=NULL,
                      annotator=NULL,
                      smallBetter=FALSE,
                      na.treat=NULL, # optional
                      check=TRUE)
```

Each parameter corresponds to:

- object: the data set (e.g. the data frame "data_matrix" below) that will be converted into a challenge object
- value: column containing the metric values
- algorithm: column containing the algorithm identifier
- case: column containing the test case identifier
-- by: (="task" ), use it when it is a multitask challenge
-- annotator: specify here if there are more than one annotator
-- smallBetter: specify if small metric values are better
-- na.treat: treatment of empty values (NA)
+- by: (="task"), use it when it is a multi-task challenge. If the parameter is not specified, the challenge will automatically be interpreted as a single-task challenge.
+- annotator: (currently not implemented) specify here if there is more than one annotator
+- smallBetter: specify whether smaller metric values indicate better performance
+- na.treat: specify how missing values (NA) are treated, e.g. set them to the worst possible metric value
- check: performs a sanity check if TRUE.

The sanity check can be performed for both single- and multi-task challenges. It checks for missing algorithm performance and also whether test cases appear more than once.

An example of how to use it (for a multi-task challenge):

```{r, eval=F, echo=T}
challenge=as.challenge(data_matrix,
                       value="value",
                       algorithm="alg_name",
                       case="case",
                       by="task",
                       smallBetter = FALSE)
```

-! Take into account that the code differst for multi/single task challenges !
-
-In case of a single task challenge, create first a dataSubset, where the name of the task is unique.
+! Take into account that the code differs for single/multi-task challenges !

+For single-task challenges, if the data matrix contains a task column, it is easiest to create a subset of the data matrix that only includes the values for that specific task:

```{r, eval=F, echo=T}
- dataSubset=subset(data_matrix, task=="c_random")
+ dataSubset=subset(data_matrix, task=="TASK_NAME")
```

-dataSubset will be used to create the challenge object then.
+In this way, "dataSubset" will then be used to create the challenge object.
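+
+For illustration, a minimal sketch (not part of the tutorial) of the corresponding single-task call: the challenge object is created from "dataSubset" and the "by" parameter is simply omitted, so the challenge is interpreted as single-task. The column names are assumed to be the same as in the multi-task example above.
+
+```{r, eval=F, echo=T}
+# Sketch: single-task challenge object created from the task-specific subset.
+# Omitting "by" lets the challenge be interpreted as a single-task challenge.
+challenge=as.challenge(dataSubset,
+                       value="value",
+                       algorithm="alg_name",
+                       case="case",
+                       smallBetter = FALSE)
+```
+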
## Configure ranking method

-The classes "wrapper.R", "aaggregate.R" and "Rank.aggregated.R" (?) are used.
+The classes "wrapper.R", "aaggregate.R" and "Rank.aggregated.R" are used.

In order to configure the ranking methods, the following parameters are considered:

- FUN: aggregation function, e.g. mean, median, min, max, or e.g. function(x) quantile(x, probs=0.05)
- na.treat: treatment of missing data / null values: either "na.rm" to remove missing data, set missing values to a numeric value (e.g. 0), or specify a function, e.g. function(x) min(x). Note that na.treat may already have been specified when the challenge object was created.
-- ties.method: a character string specifying how ties are treated, see ?base::rank
+- ties.method: a character string specifying how ties (items with equal values) are treated, see ?base::rank and [Strategies for assigning rankings](https://en.wikipedia.org/wiki/Ranking#Strategies_for_assigning_rankings)
- alpha: significance level (only for significance ranking)
- p.adjust.method: method for adjustment for multiple testing, see ?p.adjust

Different ranking methods are available:

#### Metric-based aggregation -> aggregateThenRank method

```{r, eval=F, echo=T}
# wrapper.R
aggregateThenRank=function(object, FUN, ties.method = "min", ...){
  object %>%
    aggregate(FUN=FUN, ...) %>%
    rank(ties.method = ties.method)
}
```

-First, (object %<% aggregate), the challenge object is aggregated:
+First, (object %>% aggregate), the metric values for each algorithm are aggregated across all cases using the specified aggregation function:

```{r, eval=F, echo=T}
# aaggregate.R
aggregate.challenge=function(x,
                             FUN=mean,
-                            na.treat, #either "na.rm", numeric value or function
-                            alpha=0.05, p.adjust.method="none",# only needed for significance
+                            na.treat,
+                            alpha=0.05,
+                            p.adjust.method="none",
                             parallel=FALSE,
-                            progress="none",...)
+                            progress="none",
+                            ...)
```

-Second, (aggregate %<% rank), the aggregated challenge is ranked:
+Second, (aggregate %>% rank), the aggregated metric values are converted into a ranking list, following the smallBetter argument defined above:

```{r, eval=F, echo=T}
# Rank.aggregated.R
rank.aggregated <-function(object,
                           ties.method="min",
                           largeBetter,
                           ...)
```

An example for "aggregate-then-rank" use (taking the mean for aggregation):

```{r, eval=F, echo=T}
-ranking=challenge%>%aggregateThenRank(FUN = mean, # aggregation function,
-                                      # e.g. mean, median, min, max,
-                                      # or e.g. function(x) quantile(x, probs=0.05)
-                                      na.treat=0, # either "na.rm" to remove missing data,
-                                      # set missings to numeric value (e.g. 0)
-                                      # or specify a function,
-                                      # e.g. function(x) min(x)
-                                      ties.method = "min" # a character string specifying
-                                      # how ties are treated, see ?base::rank
-                                      )
+ranking=challenge%>%aggregateThenRank(FUN = mean,
+                                      na.treat=0,
+                                      ties.method = "min"
+                                      )
```
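+
+As an illustration of the non-default options listed above (a sketch, not part of the tutorial; the object name "ranking_q05" is arbitrary), both the aggregation function and na.treat can be given as functions, reusing the examples from the parameter descriptions:
+
+```{r, eval=F, echo=T}
+# Sketch: aggregate-then-rank with the 5% quantile as aggregation function
+# and missing values handled by a function (here the minimum observed value),
+# both taken from the parameter descriptions above.
+ranking_q05=challenge%>%aggregateThenRank(FUN = function(x) quantile(x, probs=0.05),
+                                          na.treat=function(x) min(x),
+                                          ties.method = "min")
+```
+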
#### Case-based aggregation -> rankThenAggregate method

```{r, eval=F, echo=T}
# wrapper.R
rankThenAggregate=function(object,
                           FUN,
                           ties.method = "min"){
  object %>%
    rank(ties.method = ties.method) %>%
    aggregate(FUN=FUN) %>%
-    rank(ties.method = ties.method) #small rank is always best, i.e. largeBetter always FALSE
+    rank(ties.method = ties.method)
}
```

-First, (object %<% rank), the challenge object is ranked:
+First, (object %>% rank), a ranking will be created for each case across all algorithms. Missing values can be set to the last rank:

```{r, eval=F, echo=T}
# rrank.R
rank.challenge=function(object,
                        x,
-                       ties.method="min",...)
+                       ties.method="min",
+                       ...)
```

-Second, (rank %<% aggregate), the ranked challenge object is aggregated:
+Second, (rank %>% aggregate), the ranks per case will be aggregated for each algorithm:

```{r, eval=F, echo=T}
# aaggregate.R
aggregate.ranked <-function(x,
                            FUN=mean,
                            ...)
```

-Third, (aggregate %<% rank), the previously ranked and aggregated challenge is again ranked:
+Third, (aggregate %>% rank), the previously ranked and aggregated values are converted to a ranking list again:

```{r, eval=F, echo=T}
# Rank.aggregated.R
rank.aggregated <-function(object,
                           ties.method="min",
                           largeBetter,
                           ...)
```

An example for "rank-then-aggregate" with arguments as above (taking the mean for aggregation):

```{r, eval=F, echo=T}
ranking=challenge%>%rankThenAggregate(FUN = mean,
                                      ties.method = "min"
                                      )
```

#### Significance ranking -> testThenRank method

This method is similar to "aggregateThenRank", but the aggregation function is fixed to "significance".

```{r, eval=F, echo=T}
# wrapper.R
testThenRank=function(object, FUN, ties.method = "min", ...){
  object %>%
    aggregate(FUN="significance", ...) %>%
    rank(ties.method = ties.method)
}
```

-First, (object %<% aggregate), the challenge object is aggregated:
+First, (object %>% aggregate), the metric values will be aggregated across all cases. In this case, a pairwise comparison between all algorithms is performed by using statistical tests. For each algorithm, it is counted how often the specific algorithm is significantly superior to others. This count is saved as the aggregated value:

-! No need to specify the function again !
+! No need to specify the function again, it is already set as "significance" !

```{r, eval=F, echo=T}
# aaggregate.R
aggregate.challenge=function(x,
                             FUN="significance",
-                            na.treat, #either "na.rm", numeric value or function
-                            alpha=0.05, p.adjust.method="none",# only needed for significance
+                            na.treat,
+                            alpha=0.05,
+                            p.adjust.method="none",
                             parallel=FALSE,
-                            progress="none",...)
+                            progress="none",
+                            ...)
```

-Second, (aggregate %<% rank), the aggregated challenge is ranked:
+Second, (aggregate %>% rank), the aggregated values are converted to a ranking list:

```{r, eval=F, echo=T}
# Rank.aggregated.R
rank.aggregated <-function(object,
                           ties.method="min",
                           largeBetter,
                           ...)
```

An example for test-then-rank based on the Wilcoxon signed rank test:

```{r, eval=F, echo=T}
-ranking=challenge%>%testThenRank(alpha=0.05, # significance level
-                                 p.adjust.method="none", # method for adjustment for
-                                 # multiple testing, see ?p.adjust
-                                 na.treat=0, # either "na.rm" to remove missing data,
-                                 # set missings to numeric value (e.g. 0)
-                                 # or specify a function, e.g. function(x) min(x)
-                                 ties.method = "min" # a character string specifying
-                                 # how ties are treated, see ?base::rank
-                                 )
+ranking=challenge%>%testThenRank(alpha=0.05,
+                                 p.adjust.method="none",
+                                 na.treat=0,
+                                 ties.method = "min"
+                                 )
```
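+
+If many algorithms are compared, it may be preferable to adjust for multiple testing. A sketch (not part of the tutorial; the object name "ranking_holm" is arbitrary) using the Holm correction, one of the methods accepted by ?p.adjust:
+
+```{r, eval=F, echo=T}
+# Sketch: test-then-rank with Holm adjustment for multiple testing.
+ranking_holm=challenge%>%testThenRank(alpha=0.05,
+                                      p.adjust.method="holm",
+                                      na.treat=0,
+                                      ties.method = "min")
+```
+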
+# Uncertainty analysis (bootstrapping)
+
+The assessment of the stability of rankings across different ranking methods with respect to both sampling variability and variability across tasks is of major importance. In order to investigate ranking stability, the bootstrap approach can be used for a given method.
+
+The procedure consists of:
+
+1. Use the available data sets to generate N bootstrap datasets
+2. Perform ranking on each bootstrap dataset
+
+The ranking strategy is performed repeatedly on each bootstrap sample. One bootstrap sample of a task with n test cases consists of n test cases randomly drawn with replacement from this task. A total of b of these bootstrap samples are drawn (e.g., b = 1000). Bootstrap approaches can be evaluated in two ways: either the rankings for each bootstrap sample are evaluated for each algorithm, or the distribution of correlations or pairwise distances between the ranking list based on the full assessment data and based on each bootstrap sample can be explored.
+
+! Note that this step is optional; it can be omitted in order to directly generate the report. !
+
+The following method is used to perform ranking on the generated bootstrap datasets:
+
+```{r, eval=F, echo=T}
+# Bootstrap.R
+bootstrap.ranked=function(object,
+                          nboot,
+                          parallel=FALSE,
+                          progress="text",
+                          ...)
+```
+
+- nboot: number of bootstrap datasets to generate
+- parallel: set to TRUE when using multiple CPUs
+- progress: defines whether and how progress is reported (e.g. "text" for a text progress bar, "none" for no reporting)
+
+An example of bootstrapping using multiple CPUs (8 CPUs):
+
+```{r, eval=F, echo=T}
+library(doParallel)
+registerDoParallel(cores=8)
+set.seed(1)
+ranking_bootstrapped=ranking%>%bootstrap(nboot=1000, parallel=TRUE, progress = "none")
+stopImplicitCluster()
+```
+
+# Report generation
+
+Finally, the report will be generated. For this last step, take into account whether the uncertainty analysis was performed or not.
+
+If the uncertainty analysis was not performed, use:
+
+```{r, eval=F, echo=T}
+# Report.R
+report.ranked=function(object,
+                       file,
+                       title="",
+                       colors=default_colors,
+                       format="PDF",
+                       latex_engine="pdflatex",
+                       open=TRUE,
+                       ...)
+```
+
+If the uncertainty analysis was performed, use:
+
+```{r, eval=F, echo=T}
+# Report.R
+report.bootstrap=function(object,
+                          file,
+                          title="",
+                          colors=default_colors,
+                          format="PDF",
+                          latex_engine="pdflatex",
+                          open=TRUE,
+                          ...)
+```
+
+The report can be generated in different formats and is configured via:
+
+- file: name of the output file. If the output path is not specified, the working directory is used. If file is specified but does not have a file extension, an extension will be automatically added according to the output format given in *format*. If omitted, the report is created in a temporary folder with file name "report".
+- title: title of the report
+- colors: colors used for the graphics (defaults to default_colors)
+- format: output format ("PDF", "HTML" or "Word")
+- latex_engine: LaTeX engine for producing PDF output ("pdflatex", "lualatex", "xelatex")
+- open: optional. If TRUE, the generated report is opened automatically after rendering. Intermediate files created during rendering, such as separate files for each figure, can be removed or retained via the additional *clean* argument used in the example below (TRUE removes them, FALSE retains them).
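+
+If the bootstrapping step was skipped, the ranking object itself can be passed to report() (this corresponds to report.ranked above). A minimal sketch, assuming the single-task "ranking" object from the ranking step and mirroring the parameters of the example below:
+
+```{r, eval=F, echo=T}
+# Sketch: report generated directly from the ranking, without uncertainty analysis.
+ranking %>%
+  report(title="singleTaskChallengeExample",
+         file = "filename",
+         format = "PDF",
+         latex_engine="pdflatex"
+         )
+```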
+
+An example of how to generate the report for a *single task* challenge (here from the bootstrapped ranking):
+
+```{r, eval=F, echo=T}
+ranking_bootstrapped %>%
+  report(title="singleTaskChallengeExample",
+         file = "filename",
+         format = "PDF",
+         latex_engine="pdflatex",
+         clean=TRUE
+         )
+```
+
+! Note that the code differs slightly for single- and multi-task challenges. !
+
+For multi-task challenges, a consensus ranking (rank aggregation across tasks) has to be given additionally. Consensus relations “synthesize” the information in the elements of a relation ensemble into a single relation, often by minimizing a criterion function measuring how dissimilar consensus candidates are from the (elements of) the ensemble (the so-called “optimization approach”).
+
+The following method is used:
+
+```{r, eval=F, echo=T}
+# consensus.R
+consensus.ranked.list=function(object,
+                               method,
+                               ...)
+```
+
+- method: consensus ranking method, see ?relation_consensus for different methods to derive a consensus ranking.
+
+An example of computing the ranking consensus across tasks, here the consensus ranking according to mean ranks across tasks:
+
+```{r, eval=F, echo=T}
+meanRanks=ranking%>%consensus(method = "euclidean")
+```
+
+Generate the report as above, but with the additional specification of the consensus ranking:
+
+```{r, eval=F, echo=T}
+ranking_bootstrapped %>%
+  report(consensus=meanRanks,
+         title="multiTaskChallengeExample",
+         file = "filename",
+         format = "PDF",
+         latex_engine="pdflatex"
+         )
+```

# Terms of use

Licensed under GPL-3. If you use this software for a publication, cite

Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121*