diff --git a/DKFZ_Logo.png b/DKFZ_Logo.png index 12ee1c6..8f07ef8 100644 Binary files a/DKFZ_Logo.png and b/DKFZ_Logo.png differ diff --git a/README.md b/README.md index 4b58fe8..54918b0 100644 --- a/README.md +++ b/README.md @@ -1,539 +1,653 @@ Methods and open-source toolkit for analyzing and visualizing challenge results ================ - [Introduction](#introduction) - [Installation](#installation) - [Terms of use](#terms-of-use) - [Usage](#usage) - [Troubleshooting](#troubleshooting) - [Changes](#changes) - - [Developer team](#developer-team) + - [Team](#team) - [Reference](#reference) -Note that this is ongoing work (version 0.3.1), there may be updates -with possibly major changes. *Please make sure that you use the most -current version\!* +Note that this is ongoing work (version 0.3.3), there may be updates +with possibly major changes. *Please make sure that you use the latest +version\!* -Change log at the end of this document. +The change log can be found in section “Changes”. # Introduction The current framework is a tool for analyzing and visualizing challenge results in the field of biomedical image analysis and beyond. Biomedical challenges have become the de facto standard for benchmarking biomedical image analysis algorithms. While the number of challenges is steadily increasing, surprisingly little effort has been invested in ensuring high quality design, execution and reporting for these international competitions. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. Given these shortcomings, the current framework aims to enable fast and wide adoption of comprehensively analyzing and visualizing the results -of single-task and multi-task challenges and applying them to a number -of simulated and real-life challenges to demonstrate their specific -strengths and weaknesses. This approach offers an intuitive way to gain -important insights into the relative and absolute performance of -algorithms, which cannot be revealed by commonly applied visualization -techniques. +of single-task and multi-task challenges. This approach offers an +intuitive way to gain important insights into the relative and absolute +performance of algorithms, which cannot be revealed by commonly applied +visualization techniques. # Installation Requires R version \>= 3.5.2 (). Further, a recent version of Pandoc (\>= 1.12.3) is required. RStudio () automatically includes this so you do not need to download Pandoc if you plan to use rmarkdown from the RStudio IDE, otherwise you’ll need to install Pandoc for your platform (). Finally, if you want to generate -a pdf report you will need to have LaTeX installed (e.g. MiKTeX, MacTeX +a PDF report you will need to have LaTeX installed (e.g. MiKTeX, MacTeX or TinyTeX). -To get the current development version of the R package from Github: +To get the latest released version (master branch) of the R package from +GitHub: ``` r if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Rgraphviz", dependencies = TRUE) devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` If you are asked whether you want to update installed packages and you -type “a” for all, you might need administrator rights to update R core -packages. You can also try to type “n” for updating no packages. 
If you -are asked “Do you want to install from sources the packages which need -compilation? (Yes/no/cancel)”, you can safely type “no”. +type “a” for all, you might need administrator permissions to update R +core packages. You can also try to type “n” for updating no packages. If +you are asked “Do you want to install from sources the packages which +need compilation? (Yes/no/cancel)”, you can safely type “no”. -If you get *Warning messages* (in contrast to *Error* messages), these -might not be problematic and you can try to proceed. +If you get *warning* messages (in contrast to *error* messages), these +might not be problematic and you can try to proceed. If you encounter +errors during the setup, looking into the “Troubleshooting” section +might be worth it. # Terms of use Licenced under GPL-3. If you use this software for a publication, cite Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121* # Usage -Each of the following steps have to be run to generate the report: (1) +Each of the following steps has to be run to generate the report: (1) Load package, (2) load data, (3) perform ranking, (4) perform bootstrapping and (5) generation of the report +Here, we provide a step-by-step guide that leads you to your final +report. + ## 1\. Load package Load package ``` r library(challengeR) ``` ## 2\. Load data ### Data requirements -Data requires the following *columns* +Data requires the following *columns*: - - a *task identifier* in case of multi-task challenges. - - a *test case identifier* - - the *algorithm name* - - the *metric value* + - *task identifier* in case of multi-task challenges (string or + numeric) + - *test case identifier* (string or numeric) + - *algorithm identifier* (string or numeric) + - *metric value* (numeric) In case of missing metric values, a missing observation has to be provided (either as blank field or “NA”). For example, in a challenge with 2 tasks, 2 test cases and 2 algorithms, where in task “T2”, test case “case2”, algorithm “A2” didn’t give a prediction (and thus NA or a blank field for missing value is inserted), the data set might look like this: | Task | TestCase | Algorithm | MetricValue | | :--- | :------- | :-------- | ----------: | | T1 | case1 | A1 | 0.266 | | T1 | case1 | A2 | 0.202 | | T1 | case2 | A1 | 0.573 | | T1 | case2 | A2 | 0.945 | | T2 | case1 | A1 | 0.372 | | T2 | case1 | A2 | 0.898 | | T2 | case2 | A1 | 0.908 | | T2 | case2 | A2 | NA | -### Load data +### 2.1 Load data from file If you have assessment data at hand stored in a csv file (if you want to use simulated data skip the following code line) use ``` r data_matrix=read.csv(file.choose()) # type ?read.csv for help ``` This allows to choose a file interactively, otherwise replace *file.choose()* by the file path (in style “/path/to/dataset.csv”) in quotation marks. -For illustration purposes, in the following simulated data is generated -*instead* (skip the following code chunk if you have already loaded -data). The data is also stored as “data\_matrix.csv” in the repository. +### 2.2 Simulate data + +In the following, simulated data is generated *instead* for illustration +purposes (skip the following code chunk if you have already loaded +data). The data is also stored as “inst/extdata/data\_matrix.csv” in the +repository. 
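If the package has been installed from this repository, that bundled file can presumably also be read directly from the package library (a minimal sketch; the `extdata` path is an assumption based on the `inst/extdata/` layout):

``` r
# inst/extdata/ is installed as extdata/ inside the package library
data_matrix <- read.csv(system.file("extdata", "data_matrix.csv", package = "challengeR"))
```

The code chunk below instead generates data of the same structure from scratch.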
``` r if (!requireNamespace("permute", quietly = TRUE)) install.packages("permute") -n=50 +n <- 50 set.seed(4) -strip=runif(n,.9,1) -c_ideal=cbind(task="c_ideal", +strip <- runif(n,.9,1) +c_ideal <- cbind(task="c_ideal", rbind( data.frame(alg_name="A1",value=runif(n,.9,1),case=1:n), data.frame(alg_name="A2",value=runif(n,.8,.89),case=1:n), data.frame(alg_name="A3",value=runif(n,.7,.79),case=1:n), data.frame(alg_name="A4",value=runif(n,.6,.69),case=1:n), data.frame(alg_name="A5",value=runif(n,.5,.59),case=1:n) )) set.seed(1) -c_random=data.frame(task="c_random", +c_random <- data.frame(task="c_random", alg_name=factor(paste0("A",rep(1:5,each=n))), value=plogis(rnorm(5*n,1.5,1)),case=rep(1:n,times=5) ) -strip2=seq(.8,1,length.out=5) -a=permute::allPerms(1:5) -c_worstcase=data.frame(task="c_worstcase", +strip2 <- seq(.8,1,length.out=5) +a <- permute::allPerms(1:5) +c_worstcase <- data.frame(task="c_worstcase", alg_name=c(t(a)), value=rep(strip2,nrow(a)), case=rep(1:nrow(a),each=5) ) -c_worstcase=rbind(c_worstcase, +c_worstcase <- rbind(c_worstcase, data.frame(task="c_worstcase",alg_name=1:5,value=strip2,case=max(c_worstcase$case)+1) ) -c_worstcase$alg_name=factor(c_worstcase$alg_name,labels=paste0("A",1:5)) +c_worstcase$alg_name <- factor(c_worstcase$alg_name,labels=paste0("A",1:5)) -data_matrix=rbind(c_ideal, c_random, c_worstcase) +data_matrix <- rbind(c_ideal, c_random, c_worstcase) ``` -## 3 Perform ranking +## 3\. Perform ranking ### 3.1 Define challenge object -Code differs slightly for single and multi task challenges. +Code differs slightly for single- and multi-task challenges. -In case of a single task challenge use +In case of a single-task challenge use ``` r # Use only task "c_random" in object data_matrix - dataSubset=subset(data_matrix, task=="c_random") - - challenge=as.challenge(dataSubset, - # Specify how to refer to the task in plots and reports - taskName="Task 1", - # Specify which column contains the algorithm, - # which column contains a test case identifier - # and which contains the metric value: - algorithm="alg_name", case="case", value="value", - # Specify if small metric values are better - smallBetter = FALSE) +dataSubset <- subset(data_matrix, task=="c_random") + +challenge <- as.challenge(dataSubset, + # Specify which column contains the algorithms, + # which column contains a test case identifier + # and which contains the metric value: + algorithm = "alg_name", case = "case", value = "value", + # Specify if small metric values are better + smallBetter = FALSE) ``` *Instead*, for a multi-task challenge use ``` r # Same as above but with 'by="task"' where variable "task" contains the task identifier - challenge=as.challenge(data_matrix, - by="task", - algorithm="alg_name", case="case", value="value", - smallBetter = FALSE) +challenge=as.challenge(data_matrix, + by = "task", + algorithm = "alg_name", case = "case", value = "value", + smallBetter = FALSE) ``` -### 3.2 Perform ranking +### 3.2 Configure ranking Different ranking methods are available, choose one of them: - for “aggregate-then-rank” use (here: take mean for aggregation) ``` r -ranking=challenge%>%aggregateThenRank(FUN = mean, # aggregation function, - # e.g. mean, median, min, max, - # or e.g. function(x) quantile(x, probs=0.05) - na.treat=0, # either "na.rm" to remove missing data, - # set missings to numeric value (e.g. 0) - # or specify a function, - # e.g. 
function(x) min(x) - ties.method = "min" # a character string specifying - # how ties are treated, see ?base::rank - ) +ranking <- challenge%>%aggregateThenRank(FUN = mean, # aggregation function, + # e.g. mean, median, min, max, + # or e.g. function(x) quantile(x, probs=0.05) + na.treat = 0, # either "na.rm" to remove missing data, + # set missings to numeric value (e.g. 0) + # or specify a function, + # e.g. function(x) min(x) + ties.method = "min" # a character string specifying + # how ties are treated, see ?base::rank + ) ``` - *alternatively*, for “rank-then-aggregate” with arguments as above - (here: take mean for aggregation): + (here: take mean for aggregation) ``` r -ranking=challenge%>%rankThenAggregate(FUN = mean, - ties.method = "min" - ) +ranking <- challenge%>%rankThenAggregate(FUN = mean, + ties.method = "min" + ) ``` - *alternatively*, for test-then-rank based on Wilcoxon signed rank - test: + test ``` r -ranking=challenge%>%testThenRank(alpha=0.05, # significance level - p.adjust.method="none", # method for adjustment for - # multiple testing, see ?p.adjust - na.treat=0, # either "na.rm" to remove missing data, - # set missings to numeric value (e.g. 0) - # or specify a function, e.g. function(x) min(x) - ties.method = "min" # a character string specifying - # how ties are treated, see ?base::rank - ) +ranking <- challenge%>%testThenRank(alpha = 0.05, # significance level + p.adjust.method = "none", # method for adjustment for + # multiple testing, see ?p.adjust + na.treat = 0, # either "na.rm" to remove missing data, + # set missings to numeric value (e.g. 0) + # or specify a function, e.g. function(x) min(x) + ties.method = "min" # a character string specifying + # how ties are treated, see ?base::rank + ) ``` ## 4\. Perform bootstrapping Perform bootstrapping with 1000 bootstrap samples using one CPU ``` r set.seed(1) -ranking_bootstrapped=ranking%>%bootstrap(nboot=1000) +ranking_bootstrapped <- ranking%>%bootstrap(nboot = 1000) ``` If you want to use multiple CPUs (here: 8 CPUs), use ``` r library(doParallel) -registerDoParallel(cores=8) +registerDoParallel(cores = 8) set.seed(1) -ranking_bootstrapped=ranking%>%bootstrap(nboot=1000, parallel=TRUE, progress = "none") +ranking_bootstrapped <- ranking%>%bootstrap(nboot = 1000, parallel = TRUE, progress = "none") stopImplicitCluster() ``` ## 5\. Generate the report Generate report in PDF, HTML or DOCX format. Code differs slightly for -single and multi task challenges. +single- and multi-task challenges. -### 5.1 For single task challenges +### 5.1 For single-task challenges ``` r ranking_bootstrapped %>% - report(title="singleTaskChallengeExample", # used for the title of the report + report(title = "singleTaskChallengeExample", # used for the title of the report file = "filename", format = "PDF", # format can be "PDF", "HTML" or "Word" - latex_engine="pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" - clean=TRUE #optional. Using TRUE will clean intermediate files that are created during rendering. + latex_engine = "pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" + clean = TRUE #optional. Using TRUE will clean intermediate files that are created during rendering. ) ``` Argument *file* allows for specifying the output file path as well, otherwise the working directory is used. 
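For example, to write the report to a specific location (a minimal sketch; the path is a placeholder):

``` r
ranking_bootstrapped %>%
  report(title = "singleTaskChallengeExample",
         file = "/path/to/results/myReport", # full output path instead of the working directory
         format = "PDF")
```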
If file is specified but does not have a file extension, an extension will be automatically added according to the output format given in *format*. Using argument *clean=FALSE* allows to retain intermediate files, such as separate files for each figure. If argument “file” is omitted, the report is created in a temporary folder with file name “report”. -### 5.1 For multi task challenges +### 5.2 For multi-task challenges -Same as for single task challenges, but additionally consensus ranking +Same as for single-task challenges, but additionally consensus ranking (rank aggregation across tasks) has to be given. Compute ranking consensus across tasks (here: consensus ranking -according to mean ranks across tasks): +according to mean ranks across tasks) ``` r # See ?relation_consensus for different methods to derive consensus ranking -meanRanks=ranking%>%consensus(method = "euclidean") +meanRanks <- ranking%>%consensus(method = "euclidean") meanRanks # note that there may be ties (i.e. some algorithms have identical mean rank) ``` Generate report as above, but with additional specification of consensus ranking ``` r ranking_bootstrapped %>% - report(consensus=meanRanks, - title="multiTaskChallengeExample", + report(consensus = meanRanks, + title = "multiTaskChallengeExample", file = "filename", format = "PDF", # format can be "PDF", "HTML" or "Word" - latex_engine="pdflatex"#LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" + latex_engine = "pdflatex"#LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" ) ``` # Troubleshooting +In this section we provide an overview of issues that the users reported +and how they were solved. + ### RStudio specific -#### \- Warnings while installing the Github repository +#### \- Warnings while installing the GitHub repository ##### Error: While trying to install the current version of the repository: ``` r devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` -We get this output: +The following warning showed up in the output: ``` r WARNING: Rtools is required to build R packages, but is not currently installed. ``` -I installed Rtools via a separate executable: +Therefore, Rtools was installed via a separate executable: and the warning disappeared. ##### Solution: -We don’t really need Rtools, see comment in the installation section: +Actually there is no need of installing Rtools, it is not really used in +the toolkit. Insted, choose not to install it when it is asked. See +comment in the installation section: “If you are asked whether you want to update installed packages and you type “a” for all, you might need administrator rights to update R core packages. You can also try to type “n” for updating no packages. If you are asked “Do you want to install from sources the packages which need compilation? (Yes/no/cancel)”, you can safely type “no”.” -#### \- Unable to install the current version of the tool from Github +#### \- Unable to install the current version of the tool from GitHub ##### Error: -While trying to install the current version of the tool from github. The -problem was that some of the packages that were built under R3.6.1 were -updated, but the current installed version was still R3.6.1. +While trying the current version of the tool from GitHub, it was unable +to install. 
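The command being run was presumably the standard installation call from the Installation section:

``` r
devtools::install_github("wiesenfa/challengeR", dependencies = TRUE)
```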
The error message was: ``` r byte-compile and prepare package for lazy loading Error: (converted from warning) package 'ggplot2' was built under R version 3.6.3 Execution halted ERROR: lazy loading failed for package 'challengeR' * removing 'C:/Users/.../Documents/R/win-library/3.6/challengeR' * restoring previous 'C:/Users/.../Documents/R/win-library/3.6/challengeR' Error: Failed to install 'challengeR' from GitHub: (converted from warning) installation of package 'C:/Users/.../AppData/Local/Temp/Rtmp615qmV/file4fd419555eb4/challengeR_0.3.1.tar.gz' had non-zero exit status ``` +The problem was that some of the packages that were built under R3.6.1 +had been updated, but the current installed version was still R3.6.1. + ##### Solution: The solution was to update R3.6.1 to R3.6.3. Another way would have been to reset the single packages to the versions built under R3.6.1 +#### \- Unable to install the toolkit from GitHub + +##### Error: + +While trying the current version of the tool from GitHub, it was unable +to install. + +``` r + devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) +``` + +The error message was: + +``` r +Error: .onLoad failed in loadNamespace() for 'pkgload', details: + call: loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) + error: there is no package called ‘backports’ +``` + +The problem was that the packages ‘backports’ had not been installed. + +##### Solution: + +The solution was to install ‘backports’ manually. + +``` r + install.packages("backports") +``` + #### \- Unable to install R ##### Error: While trying to install the package in the R, after running the following commands: ``` r if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Rgraphviz", dependencies = TRUE) devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` The error message was: ``` r ERROR: 1: In file(con, "r") : URL 'https://bioconductor.org/config.yaml': status was 'SSL connect error' 2: packages ‘BiocVersion’, ‘Rgraphviz’ are not available (for R version 3.6.1) ``` ##### Solution: The solution was to restart RStudio. #### \- Incorrect column order ##### Error: When naming the columns “task” and “case”, R was confused because the arguments in the challenge object are also called like this and it produced the following error: ``` r Error in table(object[[task]][[algorithm]], object[[task]][[case]]) : all arguments must have the same length ``` ##### Solution: The solution was to rename the columns. -### Related to MikText +#### \- Wrong versions of packages + +##### Error: + +While running this command : + +``` r + devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) +``` + +I had the following errors : - Error : the package ‘purrr’ has been +compiled with version of R 3.6.3 - Error : the package ‘ggplot2’ has +been compiled with version of R 3.6.3 - Error in loadNamespace(j \<- +i\[\[L\]\], c(lib.loc, .libPaths()), versionCheck = vI\[\[j\]\]) +namespace ‘glue’ 1.3.1 is already loaded, but \>= 1.3.2 is required + +##### Solution: + +To solve the issue I changed the versions of the packages. 
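One hedged way to switch to a specific package version is `remotes::install_version()` (the `remotes` package typically comes along with `devtools`); the versions used in this sketch are the target versions reported just below:

``` r
# pin the package versions that worked in this report
remotes::install_version("purrr",   version = "0.3.3")
remotes::install_version("ggplot2", version = "3.3.0")
remotes::install_version("glue",    version = "1.4.2")
```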
I had the +following versions : - purrr 0.3.4 - ggplot2 3.3.2 - glue 1.3.1 + +I moved to the following ones : - purrr 0.3.3 - ggplot2 3.3.0 - glue +1.4.2 + +### Related to MiKTeX #### \- Missing packages ##### Error: -While generating the PDF with Miktext (2.9), produced the following -error: +While generating the PDF with MiKTeX (2.9), the following error showed +up: ``` r fatal pdflatex - gui framework cannot be initialized ``` -There’s an issue with installing missing packages in LaTeX. +There is an issue with installing missing packages in LaTeX. ##### Solution: Open your MiKTeX Console –\> Settings, select “Always install missing packages on-the-fly”. Then generate the report. Once the report is generated, you can reset the settings to your preferred ones. +#### \- Unable to generate report + +##### Error: + +While generating the PDF with MiKTeX (2.9): + +``` r +ranking_bootstrapped %>% + report(title = "singleTaskChallengeExample", # used for the title of the report + file = "filename", + format = "PDF", # format can be "PDF", "HTML" or "Word" + latex_engine = "pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" + clean = TRUE #optional. Using TRUE will clean intermediate files that are created during rendering. + ) +``` + +The following error showed up: + +``` r +output file: filename.knit.md + +"C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS filename.utf8.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output filename.tex --self-contained --number-sections --highlight-style tango --pdf-engine pdflatex --variable graphics --lua-filter "C:/Users/adm/Documents/R/win-library/3.6/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter "C:/Users/adm/Documents/R/win-library/3.6/rmarkdown/rmd/lua/latex-div.lua" --variable "geometry:margin=1in" + +Error: LaTeX failed to compile filename.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. + + Warning message: +In system2(..., stdout = if (use_file_stdout()) f1 else FALSE, stderr = f2) : + '"pdflatex"' not found +``` + +##### Solution: + +The solution was to restart RStudio. + # Changes #### Version 0.3.3 - Force line break to avoid that authors exceed the page in generated PDF reports #### Version 0.3.2 - Correct names of authors #### Version 0.3.1 - Refactoring #### Version 0.3.0 - Major bug fix release #### Version 0.2.5 - Bug fixes #### Version 0.2.4 - Automatic insertion of missings #### Version 0.2.3 - Bug fixes - Reports for subsets (top list) of algorithms: Use e.g. `subset(ranking_bootstrapped, top=3) %>% report(...)` (or `subset(ranking, top=3) %>% report(...)` for report without bootstrap results) to only show the top 3 algorithms according to the chosen ranking methods, where `ranking_bootstrapped` and `ranking` objects as defined in the example. Line plot for ranking robustness can be used to check whether algorithms performing well in other ranking methods are excluded. Bootstrapping still takes - entire uncertainty into account. Podium plot neglect and ranking - heatmap neglect excluded algorithms. Only available for single task - challenges (for mutli task challenges not sensible because each task - would contain a different sets of algorithms). + entire uncertainty into account. Podium plot and ranking heatmap + neglect excluded algorithms. Only available for single-task + challenges (for multi-task challenges not sensible because each task + would contain a different set of algorithms). - Reports for subsets of tasks: Use e.g. 
`subset(ranking_bootstrapped, - tasks=c("task1", "task2","task3)) %>% report(...)` to restrict + tasks=c("task1", "task2","task3")) %>% report(...)` to restrict report to tasks “task1”, “task2”,"task3. You may want to recompute the consensus ranking before using `meanRanks=subset(ranking, - tasks=c("task1", "task2","task3))%>%consensus(method = "euclidean")` + tasks=c("task1", "task2", "task3"))%>%consensus(method = + "euclidean")` #### Version 0.2.1 - Introduction in reports now mentions e.g. ranking method, number of test cases,… - Function `subset()` allows selection of tasks after bootstrapping, e.g. `subset(ranking_bootstrapped,1:3)` - `report()` functions gain argument `colors` (default: `default_colors`). Change e.g. to `colors=viridisLite::inferno` which “is designed in such a way that it will analytically be perfectly perceptually-uniform, both in regular form and also when converted to black-and-white. It is also designed to be perceived by readers with the most common form of color blindness.” See package `viridis` for further similar functions. #### Version 0.2.0 - Improved layout in case of many algorithms and tasks (while probably still not perfect) - Consistent coloring of algorithms across figures - `report()` function can be applied to ranked object before bootstrapping (and thus excluding figures based on bootstrapping), i.e. in the example `ranking %>% report(...)` - bug fixes -# Developer team +# Team + +The developer team includes members from both division of Computer +Assisted Medical Interventions (CAMI) and Biostatistics at the German +Cancer Research Center (DKFZ): + + - Manuel Wiesenfarth + - Annette Kopp-Schneider + - Annika Reinke + - Matthias Eisenmann + - Laura Aguilera Saiz + - Elise Récéjac + - Lena Maier-Hein # Reference Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121* -![alt text](HIP_Logo.png) + + diff --git a/Readme.Rmd b/Readme.Rmd index 20849ce..d5dc669 100644 --- a/Readme.Rmd +++ b/Readme.Rmd @@ -1,545 +1,549 @@ --- -title: Methods and open-source toolkit for analyzing and visualizing challenge results +title: "Methods and open-source toolkit for analyzing and visualizing challenge results" output: - pdf_document: - toc: yes - toc_depth: '3' github_document: toc: yes - toc_depth: 1 + toc_depth: '1' + html_document: + toc: yes + toc_depth: '1' + pdf_document: + toc: yes + toc_depth: '1' editor_options: chunk_output_type: console --- ```{r, echo = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", # fig.path = "README-", fig.width = 9, fig.height = 5, width=160 ) ``` -Note that this is ongoing work (version `r packageVersion("challengeR")`), there may be updates with possibly major changes. *Please make sure that you use the most current version!* +Note that this is ongoing work (version `r packageVersion("challengeR")`), there may be updates with possibly major changes. *Please make sure that you use the latest version!* -Change log at the end of this document. +The change log can be found in section "Changes". # Introduction The current framework is a tool for analyzing and visualizing challenge results in the field of biomedical image analysis and beyond. -Biomedical challenges have become the de facto standard for benchmarking biomedical image analysis algorithms. 
While the number of challenges is steadily increasing, surprisingly little effort has been invested in ensuring high quality design, execution and reporting for these international competitions. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. +Biomedical challenges have become the de facto standard for benchmarking biomedical image analysis algorithms. While the number of challenges is steadily increasing, surprisingly little effort has been invested in ensuring high quality design, execution and reporting for these international competitions. Specifically, results analysis and visualization in the event of uncertainties have been given almost no attention in the literature. -Given these shortcomings, the current framework aims to enable fast and wide adoption of comprehensively analyzing and visualizing the results of single-task and multi-task challenges and applying them to a number of simulated and real-life challenges to demonstrate their specific strengths and weaknesses. This approach offers an intuitive way to gain important insights into the relative and absolute performance of algorithms, which cannot be revealed by commonly applied visualization techniques. +Given these shortcomings, the current framework aims to enable fast and wide adoption of comprehensively analyzing and visualizing the results of single-task and multi-task challenges. This approach offers an intuitive way to gain important insights into the relative and absolute performance of algorithms, which cannot be revealed by commonly applied visualization techniques. # Installation Requires R version >= 3.5.2 (https://www.r-project.org). -Further, a recent version of Pandoc (>= 1.12.3) is required. RStudio (https://rstudio.com) automatically includes this so you do not need to download Pandoc if you plan to use rmarkdown from the RStudio IDE, otherwise you’ll need to install Pandoc for your platform (https://pandoc.org/installing.html). Finally, if you want to generate a pdf report you will need to have LaTeX installed (e.g. MiKTeX, MacTeX or TinyTeX). +Further, a recent version of Pandoc (>= 1.12.3) is required. RStudio (https://rstudio.com) automatically includes this so you do not need to download Pandoc if you plan to use rmarkdown from the RStudio IDE, otherwise you’ll need to install Pandoc for your platform (https://pandoc.org/installing.html). Finally, if you want to generate a PDF report you will need to have LaTeX installed (e.g. MiKTeX, MacTeX or TinyTeX). -To get the current development version of the R package from Github: +To get the latest released version (master branch) of the R package from GitHub: ```{r, eval=F,R.options,} if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Rgraphviz", dependencies = TRUE) devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` -If you are asked whether you want to update installed packages and you type "a" for all, you might need administrator rights to update R core packages. You can also try to type "n" for updating no packages. If you are asked "Do you want to install from sources the packages which need compilation? (Yes/no/cancel)", you can safely type "no". +If you are asked whether you want to update installed packages and you type "a" for all, you might need administrator permissions to update R core packages. 
You can also try to type "n" for updating no packages. If you are asked "Do you want to install from sources the packages which need compilation? (Yes/no/cancel)", you can safely type "no". -If you get *Warning messages* (in contrast to *Error* messages), these might not be problematic and you can try to proceed. +If you get *warning* messages (in contrast to *error* messages), these might not be problematic and you can try to proceed. If you encounter errors during the setup, looking into the "Troubleshooting" section might be worth it. # Terms of use Licenced under GPL-3. If you use this software for a publication, cite Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121* # Usage -Each of the following steps have to be run to generate the report: (1) Load package, (2) load data, (3) perform ranking, (4) perform bootstrapping and (5) generation of the report +Each of the following steps has to be run to generate the report: (1) Load package, (2) load data, (3) perform ranking, (4) perform bootstrapping and (5) generation of the report + +Here, we provide a step-by-step guide that leads you to your final report. ## 1. Load package Load package ```{r, eval=F} library(challengeR) ``` ## 2. Load data ### Data requirements -Data requires the following *columns* +Data requires the following *columns*: -* a *task identifier* in case of multi-task challenges. -* a *test case identifier* -* the *algorithm name* -* the *metric value* +* *task identifier* in case of multi-task challenges (string or numeric) +* *test case identifier* (string or numeric) +* *algorithm identifier* (string or numeric) +* *metric value* (numeric) In case of missing metric values, a missing observation has to be provided (either as blank field or "NA"). For example, in a challenge with 2 tasks, 2 test cases and 2 algorithms, where in task "T2", test case "case2", algorithm "A2" didn't give a prediction (and thus NA or a blank field for missing value is inserted), the data set might look like this: ```{r, eval=T, echo=F,results='asis'} set.seed(1) a=cbind(expand.grid(Task=paste0("T",1:2),TestCase=paste0("case",1:2),Algorithm=paste0("A",1:2)),MetricValue=round(c(runif(7,0,1),NA),3)) print(knitr::kable(a[order(a$Task,a$TestCase,a$Algorithm),],row.names=F)) ``` -### Load data +### 2.1 Load data from file If you have assessment data at hand stored in a csv file (if you want to use simulated data skip the following code line) use ```{r, eval=F, echo=T} data_matrix=read.csv(file.choose()) # type ?read.csv for help ``` This allows to choose a file interactively, otherwise replace *file.choose()* by the file path (in style "/path/to/dataset.csv") in quotation marks. +### 2.2 Simulate data - -For illustration purposes, in the following simulated data is generated *instead* (skip the following code chunk if you have already loaded data). The data is also stored as "data_matrix.csv" in the repository. +In the following, simulated data is generated *instead* for illustration purposes (skip the following code chunk if you have already loaded data). The data is also stored as "inst/extdata/data_matrix.csv" in the repository. 
```{r, eval=F, echo=T} if (!requireNamespace("permute", quietly = TRUE)) install.packages("permute") -n=50 +n <- 50 set.seed(4) -strip=runif(n,.9,1) -c_ideal=cbind(task="c_ideal", +strip <- runif(n,.9,1) +c_ideal <- cbind(task="c_ideal", rbind( data.frame(alg_name="A1",value=runif(n,.9,1),case=1:n), data.frame(alg_name="A2",value=runif(n,.8,.89),case=1:n), data.frame(alg_name="A3",value=runif(n,.7,.79),case=1:n), data.frame(alg_name="A4",value=runif(n,.6,.69),case=1:n), data.frame(alg_name="A5",value=runif(n,.5,.59),case=1:n) )) set.seed(1) -c_random=data.frame(task="c_random", +c_random <- data.frame(task="c_random", alg_name=factor(paste0("A",rep(1:5,each=n))), value=plogis(rnorm(5*n,1.5,1)),case=rep(1:n,times=5) ) -strip2=seq(.8,1,length.out=5) -a=permute::allPerms(1:5) -c_worstcase=data.frame(task="c_worstcase", +strip2 <- seq(.8,1,length.out=5) +a <- permute::allPerms(1:5) +c_worstcase <- data.frame(task="c_worstcase", alg_name=c(t(a)), value=rep(strip2,nrow(a)), case=rep(1:nrow(a),each=5) ) -c_worstcase=rbind(c_worstcase, +c_worstcase <- rbind(c_worstcase, data.frame(task="c_worstcase",alg_name=1:5,value=strip2,case=max(c_worstcase$case)+1) ) -c_worstcase$alg_name=factor(c_worstcase$alg_name,labels=paste0("A",1:5)) +c_worstcase$alg_name <- factor(c_worstcase$alg_name,labels=paste0("A",1:5)) -data_matrix=rbind(c_ideal, c_random, c_worstcase) +data_matrix <- rbind(c_ideal, c_random, c_worstcase) ``` -## 3 Perform ranking +## 3. Perform ranking ### 3.1 Define challenge object -Code differs slightly for single and multi task challenges. +Code differs slightly for single- and multi-task challenges. -In case of a single task challenge use +In case of a single-task challenge use ```{r, eval=F, echo=T} # Use only task "c_random" in object data_matrix - dataSubset=subset(data_matrix, task=="c_random") - - challenge=as.challenge(dataSubset, - # Specify how to refer to the task in plots and reports - taskName="Task 1", - # Specify which column contains the algorithm, - # which column contains a test case identifier - # and which contains the metric value: - algorithm="alg_name", case="case", value="value", - # Specify if small metric values are better - smallBetter = FALSE) +dataSubset <- subset(data_matrix, task=="c_random") + +challenge <- as.challenge(dataSubset, + # Specify which column contains the algorithms, + # which column contains a test case identifier + # and which contains the metric value: + algorithm = "alg_name", case = "case", value = "value", + # Specify if small metric values are better + smallBetter = FALSE) ``` *Instead*, for a multi-task challenge use ```{r, eval=F, echo=T} # Same as above but with 'by="task"' where variable "task" contains the task identifier - challenge=as.challenge(data_matrix, - by="task", - algorithm="alg_name", case="case", value="value", - smallBetter = FALSE) +challenge=as.challenge(data_matrix, + by = "task", + algorithm = "alg_name", case = "case", value = "value", + smallBetter = FALSE) ``` -### 3.2 Perform ranking +### 3.2 Configure ranking Different ranking methods are available, choose one of them: - for "aggregate-then-rank" use (here: take mean for aggregation) ```{r, eval=F, echo=T} -ranking=challenge%>%aggregateThenRank(FUN = mean, # aggregation function, - # e.g. mean, median, min, max, - # or e.g. function(x) quantile(x, probs=0.05) - na.treat=0, # either "na.rm" to remove missing data, - # set missings to numeric value (e.g. 0) - # or specify a function, - # e.g. 
function(x) min(x) - ties.method = "min" # a character string specifying - # how ties are treated, see ?base::rank - ) +ranking <- challenge%>%aggregateThenRank(FUN = mean, # aggregation function, + # e.g. mean, median, min, max, + # or e.g. function(x) quantile(x, probs=0.05) + na.treat = 0, # either "na.rm" to remove missing data, + # set missings to numeric value (e.g. 0) + # or specify a function, + # e.g. function(x) min(x) + ties.method = "min" # a character string specifying + # how ties are treated, see ?base::rank + ) ``` -- *alternatively*, for "rank-then-aggregate" with arguments as above (here: take mean for aggregation): +- *alternatively*, for "rank-then-aggregate" with arguments as above (here: take mean for aggregation) ```{r, eval=F, echo=T} -ranking=challenge%>%rankThenAggregate(FUN = mean, - ties.method = "min" - ) +ranking <- challenge%>%rankThenAggregate(FUN = mean, + ties.method = "min" + ) ``` -- *alternatively*, for test-then-rank based on Wilcoxon signed rank test: +- *alternatively*, for test-then-rank based on Wilcoxon signed rank test ```{r, eval=F, echo=T} -ranking=challenge%>%testThenRank(alpha=0.05, # significance level - p.adjust.method="none", # method for adjustment for - # multiple testing, see ?p.adjust - na.treat=0, # either "na.rm" to remove missing data, - # set missings to numeric value (e.g. 0) - # or specify a function, e.g. function(x) min(x) - ties.method = "min" # a character string specifying - # how ties are treated, see ?base::rank - ) +ranking <- challenge%>%testThenRank(alpha = 0.05, # significance level + p.adjust.method = "none", # method for adjustment for + # multiple testing, see ?p.adjust + na.treat = 0, # either "na.rm" to remove missing data, + # set missings to numeric value (e.g. 0) + # or specify a function, e.g. function(x) min(x) + ties.method = "min" # a character string specifying + # how ties are treated, see ?base::rank + ) ``` ## 4. Perform bootstrapping Perform bootstrapping with 1000 bootstrap samples using one CPU ```{r, eval=F, echo=T} set.seed(1) -ranking_bootstrapped=ranking%>%bootstrap(nboot=1000) +ranking_bootstrapped <- ranking%>%bootstrap(nboot = 1000) ``` If you want to use multiple CPUs (here: 8 CPUs), use ```{r, eval=F, echo=T} library(doParallel) -registerDoParallel(cores=8) +registerDoParallel(cores = 8) set.seed(1) -ranking_bootstrapped=ranking%>%bootstrap(nboot=1000, parallel=TRUE, progress = "none") +ranking_bootstrapped <- ranking%>%bootstrap(nboot = 1000, parallel = TRUE, progress = "none") stopImplicitCluster() ``` ## 5. Generate the report -Generate report in PDF, HTML or DOCX format. Code differs slightly for single and multi task challenges. +Generate report in PDF, HTML or DOCX format. Code differs slightly for single- and multi-task challenges. -### 5.1 For single task challenges +### 5.1 For single-task challenges ```{r, eval=F, echo=T} ranking_bootstrapped %>% - report(title="singleTaskChallengeExample", # used for the title of the report + report(title = "singleTaskChallengeExample", # used for the title of the report file = "filename", format = "PDF", # format can be "PDF", "HTML" or "Word" - latex_engine="pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" - clean=TRUE #optional. Using TRUE will clean intermediate files that are created during rendering. + latex_engine = "pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" + clean = TRUE #optional. 
Using TRUE will clean intermediate files that are created during rendering. ) ``` Argument *file* allows for specifying the output file path as well, otherwise the working directory is used. If file is specified but does not have a file extension, an extension will be automatically added according to the output format given in *format*. Using argument *clean=FALSE* allows to retain intermediate files, such as separate files for each figure. If argument "file" is omitted, the report is created in a temporary folder with file name "report". -### 5.1 For multi task challenges -Same as for single task challenges, but additionally consensus ranking (rank aggregation across tasks) has to be given. +### 5.2 For multi-task challenges +Same as for single-task challenges, but additionally consensus ranking (rank aggregation across tasks) has to be given. -Compute ranking consensus across tasks (here: consensus ranking according to mean ranks across tasks): +Compute ranking consensus across tasks (here: consensus ranking according to mean ranks across tasks) ```{r, eval=F, echo=T} # See ?relation_consensus for different methods to derive consensus ranking -meanRanks=ranking%>%consensus(method = "euclidean") +meanRanks <- ranking%>%consensus(method = "euclidean") meanRanks # note that there may be ties (i.e. some algorithms have identical mean rank) ``` Generate report as above, but with additional specification of consensus ranking ```{r, eval=F, echo=T} ranking_bootstrapped %>% - report(consensus=meanRanks, - title="multiTaskChallengeExample", + report(consensus = meanRanks, + title = "multiTaskChallengeExample", file = "filename", format = "PDF", # format can be "PDF", "HTML" or "Word" - latex_engine="pdflatex"#LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" + latex_engine = "pdflatex"#LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" ) ``` # Troubleshooting -In this section are compiled issues that the users reported. +In this section we provide an overview of issues that the users reported and how they were solved. ### RStudio specific -#### - Warnings while installing the Github repository +#### - Warnings while installing the GitHub repository ##### Error: While trying to install the current version of the repository: ```{r, eval=F, echo=T} devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` The following warning showed up in the output: ```{r, eval=F, echo=T} WARNING: Rtools is required to build R packages, but is not currently installed. ``` Therefore, Rtools was installed via a separate executable: https://cran.r-project.org/bin/windows/Rtools/ and the warning disappeared. ##### Solution: Actually there is no need of installing Rtools, it is not really used in the toolkit. Insted, choose not to install it when it is asked. See comment in the installation section: “If you are asked whether you want to update installed packages and you type “a” for all, you might need administrator rights to update R core packages. You can also try to type “n” for updating no packages. If you are asked “Do you want to install from sources the packages which need compilation? (Yes/no/cancel)”, you can safely type “no”.” -#### - Unable to install the current version of the tool from Github +#### - Unable to install the current version of the tool from GitHub ##### Error: -While trying the current version of the tool from github, it was unable to install. 
+While trying the current version of the tool from GitHub, it was unable to install. The error message was: ```{r, eval=F, echo=T} byte-compile and prepare package for lazy loading Error: (converted from warning) package 'ggplot2' was built under R version 3.6.3 Execution halted ERROR: lazy loading failed for package 'challengeR' * removing 'C:/Users/.../Documents/R/win-library/3.6/challengeR' * restoring previous 'C:/Users/.../Documents/R/win-library/3.6/challengeR' Error: Failed to install 'challengeR' from GitHub: (converted from warning) installation of package 'C:/Users/.../AppData/Local/Temp/Rtmp615qmV/file4fd419555eb4/challengeR_0.3.1.tar.gz' had non-zero exit status ``` The problem was that some of the packages that were built under R3.6.1 had been updated, but the current installed version was still R3.6.1. ##### Solution: The solution was to update R3.6.1 to R3.6.3. Another way would have been to reset the single packages to the versions built under R3.6.1 -#### - Unable to install the toolkit from Github +#### - Unable to install the toolkit from GitHub ##### Error: -While trying the current version of the tool from github, it was unable to install. +While trying the current version of the tool from GitHub, it was unable to install. ```{r, eval=F, echo=T} devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` The error message was: ```{r, eval=F, echo=T} Error: .onLoad failed in loadNamespace() for 'pkgload', details: call: loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) error: there is no package called ‘backports’ ``` The problem was that the packages 'backports' had not been installed. ##### Solution: The solution was to install 'backports' manually. ```{r, eval=F, echo=T} install.packages("backports") ``` #### - Unable to install R ##### Error: While trying to install the package in the R, after running the following commands: ```{r, eval=F, echo=T} if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools") if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("Rgraphviz", dependencies = TRUE) devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` The error message was: ```{r, eval=F, echo=T} ERROR: 1: In file(con, "r") : URL 'https://bioconductor.org/config.yaml': status was 'SSL connect error' 2: packages ‘BiocVersion’, ‘Rgraphviz’ are not available (for R version 3.6.1) ``` ##### Solution: The solution was to restart RStudio. #### - Incorrect column order ##### Error: When naming the columns "task" and "case", R was confused because the arguments in the challenge object are also called like this and it produced the following error: ```{r, eval=F, echo=T} Error in table(object[[task]][[algorithm]], object[[task]][[case]]) : all arguments must have the same length ``` ##### Solution: The solution was to rename the columns. #### - Wrong versions of packages ##### Error: While running this command : ```{r, eval=F, echo=T} devtools::install_github("wiesenfa/challengeR", dependencies = TRUE) ``` I had the following errors : - Error : the package 'purrr' has been compiled with version of R 3.6.3 - Error : the package 'ggplot2' has been compiled with version of R 3.6.3 - Error in loadNamespace(j <- i[[L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) namespace 'glue' 1.3.1 is already loaded, but >= 1.3.2 is required ##### Solution: To solve the issue I changed the versions of the packages. 
I had the following versions : - purrr 0.3.4 - ggplot2 3.3.2 - glue 1.3.1 I moved to the following ones : - purrr 0.3.3 - ggplot2 3.3.0 - glue 1.4.2 -### Related to MikText +### Related to MiKTeX #### - Missing packages ##### Error: -While generating the PDF with Miktext (2.9), the following error showed up: +While generating the PDF with MiKTeX (2.9), the following error showed up: ```{r, eval=F, echo=T} fatal pdflatex - gui framework cannot be initialized ``` There is an issue with installing missing packages in LaTeX. ##### Solution: Open your MiKTeX Console --> Settings, select "Always install missing packages on-the-fly". Then generate the report. Once the report is generated, you can reset the settings to your preferred ones. #### - Unable to generate report ##### Error: -While generating the PDF with Miktext (2.9): +While generating the PDF with MiKTeX (2.9): ```{r, eval=F, echo=T} ranking_bootstrapped %>% - report(title="singleTaskChallengeExample", # used for the title of the report + report(title = "singleTaskChallengeExample", # used for the title of the report file = "filename", format = "PDF", # format can be "PDF", "HTML" or "Word" - latex_engine="pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" - clean=TRUE #optional. Using TRUE will clean intermediate files that are created during rendering. + latex_engine = "pdflatex", #LaTeX engine for producing PDF output. Options are "pdflatex", "lualatex", and "xelatex" + clean = TRUE #optional. Using TRUE will clean intermediate files that are created during rendering. ) ``` The following error showed up: ```{r, eval=F, echo=T} output file: filename.knit.md "C:/Program Files/RStudio/bin/pandoc/pandoc" +RTS -K512m -RTS filename.utf8.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output filename.tex --self-contained --number-sections --highlight-style tango --pdf-engine pdflatex --variable graphics --lua-filter "C:/Users/adm/Documents/R/win-library/3.6/rmarkdown/rmd/lua/pagebreak.lua" --lua-filter "C:/Users/adm/Documents/R/win-library/3.6/rmarkdown/rmd/lua/latex-div.lua" --variable "geometry:margin=1in" Error: LaTeX failed to compile filename.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. Warning message: In system2(..., stdout = if (use_file_stdout()) f1 else FALSE, stderr = f2) : '"pdflatex"' not found ``` ##### Solution: The solution was to restart RStudio. # Changes #### Version 0.3.3 - Force line break to avoid that authors exceed the page in generated PDF reports #### Version 0.3.2 - Correct names of authors #### Version 0.3.1 - Refactoring #### Version 0.3.0 - Major bug fix release #### Version 0.2.5 - Bug fixes #### Version 0.2.4 - Automatic insertion of missings #### Version 0.2.3 - Bug fixes - Reports for subsets (top list) of algorithms: Use e.g. `subset(ranking_bootstrapped, top=3) %>% report(...)` (or `subset(ranking, top=3) %>% report(...)` for report without bootstrap results) to only show the top 3 algorithms according to the chosen ranking methods, where `ranking_bootstrapped` and `ranking` objects as defined in the example. Line plot for ranking robustness can be used to check whether algorithms performing well in other ranking methods are excluded. Bootstrapping still takes entire uncertainty into account. Podium plot and ranking heatmap neglect excluded algorithms. Only available for single-task challenges (for multi-task challenges not sensible because each task would contain a different set of algorithms). 
- Reports for subsets of tasks: Use e.g. `subset(ranking_bootstrapped, tasks=c("task1", "task2","task3")) %>% report(...)` to restrict report to tasks "task1", "task2","task3. You may want to recompute the consensus ranking before using `meanRanks=subset(ranking, tasks=c("task1", "task2", "task3"))%>%consensus(method = "euclidean")` #### Version 0.2.1 - Introduction in reports now mentions e.g. ranking method, number of test cases,... - Function `subset()` allows selection of tasks after bootstrapping, e.g. `subset(ranking_bootstrapped,1:3)` - `report()` functions gain argument `colors` (default: `default_colors`). Change e.g. to `colors=viridisLite::inferno` which "is designed in such a way that it will analytically be perfectly perceptually-uniform, both in regular form and also when converted to black-and-white. It is also designed to be perceived by readers with the most common form of color blindness." See package `viridis` for further similar functions. #### Version 0.2.0 - Improved layout in case of many algorithms and tasks (while probably still not perfect) - Consistent coloring of algorithms across figures - `report()` function can be applied to ranked object before bootstrapping (and thus excluding figures based on bootstrapping), i.e. in the example `ranking %>% report(...)` - bug fixes # Team -The developer team includes members from both Computer Assisted Medical Interventions (CAMI) and Biostatistics Division from the German Cancer Research Center (DKFZ): +The developer team includes members from both division of Computer Assisted Medical Interventions (CAMI) and Biostatistics at the German Cancer Research Center (DKFZ): - Manuel Wiesenfarth - Annette Kopp-Schneider - Annika Reinke - Matthias Eisenmann - Laura Aguilera Saiz +- Elise Récéjac - Lena Maier-Hein # Reference Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121* diff --git a/vignettes/Overview.Rmd b/vignettes/Overview.Rmd deleted file mode 100644 index e374a91..0000000 --- a/vignettes/Overview.Rmd +++ /dev/null @@ -1,370 +0,0 @@ ---- -title: "How to use challengeR" -output: rmarkdown::html_vignette -vignette: > - %\VignetteIndexEntry{How to use challengeR} - %\VignetteEngine{knitr::rmarkdown} - %\VignetteEncoding{UTF-8} ---- - -```{r, include = FALSE} -knitr::opts_chunk$set( - collapse = TRUE, - comment = "#>" -) -``` - -# Introduction - -This document is meant to be an overview guide of the classes, methods and different steps used in the tutorial scripts, and aims to achieve a deeper understanding of the analysis and visualization toolkit. The overview is divided in sections, following the usage. - -# Ranking configuration - -Once the data has been loaded (either manually or using a .csv file), the first thing to do is to create a challenge object. Then, the ranking method will be chosen and configured. - -## Define challenge object - -Challenges can be single- or multi-task. We define a challenge task as a subproblem to be solved in the scope of a challenge for which a dedicated ranking/leaderboard is provided (if any). The assessment method (e.g. metric(s) applied) may vary across different tasks of a challenge. 
For example, a segmentation challenge may comprise three tasks: - -1) segmentation of the liver -2) segmentation of the kidney -3) segmentation of the spleen - -In the context of the visualization toolkit, we differentiate between challenges that only comprise a single-task ("single-task challenge") and challenges with multiple tasks with each task containing different results and rankings ("multi-task challenge"). In the latter case, the report can directly be configured across all specified tasks by defining a task column in the data matrix. - -The first step is to create a challenge object. The file "challengeR.R" will be used for that purpose, which will be now analysed. - -The following code refers to the constructor: -```{r, eval=F, echo=T} -# challengeR.R -as.challenge=function(object, - value, - algorithm , - case=NULL, - taskName=NULL, - by=NULL, - annotator=NULL, - smallBetter=FALSE, - na.treat=NULL, - check=TRUE) -``` - -Each parameter corresponds to: - -- object: the object that will be returned, in the specific case, the data set itself -- value: column corresponding to the values of the metric (only one metric is supported) -- algorithm: column corresponding to the algorithm identifiers -- case: column corresponding to the test case identifier -- taskName: optional task name (string) for single-task challenges, the parameter will be displayed as titles of plots -- by: (="task" ), use it when it is a multi-task challenge. If the parameter is not specified, the challenge will be automatically be interpreted as a single-task challenge. -- annotator: (currently not implemented) specify here if there are more than one annotator -- smallBetter: specify if small metric values are indicating a better performance -- na.treat: (optional) specify how missing values (NA) are treated, e.g. set them to the worst possible metric values. There is no need to specify this value because either if the user knows for sure that the data set has no NAs, or if the data set has NAs and rank-then-aggregate is applied. -- check: computes sanity check if TRUE. The sanity check can be computed for both single- and multi-task challenges. It checks missing algorithm performance, and also whether the test cases appear more than once. - -An example of how to use it (for a multi-task challenge): -```{r, eval=F, echo=T} -# challengeR.R -challenge=as.challenge(data_matrix, - value="value", - algorithm="alg_name", - case="case", - by="task", - smallBetter = FALSE) -``` - -! Take into account that for single-task challenges, the "by" parameter should not be configured ! - -For single-task challenges, if the data matrix consists of a task column, it is easier to create a subset of the data matrix that only includes the values for that specific task: -```{r, eval=F, echo=T} -dataSubset=subset(data_matrix, task=="TASK_NAME") -``` - -In this way, "dataSubset" will be used to create the challenge object. - -## Configure ranking method - -The classes "wrapper.R", "aaggregate.R" and "Rank.aggregated.R" are used. - -In order to configure the ranking methods, the next parameters are considered: - -- FUN: aggregation function, e.g. mean, median, min, max, or e.g. function(x) quantile(x, probs=0.05) -- na.treat: treatment of missing data / null values (here needs to be specified again, because it was an optional parameter when the challenge object was created) either "na.rm" to remove missing data, set missings to numeric value (e.g. 0) or specify a function e.g. 
-
-## Configure ranking method
-
-The files "wrapper.R", "aaggregate.R" and "Rank.aggregated.R" are used.
-
-In order to configure the ranking methods, the following parameters are considered:
-
-- FUN: aggregation function, e.g. mean, median, min, max, or function(x) quantile(x, probs=0.05)
-- na.treat: treatment of missing data / null values (needs to be specified here again because it was an optional parameter when the challenge object was created); either "na.rm" to remove missing data, a numeric value (e.g. 0) to which missing values are set, or a function, e.g. function(x) min(x)
-- ties.method: a character string specifying how ties (two items that are the same in rank) are treated, see ?base::rank or [*Strategies for assigning rankings*](https://en.wikipedia.org/wiki/Ranking#Strategies_for_assigning_rankings) for more details
-- alpha: significance level (only for significance ranking)
-- p.adjust.method: method for adjustment for multiple testing, see ?p.adjust
-
-Different ranking methods are available:
-
-#### Metric-based aggregation -> aggregateThenRank method
-
-```{r, eval=F, echo=T}
-# wrapper.R
-aggregateThenRank=function(object,FUN,ties.method = "min",...){
-  object %>%
-    aggregate(FUN=FUN,...) %>%
-    rank(ties.method = ties.method)
-}
-```
-
-First (object %>% aggregate), the metric values for each algorithm are aggregated across all cases using the specified aggregation function:
-```{r, eval=F, echo=T}
-# aaggregate.R
-aggregate.challenge=function(x,
-                             FUN=mean,
-                             na.treat,
-                             alpha=0.05,
-                             p.adjust.method="none",
-                             parallel=FALSE,
-                             progress="none",
-                             ...)
-```
-
-Second (aggregate %>% rank), the aggregated metric values are converted into a ranking list, following the smallBetter argument defined above:
-```{r, eval=F, echo=T}
-# Rank.aggregated.R
-rank.aggregated <-function(object,
-                           ties.method="min",
-                           smallBetter,
-                           ...)
-```
-
-An example of "aggregate-then-rank" use (taking the mean for aggregation):
-
-```{r, eval=F, echo=T}
-ranking=challenge%>%aggregateThenRank(FUN = mean,
-                                      na.treat=0,
-                                      ties.method = "min"
-                                      )
-```
-
-#### Case-based aggregation -> rankThenAggregate method
-
-```{r, eval=F, echo=T}
-# wrapper.R
-rankThenAggregate=function(object,
-                           FUN,
-                           ties.method = "min"
-                           ){
-  object %>%
-    rank(ties.method = ties.method)%>%
-    aggregate(FUN=FUN) %>%
-    rank(ties.method = ties.method)
-}
-```
-
-First (object %>% rank), a ranking is created for each case across all algorithms. Missing values are assigned the worst rank:
-```{r, eval=F, echo=T}
-# rrank.R
-rank.challenge=function(object,
-                        x,
-                        ties.method="min",
-                        ...)
-```
-
-Second (rank %>% aggregate), the ranks per case are aggregated for each algorithm:
-```{r, eval=F, echo=T}
-# aaggregate.R
-aggregate.ranked <-function(x,
-                            FUN=mean, ... )
-```
-
-Third (aggregate %>% rank), the previously ranked and aggregated values are converted into a ranking list again:
-```{r, eval=F, echo=T}
-# Rank.aggregated.R
-rank.aggregated <-function(object,
-                           ties.method="min",
-                           smallBetter,
-                           ...)
-```
-
-An example of "rank-then-aggregate" with arguments as above (taking the mean for aggregation):
-```{r, eval=F, echo=T}
-ranking=challenge%>%rankThenAggregate(FUN = mean,
-                                      ties.method = "min"
-                                      )
-```
-
-#### Significance ranking -> testThenRank method
-
-This method is similar to "aggregateThenRank", but uses a fixed "significance" aggregation function.
-
-```{r, eval=F, echo=T}
-# wrapper.R
-testThenRank=function(object,FUN,ties.method = "min",...){
-  object %>%
-    aggregate(FUN="significance",...) %>%
-    rank(ties.method = ties.method)
-}
-```
-
-First (object %>% aggregate), the metric values are aggregated across all cases. Here, a pairwise comparison between all algorithms is performed using statistical tests. For each algorithm, it is counted how often that algorithm is significantly superior to others; this count is saved as the aggregated value.
-
-! No need to specify the function again, it is already set as "significance" !
-
-```{r, eval=F, echo=T}
-# aaggregate.R
-aggregate.challenge=function(x,
-                             FUN="significance",
-                             na.treat,
-                             alpha=0.05,
-                             p.adjust.method="none",
-                             parallel=FALSE,
-                             progress="none",
-                             ...)
-```
-
-Second (aggregate %>% rank), the aggregated values are converted into a ranking list:
-```{r, eval=F, echo=T}
-# Rank.aggregated.R
-rank.aggregated <-function(object,
-                           ties.method="min",
-                           smallBetter,
-                           ...)
-```
-
-An example of test-then-rank based on the Wilcoxon signed rank test:
-```{r, eval=F, echo=T}
-ranking=challenge%>%testThenRank(alpha=0.05,
-                                 p.adjust.method="none",
-                                 na.treat=0,
-                                 ties.method = "min"
-                                 )
-```
-
-# Uncertainty analysis (bootstrapping)
-
-The assessment of the stability of rankings across different ranking methods, with respect to both sampling variability and variability across tasks, is of major importance. In order to investigate ranking stability, the bootstrap approach can be used for a given method.
-
-The procedure consists of two steps:
-
-1. Use the available data sets to generate N bootstrap data sets
-2. Perform the ranking on each bootstrap data set
-
-The ranking strategy is performed repeatedly on each bootstrap sample. One bootstrap sample of a task with n test cases consists of n test cases randomly drawn with replacement from this task. A total of b of these bootstrap samples are drawn (e.g., b = 1000). Bootstrap approaches can be evaluated in two ways: either the rankings for each bootstrap sample are evaluated for each algorithm, or the distribution of correlations or pairwise distances between the ranking list based on the full assessment data and the ranking lists based on the individual bootstrap samples is explored.
-
-! Note that this step is optional; it can be omitted and the report generated directly. !
-
-The following method is used to perform the ranking on the generated bootstrap data sets:
-```{r, eval=F, echo=T}
-# Bootstrap.R
-bootstrap.ranked=function(object,
-                          nboot,
-                          parallel=FALSE,
-                          progress="text",
-                          ...)
-```
-
-- nboot: number of bootstrap data sets to generate
-- parallel: TRUE when using multiple CPUs
-- progress: when set to "text", a progress bar indicating the progress of the bootstrapping is shown
-
-An example of bootstrapping using multiple CPUs (8 CPUs):
-
-```{r, eval=F, echo=T}
-library(doParallel)
-registerDoParallel(cores=8)
-set.seed(1)
-ranking_bootstrapped=ranking%>%bootstrap(nboot=1000, parallel=TRUE, progress = "none")
-stopImplicitCluster()
-```
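-
-If you do not want to parallelize, a minimal sketch relying on the defaults shown above (no parallelization, a text progress bar) would be:
-
-```{r, eval=F, echo=T}
-# Sketch: sequential bootstrapping with default settings; slower, but does not
-# require additional packages
-set.seed(1)
-ranking_bootstrapped=ranking%>%bootstrap(nboot=1000)
-```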
-
-# Report generation
-
-Finally, the report is generated. For this last step, take into account whether or not the uncertainty analysis was performed.
-
-If the uncertainty analysis was not performed, use:
-
-```{r, eval=F, echo=T}
-# Report.R
-report.ranked=function(object,
-                       file,
-                       title="",
-                       colors=default_colors,
-                       format="PDF",
-                       latex_engine="pdflatex",
-                       open=TRUE,
-                       ...)
-```
-
-If the uncertainty analysis was performed, use:
-
-```{r, eval=F, echo=T}
-# Report.R
-report.bootstrap=function(object,
-                          file,
-                          title="",
-                          colors=default_colors,
-                          format="PDF",
-                          latex_engine="pdflatex",
-                          clean=TRUE,
-                          open=TRUE,
-                          ...)
-```
-
-The report can be generated in different formats and is configured via the following arguments:
-
-- file: name of the output file. If the output path is not specified, the working directory is used. If the file is specified but does not have a file extension, an extension will be added automatically according to the output format given in *format*. If omitted, the report is created in a temporary folder with file name "report".
-- title: title of the report
-- colors: color coding of the algorithms across all figures. Change e.g. to colors=viridisLite::inferno, which "is designed in such a way that it will analytically be perfectly perceptually-uniform, both in regular form and also when converted to black-and-white. It is also designed to be perceived by readers with the most common form of color blindness." See package viridis for further similar functions.
-- format: output format ("PDF", "HTML" or "Word")
-- latex_engine: LaTeX engine for producing PDF output ("pdflatex", "lualatex", "xelatex")
-- clean: optional. TRUE cleans intermediate files that are created during rendering; FALSE retains intermediate files, such as separate files for each figure.
-- open: whether to open the report after generation
-
-An example of how to generate the report for a *single-task* challenge:
-
-```{r, eval=F, echo=T}
-ranking_bootstrapped %>%
-  report(title="singleTaskChallengeExample",
-         file = "filename",
-         format = "PDF",
-         latex_engine="pdflatex",
-         clean=TRUE
-         )
-```
-
-! Note that the code differs slightly for single- and multi-task challenges. !
-
-For multi-task challenges, a consensus ranking (rank aggregation across tasks) has to be provided additionally. Consensus relations "synthesize" the information in the elements of a relation ensemble into a single relation, often by minimizing a criterion function measuring how dissimilar consensus candidates are from the (elements of) the ensemble (the so-called "optimization approach").
-
-The following method is used:
-
-```{r, eval=F, echo=T}
-# consensus.R
-consensus.ranked.list=function(object,
-                               method,
-                               ...)
-```
-
-- method: consensus ranking method; see ?relation_consensus for different methods to derive a consensus ranking.
-
-An example of computing a ranking consensus across tasks (here: consensus ranking according to mean ranks across tasks):
-
-```{r, eval=F, echo=T}
-meanRanks=ranking%>%consensus(method = "euclidean")
-```
-
-Generate the report as above, but with the additional specification of the consensus ranking:
-
-```{r, eval=F, echo=T}
-ranking_bootstrapped %>%
-  report(consensus=meanRanks,
-         title="multiTaskChallengeExample",
-         file = "filename",
-         format = "PDF",
-         latex_engine="pdflatex"
-         )
-```
-
-# Features
-
-- Reports for subsets (top list) of algorithms: Use e.g. `subset(ranking_bootstrapped, top=3) %>% report(...)` (or `subset(ranking, top=3) %>% report(...)` for a report without bootstrap results) to only show the top 3 algorithms according to the chosen ranking method, where `ranking_bootstrapped` and `ranking` are the objects defined in the examples above. The line plot for ranking robustness can be used to check whether algorithms performing well under other ranking methods are excluded. Bootstrapping still takes the entire uncertainty into account. Podium plots and ranking heatmaps neglect excluded algorithms. Only available for single-task challenges (not sensible for multi-task challenges because each task would contain a different set of algorithms). See the sketch after this list for a combined example.
-- Reports for subsets of tasks: Use e.g. `subset(ranking_bootstrapped, tasks=c("task1", "task2", "task3")) %>% report(...)` to restrict the report to tasks "task1", "task2" and "task3". You may want to recompute the consensus ranking beforehand using `meanRanks=subset(ranking, tasks=c("task1", "task2", "task3"))%>%consensus(method = "euclidean")`
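-
-As a sketch combining these subsetting features with report generation (assuming the `ranking` and `ranking_bootstrapped` objects from the examples above; task names, file names and the `meanRanksSubset` name are illustrative):
-
-```{r, eval=F, echo=T}
-# Single-task setting: report restricted to the top 3 algorithms
-subset(ranking_bootstrapped, top=3) %>%
-  report(title="topThreeExample",
-         file = "filename_top3",
-         format = "PDF")
-
-# Multi-task setting: report restricted to two tasks, with the consensus
-# ranking recomputed on the same subset of tasks
-meanRanksSubset=subset(ranking, tasks=c("task1", "task2"))%>%consensus(method = "euclidean")
-subset(ranking_bootstrapped, tasks=c("task1", "task2")) %>%
-  report(consensus=meanRanksSubset,
-         title="taskSubsetExample",
-         file = "filename_tasks",
-         format = "PDF")
-```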
-
-# Terms of use
-Licenced under GPL-3. If you use this software for a publication, cite
-
-Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019).
-Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121*
diff --git a/vignettes/quickstart.Rmd b/vignettes/quickstart.Rmd
new file mode 100644
index 0000000..30ba665
--- /dev/null
+++ b/vignettes/quickstart.Rmd
@@ -0,0 +1,65 @@
+---
+title: "Quickstart"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Quickstart}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+# Introduction
+
+This tutorial provides customized scripts for generating reports quickly, without going through all the installation and usage steps given in the README in detail.
+
+The tutorial contains the following scripts, which are included in the "vignettes" directory:
+
+- SingleTask_aggregate-then-rank.R
+- MultiTask_rank-then-aggregate.R
+- MultiTask_test-then-rank.R
+
+How to use the tutorial scripts in RStudio:
+
+1. Specify where the report should be generated.
+```{r, eval=F}
+setwd("myWorkingDirectoryFilePath")
+```
+
+2. Open the script.
+
+3. Click "Source".
+
+4. The report will be generated in the previously specified working directory.
+
+5. Check out the report and adapt the script to fit your configuration.
+
+
+# Usage
+
+Each script contains the following steps, as described in the README:
+
+1. Load package
+
+2. Load data (generated randomly)
+
+3. Perform ranking
+
+4. Uncertainty analysis (bootstrapping)
+
+5. Generate report
+
+The scripts are now explained in more detail:
+
+* **SingleTask_aggregate-then-rank.R:** In this script, a single-task evaluation is performed. The applied ranking method is "metric-based aggregation". It begins by aggregating metric values across all test cases for each algorithm. This aggregate is then used to compute a rank for each algorithm.
+
+* **MultiTask_rank-then-aggregate.R:** In this script, a multi-task evaluation is performed. The applied ranking method is "case-based aggregation". It begins by computing a rank for each test case for each algorithm ("rank first"). The final rank is based on the aggregated test-case ranks. Distance-based approaches for rank aggregation can also be used.
+
+* **MultiTask_test-then-rank.R:** In this script, a multi-task evaluation is performed. The applied ranking method is "significance ranking". In a complementary approach, statistical hypothesis tests are computed for each possible pair of algorithms to assess differences in metric values between the algorithms. Ranking is then performed according to the resulting relations or according to the number of significant one-sided test results. In the latter case, if algorithms have the same number of significant test results, they obtain the same rank. Various test statistics can be used.
+
+For more hints, see the README and the package documentation.
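+
+As an illustration of what these scripts do, a minimal end-to-end sketch for the single-task aggregate-then-rank case might look as follows (with made-up data; the bundled scripts generate their own data and use their own titles and file names):
+
+```{r, eval=F}
+# Minimal end-to-end sketch (illustrative data, not the data used by the scripts)
+library(challengeR)
+
+data_matrix=data.frame(alg_name=rep(c("A1","A2","A3"), each=5),
+                       case=rep(paste0("case", 1:5), times=3),
+                       value=runif(15))
+
+challenge=as.challenge(data_matrix,
+                       value="value",
+                       algorithm="alg_name",
+                       case="case",
+                       smallBetter=FALSE)
+
+ranking=challenge%>%aggregateThenRank(FUN=mean, na.treat=0, ties.method="min")
+
+set.seed(1)
+ranking_bootstrapped=ranking%>%bootstrap(nboot=1000)
+
+ranking_bootstrapped%>%report(title="quickstartExample",
+                              file="quickstart_report",
+                              format="PDF")
+```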
diff --git a/vignettes/tutorial.Rmd b/vignettes/tutorial.Rmd
deleted file mode 100644
index 17e2710..0000000
--- a/vignettes/tutorial.Rmd
+++ /dev/null
@@ -1,79 +0,0 @@
----
-title: "Quick-start with challengeR"
-output: rmarkdown::html_vignette
-vignette: >
-  %\VignetteIndexEntry{Quick-start with challengeR}
-  %\VignetteEngine{knitr::rmarkdown}
-  %\VignetteEncoding{UTF-8}
----
-
-```{r, include = FALSE}
-knitr::opts_chunk$set(
-  collapse = TRUE,
-  comment = "#>"
-)
-```
-
-# Introduction
-
-This tutorial provides customized scripts for generating reports quickly, without going through all the installation and usage steps in detail.
-
-The tutorial contains the following scripts, which are included in the "vignettes" directory:
-
-- SingleTask_aggregate-then-rank.R
-- MultiTask_rank-then-aggregate.R
-- MultiTask_test-then-rank.R
-
-How to use the tutorial scripts in RStudio:
-
-1. Specify where the report should be generated.
-```{r, eval=F}
-setwd("myWorkingDirectoryFilePath")
-```
-
-2. Open the script.
-
-3. Select all the text in the script file (CTRL+a) and run all the code (CTRL+Enter).
-
-4. The report will be generated in the previously specified working directory ("myWorkingDirectoryFilePath").
-
-5. Check out the report and the script to modify and adapt the desired parameters.
-
-
-# Usage
-
-Each script contains the following steps, as described in the README:
-
-1. Load package
-
-2. Load data (randomly generated)
-
-3. Perform ranking
-- Define challenge object
-- Perform ranking
-
-4. Uncertainty analysis (bootstrapping)
-
-5. Generate report
-
-The scripts are now explained in more detail:
-
-#### SingleTask_aggregate-then-rank.R
-
-As the name indicates, in this script a single-task evaluation is performed. The applied ranking method is "metric-based aggregation". It is the most commonly applied method; it begins by aggregating metric values across all test cases for each algorithm. This aggregate is then used to compute a rank for each algorithm.
-
-#### MultiTask_rank-then-aggregate.R
-
-As the name indicates, in this script a multi-task evaluation is performed. The applied ranking method is "case-based aggregation". It is the second most commonly applied method; it begins by computing a rank for each test case for each algorithm ("rank first"). The final rank is based on the aggregated test-case ranks. Distance-based approaches for rank aggregation can also be used.
-
-#### MultiTask_test-then-rank.R
-
-As the name indicates, in this script a multi-task evaluation is performed. The applied ranking method is "significance ranking". In a complementary approach, statistical hypothesis tests are computed for each possible pair of algorithms to assess differences in metric values between the algorithms. Ranking is then performed according to the resulting relations or according to the number of significant one-sided test results. In the latter case, if algorithms have the same number of significant test results, they obtain the same rank. Various test statistics can be used.
-
-For more hints, see the README.
-
-# Terms of use
-Licenced under GPL-3. If you use this software for a publication, cite
-
-Wiesenfarth, M., Reinke, A., Landman, B.A., Cardoso, M.J., Maier-Hein, L. and Kopp-Schneider, A. (2019). Methods and open-source toolkit for analyzing and visualizing challenge results. *arXiv preprint arXiv:1910.05121*