diff --git a/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd b/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd
index 6f68a62..617e5a3 100644
--- a/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd
+++ b/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd
@@ -1,74 +1,74 @@
### Ranking stability: Ranking variability via bootstrap approach

A blob plot of bootstrap results over the different tasks separated by algorithm allows another perspective on the assessment data. This gives deeper insights into the characteristics of tasks and the ranking uncertainty of the algorithms in each task.

\bigskip

```{r blobplot_bootstrap_byAlgorithm,fig.width=7,fig.height = 5}
#stabilityByAlgorithm.bootstrap.list
if (n.tasks<=6 & n.algorithms<=10 ){
  stabilityByAlgorithm(boot_object, ordering=ordering_consensus, max_size = 9, size=4, shape=4, single = F) +
    scale_color_manual(values=cols) +
-    guides(color = FALSE)
+    guides(color = 'none')
} else {
  pl=stabilityByAlgorithm(boot_object, ordering=ordering_consensus, max_size = 9, size=4, shape=4, single = T)
  for (i in 1:length(pl)) print(pl[[i]] +
                                  scale_color_manual(values=cols) +
                                  guides(size = guide_legend(title="%"),color="none"))
}
```

\newpage

An alternative representation is provided by a stacked frequency plot of the observed ranks, separated by algorithm. Observed ranks across bootstrap samples are displayed with coloring according to the task. For algorithms that achieve the same rank in different tasks for the full assessment data set, vertical lines are on top of each other. Vertical lines allow to compare the achieved rank of each algorithm over different tasks.

\bigskip

```{r stackedFrequencies_bootstrap_byAlgorithm,fig.width=7,fig.height = 5}
if (n.tasks<=6 & n.algorithms<=10 ){
  stabilityByAlgorithm(boot_object, ordering=ordering_consensus, stacked = TRUE, single = F)
} else {
  pl=stabilityByAlgorithm(boot_object, ordering=ordering_consensus, stacked = TRUE, single = T) %++%
    theme(legend.position = ifelse(n.tasks>20, yes = "bottom", no = "right"))
  print(pl)
}
```

diff --git a/inst/appdir/characterizationOfTasksBootstrapping.Rmd b/inst/appdir/characterizationOfTasksBootstrapping.Rmd
index 10fb85c..2ae651d 100644
--- a/inst/appdir/characterizationOfTasksBootstrapping.Rmd
+++ b/inst/appdir/characterizationOfTasksBootstrapping.Rmd
@@ -1,55 +1,55 @@
### Visualizing bootstrap results

To investigate which tasks separate algorithms well (i.e., lead to a stable ranking), a blob plot is recommended. Bootstrap results can be shown in a blob plot showing one plot for each task. In this view, the spread of the blobs for each algorithm can be compared across tasks. Deviations from the diagonal indicate deviations from the consensus ranking (over tasks). Specifically, if rank distribution of an algorithm is consistently below the diagonal, the algorithm performed better in this task than on average across tasks, while if the rank distribution of an algorithm is consistently above the diagonal, the algorithm performed worse in this task than on average across tasks. At the bottom of each panel, ranks for each algorithm in the tasks are provided.

Same as in Section \ref{blobByTask} but now ordered according to consensus.
\bigskip

```{r blobplot_bootstrap_byTask,fig.width=9, fig.height=9, results='hide'}
#stabilityByTask.bootstrap.list
if (n.tasks<=6 & n.algorithms<=10 ){
  stabilityByTask(boot_object, ordering=ordering_consensus, max_size = 9, size=4, shape=4) +
    scale_color_manual(values=cols) +
-    guides(color = FALSE)
+    guides(color = 'none')
} else {
  pl=list()
  for (subt in names(boot_object$bootsrappedRanks)){
    a=list(bootsrappedRanks=list(boot_object$bootsrappedRanks[[subt]]),
           matlist=list(boot_object$matlist[[subt]]))
    names(a$bootsrappedRanks)=names(a$matlist)=subt
    class(a)="bootstrap.list"
    r=boot_object$matlist[[subt]]
    pl[[subt]]=stabilityByTask(a, max_size = 9, ordering=ordering_consensus, size.ranks=.25*theme_get()$text$size, size=4, shape=4) +
      scale_color_manual(values=cols) +
-      guides(color = FALSE) +
+      guides(color = 'none') +
      ggtitle(subt)+
      theme(legend.position = "bottom")
  }
  print(pl)
}
```
\ No newline at end of file

diff --git a/inst/appdir/visualizationAcrossTasks.Rmd b/inst/appdir/visualizationAcrossTasks.Rmd
index 80629d7..19061e1 100644
--- a/inst/appdir/visualizationAcrossTasks.Rmd
+++ b/inst/appdir/visualizationAcrossTasks.Rmd
@@ -1,119 +1,119 @@
\newpage

# Visualization of cross-task insights

The algorithms are ordered according to consensus ranking.

## Characterization of algorithms

### Ranking stability: Variability of achieved rankings across tasks

Algorithms are color-coded, and the area of each blob at position $\left( A_i, \text{rank } j \right)$ is proportional to the relative frequency $A_i$ achieved rank $j$ across multiple tasks. The median rank for each algorithm is indicated by a black cross. This way, the distribution of ranks across tasks can be intuitively visualized.

\bigskip

```{r blobplot_raw,fig.width=9, fig.height=9}
#stability.ranked.list
stability(object,ordering=ordering_consensus,max_size=9,size=8,shape=4)+
  scale_color_manual(values=cols) +
-  guides(color = FALSE)
+  guides(color = 'none')
```

```{r, child=if (isMultiTask && bootstrappingEnabled) system.file("appdir", "characterizationOfAlgorithmsBootstrapping.Rmd", package="challengeR")}
```

\newpage

## Characterization of tasks

```{r, child=if (isMultiTask && bootstrappingEnabled) system.file("appdir", "characterizationOfTasksBootstrapping.Rmd", package="challengeR")}
```

### Cluster Analysis

Dendrogram from hierarchical cluster analysis and \textit{network-type graphs} for assessing the similarity of tasks based on challenge rankings.

A dendrogram is a visualization approach based on hierarchical clustering. It depicts clusters according to a chosen distance measure (here: Spearman's footrule) as well as a chosen agglomeration method (here: complete and average agglomeration).
\bigskip

```{r dendrogram_complete, fig.width=6, fig.height=5,out.width='60%'}
if (n.tasks>2) {
  dendrogram(object, dist = "symdiff", method="complete")
} else cat("\nCluster analysis only sensible if there are >2 tasks.\n\n")
```

\bigskip

```{r dendrogram_average, fig.width=6, fig.height=5,out.width='60%'}
if (n.tasks>2) dendrogram(object, dist = "symdiff", method="average")
```

diff --git a/inst/appdir/visualizationBlobPlots.Rmd b/inst/appdir/visualizationBlobPlots.Rmd
index 2ab31a1..43ed934 100644
--- a/inst/appdir/visualizationBlobPlots.Rmd
+++ b/inst/appdir/visualizationBlobPlots.Rmd
@@ -1,41 +1,41 @@
## *Blob plot* for visualizing ranking stability based on bootstrap sampling \label{blobByTask}

Algorithms are color-coded, and the area of each blob at position $\left( A_i, \text{rank } j \right)$ is proportional to the relative frequency $A_i$ achieved rank $j$ across $b=$ `r ncol(boot_object$bootsrappedRanks[[1]])` bootstrap samples. The median rank for each algorithm is indicated by a black cross. 95\% bootstrap intervals across bootstrap samples are indicated by black lines.

\bigskip

```{r blobplot_bootstrap,fig.width=9, fig.height=9, results='hide'}
showLabelForSingleTask <- FALSE

if (n.tasks > 1) {
  showLabelForSingleTask <- TRUE
}

pl=list()
for (subt in names(boot_object$bootsrappedRanks)){
  a=list(bootsrappedRanks=list(boot_object$bootsrappedRanks[[subt]]),
         matlist=list(boot_object$matlist[[subt]]))
  names(a$bootsrappedRanks)=names(a$matlist)=subt
  class(a)="bootstrap.list"
  r=boot_object$matlist[[subt]]
  pl[[subt]]=stabilityByTask(a, max_size =8, ordering=rownames(r[order(r$rank),]), size.ranks=.25*theme_get()$text$size, size=8, shape=4, showLabelForSingleTask=showLabelForSingleTask) +
    scale_color_manual(values=cols) +
-    guides(color = FALSE)
+    guides(color = 'none')
}

# if (length(boot_object$matlist)<=6 &nrow((boot_object$matlist[[1]]))<=10 ){
#   ggpubr::ggarrange(plotlist = pl)
# } else {
  print(pl)
#}
```
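All four hunks apply the same substantive change: the logical form `guides(color = FALSE)` becomes `guides(color = 'none')`. This presumably tracks the deprecation in recent ggplot2 releases (3.3.4 onwards), where logical values in `guides()` trigger a warning and the string `"none"` is the supported way to suppress a legend, so reports rendered with a current ggplot2 stay warning-free. A minimal, self-contained sketch of the difference, using only the built-in `mtcars` data rather than any challengeR objects:

```r
library(ggplot2)

# Toy plot with a color legend, standing in for the challengeR blob plots.
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

p + guides(color = FALSE)   # old idiom: emits a deprecation warning on current ggplot2
p + guides(color = "none")  # current idiom, as used throughout this diff
```

In R, `'none'` and `"none"` are equivalent string literals, so the single-quoted form used in the Rmd chunks behaves identically.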
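For context on the plots these chunks produce: a blob plot places, at each (algorithm, rank) position, a disc whose area is proportional to how often that algorithm achieved that rank (across tasks or bootstrap samples), with a cross marking the median rank. The following is a hedged toy illustration of that idea on simulated ranks, not the challengeR implementation (`stability`, `stabilityByTask`, and `stabilityByAlgorithm` handle this internally); all data and names below are hypothetical:

```r
library(ggplot2)

# Hypothetical ranks of three algorithms across 10 bootstrap samples.
set.seed(1)
ranks <- data.frame(
  algorithm = rep(c("A1", "A2", "A3"), each = 10),
  rank      = c(sample(1:2, 10, TRUE), sample(1:3, 10, TRUE), sample(2:3, 10, TRUE))
)

ggplot(ranks, aes(x = algorithm, y = rank, color = algorithm)) +
  geom_count(alpha = 0.6) +               # blob area ~ frequency of each achieved rank
  scale_size_area(max_size = 9) +
  stat_summary(fun = median, geom = "point",
               shape = 4, size = 4, color = "black") +  # cross = median rank
  guides(color = "none")
```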