diff --git a/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd b/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd index 6f68a62..617e5a3 100644 --- a/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd +++ b/inst/appdir/characterizationOfAlgorithmsBootstrapping.Rmd @@ -1,74 +1,74 @@ ### Ranking stability: Ranking variability via bootstrap approach A blob plot of bootstrap results over the different tasks separated by algorithm allows another perspective on the assessment data. This gives deeper insights into the characteristics of tasks and the ranking uncertainty of the algorithms in each task. <!-- 1000 bootstrap Rankings were performed for each task. --> <!-- Each algorithm is considered separately and for each subtask (x-axis) all observed ranks across bootstrap samples (y-axis) are displayed. Additionally, medians and IQR is shown in black. --> <!-- We see which algorithm is consistently among best, which is consistently among worst, which vary extremely... --> \bigskip ```{r blobplot_bootstrap_byAlgorithm,fig.width=7,fig.height = 5} #stabilityByAlgorithm.bootstrap.list if (n.tasks<=6 & n.algorithms<=10 ){ stabilityByAlgorithm(boot_object, ordering=ordering_consensus, max_size = 9, size=4, shape=4, single = F) + scale_color_manual(values=cols) + - guides(color = FALSE) + guides(color = 'none') } else { pl=stabilityByAlgorithm(boot_object, ordering=ordering_consensus, max_size = 9, size=4, shape=4, single = T) for (i in 1:length(pl)) print(pl[[i]] + scale_color_manual(values=cols) + guides(size = guide_legend(title="%"),color="none") ) } ``` <!-- Stacked frequencies of observed ranks across bootstrap samples are displayed with colouring according to subtask. Vertical lines provide original (non-bootstrap) rankings for each subtask. --> \newpage An alternative representation is provided by a stacked frequency plot of the observed ranks, separated by algorithm. 
Observed ranks across bootstrap samples are displayed with coloring according to the task. For algorithms that achieve the same rank in different tasks for the full assessment data set, vertical lines are on top of each other. Vertical lines allow to compare the achieved rank of each algorithm over different tasks. \bigskip ```{r stackedFrequencies_bootstrap_byAlgorithm,fig.width=7,fig.height = 5} if (n.tasks<=6 & n.algorithms<=10 ){ stabilityByAlgorithm(boot_object, ordering=ordering_consensus, stacked = TRUE, single = F) } else { pl=stabilityByAlgorithm(boot_object, ordering=ordering_consensus, stacked = TRUE, single = T) %++% theme(legend.position = ifelse(n.tasks>20, yes = "bottom", no = "right")) print(pl) } ``` diff --git a/inst/appdir/characterizationOfTasksBootstrapping.Rmd b/inst/appdir/characterizationOfTasksBootstrapping.Rmd index 10fb85c..2ae651d 100644 --- a/inst/appdir/characterizationOfTasksBootstrapping.Rmd +++ b/inst/appdir/characterizationOfTasksBootstrapping.Rmd @@ -1,55 +1,55 @@ ### Visualizing bootstrap results To investigate which tasks separate algorithms well (i.e., lead to a stable ranking), a blob plot is recommended. Bootstrap results can be shown in a blob plot showing one plot for each task. In this view, the spread of the blobs for each algorithm can be compared across tasks. Deviations from the diagonal indicate deviations from the consensus ranking (over tasks). Specifically, if rank distribution of an algorithm is consistently below the diagonal, the algorithm performed better in this task than on average across tasks, while if the rank distribution of an algorithm is consistently above the diagonal, the algorithm performed worse in this task than on average across tasks. At the bottom of each panel, ranks for each algorithm in the tasks are provided. <!-- Shows which subtask leads to stable ranking and in which subtask ranking is more uncertain. --> Same as in Section \ref{blobByTask} but now ordered according to consensus. 
\bigskip ```{r blobplot_bootstrap_byTask,fig.width=9, fig.height=9, results='hide'} #stabilityByTask.bootstrap.list if (n.tasks<=6 & n.algorithms<=10 ){ stabilityByTask(boot_object, ordering=ordering_consensus, max_size = 9, size=4, shape=4) + scale_color_manual(values=cols) + - guides(color = FALSE) + guides(color = 'none') } else { pl=list() for (subt in names(boot_object$bootsrappedRanks)){ a=list(bootsrappedRanks=list(boot_object$bootsrappedRanks[[subt]]), matlist=list(boot_object$matlist[[subt]])) names(a$bootsrappedRanks)=names(a$matlist)=subt class(a)="bootstrap.list" r=boot_object$matlist[[subt]] pl[[subt]]=stabilityByTask(a, max_size = 9, ordering=ordering_consensus, size.ranks=.25*theme_get()$text$size, size=4, shape=4) + scale_color_manual(values=cols) + - guides(color = FALSE) + + guides(color = 'none') + ggtitle(subt)+ theme(legend.position = "bottom") } print(pl) } ``` \ No newline at end of file diff --git a/inst/appdir/visualizationAcrossTasks.Rmd b/inst/appdir/visualizationAcrossTasks.Rmd index 80629d7..19061e1 100644 --- a/inst/appdir/visualizationAcrossTasks.Rmd +++ b/inst/appdir/visualizationAcrossTasks.Rmd @@ -1,119 +1,119 @@ \newpage # Visualization of cross-task insights The algorithms are ordered according to consensus ranking. ## Characterization of algorithms ### Ranking stability: Variability of achieved rankings across tasks <!-- Variability of achieved rankings across tasks: If a --> <!-- reasonably large number of tasks is available, a blob plot --> <!-- can be drawn, visualizing the distribution --> <!-- of ranks each algorithm attained across tasks. --> <!-- Displayed are all ranks and their frequencies an algorithm --> <!-- achieved in any task. If all tasks would provide the same --> <!-- stable ranking, narrow intervals around the diagonal would --> <!-- be expected. 
--> Algorithms are color-coded, and the area of each blob at position $\left( A_i, \text{rank } j \right)$ is proportional to the relative frequency $A_i$ achieved rank $j$ across multiple tasks. The median rank for each algorithm is indicated by a black cross. This way, the distribution of ranks across tasks can be intuitively visualized. \bigskip ```{r blobplot_raw,fig.width=9, fig.height=9} #stability.ranked.list stability(object,ordering=ordering_consensus,max_size=9,size=8,shape=4)+ scale_color_manual(values=cols) + - guides(color = FALSE) + guides(color = 'none') ``` ```{r, child=if (isMultiTask && bootstrappingEnabled) system.file("appdir", "characterizationOfAlgorithmsBootstrapping.Rmd", package="challengeR")} ``` \newpage ## Characterization of tasks ```{r, child=if (isMultiTask && bootstrappingEnabled) system.file("appdir", "characterizationOfTasksBootstrapping.Rmd", package="challengeR")} ``` ### Cluster Analysis <!-- Quite a different question of interest --> <!-- is to investigate the similarity of tasks with respect to their --> <!-- rankings, i.e., which tasks lead to similar ranking lists and the --> <!-- ranking of which tasks are very different. For this question --> <!-- a hierarchical cluster analysis is performed based on the --> <!-- distance between ranking lists. Different distance measures --> <!-- can be used (here: Spearman's footrule distance) --> <!-- as well as different agglomeration methods (here: complete and average). --> Dendrogram from hierarchical cluster analysis and \textit{network-type graphs} for assessing the similarity of tasks based on challenge rankings. A dendrogram is a visualization approach based on hierarchical clustering. It depicts clusters according to a chosen distance measure (here: Spearman's footrule) as well as a chosen agglomeration method (here: complete and average agglomeration). 
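The clustering described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation behind challengeR's `dendrogram()`: the `ranks` matrix and its task and algorithm names are hypothetical, and Spearman's footrule is computed here as the Manhattan distance between rank vectors (the sum of absolute rank differences).

```r
# Hypothetical rankings of four algorithms in three tasks (rows = tasks).
ranks <- rbind(
  task1 = c(A1 = 1, A2 = 2, A3 = 3, A4 = 4),
  task2 = c(A1 = 2, A2 = 1, A3 = 3, A4 = 4),
  task3 = c(A1 = 4, A2 = 3, A3 = 2, A4 = 1)
)

# Spearman's footrule: sum of absolute rank differences between two tasks,
# i.e. the Manhattan distance between their rank vectors.
footrule <- dist(ranks, method = "manhattan")

# Hierarchical clustering with the two agglomeration methods named in the text.
hc_complete <- hclust(footrule, method = "complete")
hc_average  <- hclust(footrule, method = "average")

# Draw the dendrogram for complete agglomeration.
plot(hc_complete, main = "Complete agglomeration")
```

In this toy example, `task1` and `task2` differ only in the order of the top two algorithms, so they are merged first, while `task3` (the reversed ranking) joins last.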
\bigskip ```{r dendrogram_complete, fig.width=6, fig.height=5,out.width='60%'} if (n.tasks>2) { dendrogram(object, dist = "symdiff", method="complete") } else cat("\nCluster analysis only sensible if there are >2 tasks.\n\n") ``` \bigskip ```{r dendrogram_average, fig.width=6, fig.height=5,out.width='60%'} if (n.tasks>2) dendrogram(object, dist = "symdiff", method="average") ``` <!-- In network-type graphs (see Eugster et al, 2008), every task is represented by a node and nodes are connected by edges whose length is determined by a chosen distance measure. Here, distances between nodes are chosen to increase exponentially in Spearman's footrule distance with growth rate 0.05 to accentuate large distances. --> <!-- Hence, tasks that are similar with respect to their algorithm ranking appear closer together than those that are dissimilar. Nodes representing tasks with a unique winner are color-coded by the winning algorithm. In case more than one algorithm ranks first in a task, the corresponding node remains uncolored. 
--> <!-- \bigskip --> <!-- ```{r ,eval=T,fig.width=12, fig.height=6,include=FALSE, fig.keep="none"} --> <!-- if (n.tasks>2) { --> <!-- netw=network(object, --> <!-- method = "symdiff", --> <!-- edge.col=grDevices::grey.colors, --> <!-- edge.lwd=1, --> <!-- rate=1.05, --> <!-- cols=cols --> <!-- ) --> <!-- plot.new() --> <!-- leg=legend("topright", names(netw$leg.col), lwd = 1, col = netw$leg.col, bg =NA,plot=F,cex=.8) --> <!-- w <- grconvertX(leg$rect$w, to='inches') --> <!-- addy=6+w --> <!-- } else addy=1 --> <!-- ``` --> <!-- ```{r network, fig.width=addy, fig.height=6,out.width='100%',dev=NULL} --> <!-- if (n.tasks>2) { --> <!-- plot(netw, --> <!-- layoutType = "neato", --> <!-- fixedsize=TRUE, --> <!-- # fontsize, --> <!-- # width, --> <!-- # height, --> <!-- shape="ellipse", --> <!-- cex=.8 --> <!-- ) --> <!-- } --> <!-- ``` --> diff --git a/inst/appdir/visualizationBlobPlots.Rmd b/inst/appdir/visualizationBlobPlots.Rmd index 2ab31a1..43ed934 100644 --- a/inst/appdir/visualizationBlobPlots.Rmd +++ b/inst/appdir/visualizationBlobPlots.Rmd @@ -1,41 +1,41 @@ ## *Blob plot* for visualizing ranking stability based on bootstrap sampling \label{blobByTask} Algorithms are color-coded, and the area of each blob at position $\left( A_i, \text{rank } j \right)$ is proportional to the relative frequency $A_i$ achieved rank $j$ across $b=$ `r ncol(boot_object$bootsrappedRanks[[1]])` bootstrap samples. The median rank for each algorithm is indicated by a black cross. 95\% bootstrap intervals across bootstrap samples are indicated by black lines. 
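The quantities shown in the blob plot can be computed by hand as follows. This is a sketch under assumptions: `boot_ranks` is a hypothetical matrix laid out like one element of `boot_object$bootsrappedRanks` (rows = algorithms, columns = bootstrap samples), and the 95\% interval is taken to be a percentile interval, which the text does not state explicitly.

```r
# Hypothetical bootstrapped ranks: 3 algorithms (rows), b = 8 samples (columns).
boot_ranks <- rbind(
  A1 = c(1, 1, 1, 2, 1, 1, 2, 1),
  A2 = c(2, 2, 3, 1, 2, 3, 1, 2),
  A3 = c(3, 3, 2, 3, 3, 2, 3, 3)
)
b     <- ncol(boot_ranks)
n_alg <- nrow(boot_ranks)

# Relative frequency with which algorithm A_i achieved rank j;
# the blob area at (A_i, rank j) is proportional to this value.
rel_freq <- t(apply(boot_ranks, 1, function(r) tabulate(r, nbins = n_alg) / b))
colnames(rel_freq) <- paste0("rank", seq_len(n_alg))

# Median rank per algorithm (the black cross).
median_rank <- apply(boot_ranks, 1, median)

# Assumed percentile-based 95% interval per algorithm (the black line).
interval <- apply(boot_ranks, 1, quantile, probs = c(0.025, 0.975))

print(rel_freq)
```

Each row of `rel_freq` sums to 1, since every bootstrap sample assigns exactly one rank to each algorithm.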
\bigskip ```{r blobplot_bootstrap,fig.width=9, fig.height=9, results='hide'} showLabelForSingleTask <- FALSE if (n.tasks > 1) { showLabelForSingleTask <- TRUE } pl=list() for (subt in names(boot_object$bootsrappedRanks)){ a=list(bootsrappedRanks=list(boot_object$bootsrappedRanks[[subt]]), matlist=list(boot_object$matlist[[subt]])) names(a$bootsrappedRanks)=names(a$matlist)=subt class(a)="bootstrap.list" r=boot_object$matlist[[subt]] pl[[subt]]=stabilityByTask(a, max_size =8, ordering=rownames(r[order(r$rank),]), size.ranks=.25*theme_get()$text$size, size=8, shape=4, showLabelForSingleTask=showLabelForSingleTask) + scale_color_manual(values=cols) + - guides(color = FALSE) + guides(color = 'none') } # if (length(boot_object$matlist)<=6 &nrow((boot_object$matlist[[1]]))<=10 ){ # ggpubr::ggarrange(plotlist = pl) # } else { print(pl) #} ```
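The change repeated throughout this diff replaces `guides(color = FALSE)` with `guides(color = 'none')`. The sketch below shows the pattern in isolation, assuming a ggplot2 version (3.3.4 or later, to my understanding) in which passing `FALSE` to `guides()` triggers a deprecation warning. The `toy` data frame and its column names are hypothetical and not taken from challengeR.

```r
library(ggplot2)

# Toy data; column names are hypothetical.
toy <- data.frame(
  algorithm = c("A1", "A2", "A3"),
  rank      = c(1, 2, 3)
)

p <- ggplot(toy, aes(x = algorithm, y = rank, color = algorithm)) +
  geom_point(size = 4, shape = 4) +
  guides(color = "none")  # suppress the colour legend without the FALSE form

print(p)
```

The string form is also accepted by the existing `guides(size = guide_legend(title="%"), color="none")` call in the first file of this diff, so the change makes the templates consistent.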