Methods and open-source toolkit for analyzing and visualizing challenge results
Fri, Jun 17
The warning messages that appear when there are missing values in the data were reviewed, as shown below:
Tue, Jun 14
Hey everyone,
Thu, Jun 9
I like the results when the scales library is used! However, once we find a way to bring back the confidence intervals, @wiesenfa's latest solution can also be used.
Tue, Jun 7
Hey everyone,
I added tests in the current feature branch for checking the class of the “algorithm” column in the challenge object.
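A minimal sketch of what such a test could look like, assuming the package's testthat setup and the as.challenge() constructor; the toy data, task name, and the column accessor are illustrative and may differ from the actual tests in the feature branch:

library(testthat)
library(challengeR)

test_that("algorithm column of a challenge object is a factor", {
  data <- data.frame(algorithm = c("A1", "A2", "A1", "A2"),
                     case      = c("c1", "c1", "c2", "c2"),
                     value     = c(0.80, 0.60, 0.70, 0.90))
  challenge <- as.challenge(data, taskName = "T1",
                            algorithm = "algorithm", case = "case",
                            value = "value", smallBetter = FALSE)
  # mirrors the object[[algorithm]] access pattern used in this thread
  expect_true(is.factor(challenge[["algorithm"]]))
})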
Jun 3 2022
First, I tried the fix with R 3.6 and can confirm that it does not break the functionality there.
May 30 2022
I have added object[[algorithm]] <- as.factor(object[[algorithm]]) to challengeR.R as you suggested. Now everything works without any problem. There is no need to set stringsAsFactors anymore when reading the CSV.
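For illustration only (the file name is a placeholder): with that coercion inside challengeR.R, the CSV can be read with the R >= 4.0 defaults and the column still ends up as a factor.

data <- read.csv("challenge_results.csv")            # character columns under R >= 4.0 defaults
algorithm <- "algorithm"                             # name of the algorithm column
data[[algorithm]] <- as.factor(data[[algorithm]])    # the coercion added to challengeR.R
class(data[[algorithm]])                             # "factor"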
May 23 2022
Thank you so much @aekavur! It helps a lot to finally understand the reason!
May 22 2022
Hi again :)
May 16 2022
If the output is NULL, object[[by]] is not a factor, i.e. class(object[[by]]) is "character"; in this case you need to use unique() and probably your solution.
Hi again,
May 13 2022
Thanks Emre!
That's a weird change. I didn't find any mention of it in the R changelog.
Probably, instead of
algorithms=factor(unique(object[[by]]))
a different approach would be preferred.
Congrats on tracing this down!
Finally, I could find the source of the bug. 😊 It is caused by the changed output type of the unique() function in base R from R 3 to R 4.
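As a small illustration (not from the package itself): unique() preserves the class of its input, so the behaviour depends on whether the column arrives as a factor or as a character vector, which in turn changed with the R 4.0 defaults.

x_factor <- factor(c("A1", "A2", "A1"))
class(unique(x_factor))        # "factor"    -> levels() works as before

x_char <- c("A1", "A2", "A1")
class(unique(x_char))          # "character" -> levels(x_char) returns NULL

# Since R 4.0, data.frame() and read.csv() default to stringsAsFactors = FALSE,
# so a freshly read "algorithm" column is character unless coerced explicitly.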
Feb 28 2022
I am sharing my current test code with artificial data. Since there can be 4-5 blob plots in the report (depending on the data and the number of tasks), I need to prepare a new test script for the blob plots only. Until then, you may use the code I am sharing.
Thanks Emre. That's problematic; the confidence intervals are missing. Could you share a code file for testing with artificial data (ideally producing the plot itself rather than the report)? Then I will try to look into it. Or is this difficult for you?
I have tried this approach. I just needed to remove the minor_breaks=NULL line, since there is no such option in R/scale-discrete-.r.
Feb 24 2022
I think the solution is to consider rank not as continuous but as a factor (essentially a string).
That means doing something like the following first.
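A minimal sketch of that idea (illustrative data and names, not the actual challengeR code): converting the rank to a factor makes ggplot2 use a discrete y axis with one break per rank.

library(ggplot2)

ranks <- data.frame(algorithm = paste0("A", 1:7),
                    rank      = 7:1)
ranks$rank <- factor(ranks$rank)      # rank treated as a factor, not a number

ggplot(ranks, aes(x = algorithm, y = rank)) +
  geom_point() +
  scale_y_discrete()                  # discrete scale: every rank gets its own label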
Feb 21 2022
Thanks Emre! This sounds like a lot of effort. Please give me some time to have a look at it.
I have tried many configurations to force ggplot2 to start the y-axis labels from "1" when automatic scaling is chosen. However, it was not possible. :/
Feb 14 2022
I guess overall it's a matter of taste.
The fully automatic one has several problems: in the case of 30 algorithms, the scale starts at 0, which is not sensible. I'm not sure what happens with something like 27 or 17 algorithms (a number that isn't divisible by 5). In the case of 7 algorithms it starts at 2, which I find a bit weird; I would expect a scale starting at 1. Thus, I would at least include the limits=c(1,max(...)) argument, which, as said before, may lead to sequences like 1,7,13,..., but maybe this is not so much of a problem.
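For reference, a hedged sketch of the two scale variants discussed in this exchange (outside the package; the number of algorithms is illustrative):

library(ggplot2)

n_algorithms <- 18

# Option 1: force the scale to start at rank 1, keep automatic breaks
# (this can yield sequences such as 1, 7, 13, ... depending on n_algorithms).
scale_limited <- scale_y_continuous(limits = c(1, n_algorithms))

# Option 2: explicit breaks at every 5th rank, anchored at 1
# (here 1, 5, 10, 15; the 18 at the top can be included or omitted).
scale_every_5th <- scale_y_continuous(limits = c(1, n_algorithms),
                                      breaks = c(1, seq(5, n_algorithms, by = 5)))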
Let's try the automatic config of ggplot :)
If I remember correctly, this didn't work layout-wise for a large number of algorithms: the numbers either overlap or have to become very small, or the size of the figure has to be increased.
Try to test with something like 20 algorithms; how does the report look then?
What's the problem with 1,5,10,15,18? The scale isn't affected, so for me it wouldn't matter that the intervals are not equal. In principle, you could also omit the 18, i.e. use only 1,5,10,15. Instead of all integers, I would rather use the automatic choice.
I agree with you. On the other hand, putting breaks at a fixed integer interval can be tricky. For example, let's assume that we have decided to place breaks at every 5th element. The y-axis will then show 1,5,10,15,18 for a challenge with 18 algorithms, so the last portion of the sequence has a different period. Therefore, I propose including all integer breaks for the [1, #algorithms] range. I am putting some examples here:
Feb 11 2022
Not sure whether this is a good idea. Imagine a challenge with 18 algorithms: there will be only a 1 and an 18 and nothing in between, which may make it difficult to read. What do you think?
I have tried the suggested code but it did not fix the problem. Besides, it caused additional issues. :)
Could you try to replace "breaks" by "labels" in
I have tested this with the provided data. The scaling of the y-axis seems to be correct now. But only the first rank is labeled on the y-axis. Can the other ranks be labeled as well?
Feb 8 2022
The problem is almost fixed by giving the na.treat parameter in both as.challenge and the ranking methods (except rankThenAggregate). Now we can generate reports for all ranking methods.
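A hedged usage sketch following the pattern in the challengeR documentation (the toy data, column names, and the value 0 for na.treat are placeholders):

library(challengeR)

# Toy data with missing performance values.
data_matrix <- data.frame(alg_name = rep(c("A1", "A2"), each = 3),
                          case     = rep(c("c1", "c2", "c3"), times = 2),
                          value    = c(0.8, NA, 0.7, 0.6, 0.9, NA))

# na.treat given when constructing the challenge object ...
challenge <- as.challenge(data_matrix, taskName = "T1",
                          algorithm = "alg_name", case = "case", value = "value",
                          smallBetter = FALSE, na.treat = 0)

# ... and/or in the aggregate-then-rank ranking method.
ranking <- aggregateThenRank(challenge, FUN = mean, na.treat = 0, ties.method = "min")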
Feb 7 2022
The scale_y_continuous functions inside the ./R/Stability.R file were modified. The problem seems solved. You can test the feature/T28966-YaxisOfBlobPlotsAlwaysScaledTo5 branch via the file in the attachment. (You can run it from the root folder of the challengeR code.)
I guess na.treat is only needed for the line plot comparing to the other ranking methods?
In this case, a message could be thrown when compiling the report, saying something like "line plot comparing ranking methods omitted since na.treat is not specified. Specify na.treat in as.challenge() if inclusion of line plot is desired", and compilation of the report could still be allowed (excluding the line plot).
(Note that you can define na.treat both in as.challenge() and in the ranking functions.)
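A hypothetical sketch of such a guard during report compilation; the function name and the attribute used to look up the setting are assumptions, not the actual challengeR internals:

compile_line_plot_section <- function(challenge) {
  na_treat <- attr(challenge, "na.treat")   # assumed location of the setting
  if (is.null(na_treat)) {
    # Inform the user and skip the plot instead of failing the whole report.
    message("line plot comparing ranking methods omitted since na.treat is not specified. ",
            "Specify na.treat in as.challenge() if inclusion of line plot is desired")
    return(invisible(NULL))
  }
  # ... otherwise generate the line plot comparing the ranking methods ...
}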
Jan 28 2022
Thank you for investigating this! In challengeR, this is covered by emitting a message saying "na.treat obligatory if report is intended to be compiled". In order to solve the mentioned issue 2, a strategy for the preferred way of handling it in VISSART should be defined. Should the user be guided to specify the NaN handling strategy? Should the user be able to generate a report, but without the plots that require numeric values?
Current status of the issue: