Methods and open-source toolkit for analyzing and visualizing challenge results
Aug 10 2023
I introduced the doRNG package to ensure reproducibility on Windows.
Mar 31 2023
Or we just forbid parallelization on Windows... Parallelization of R on Windows is such a series of workarounds...
Great @wiesenfa! The test with doRNG passed on Windows and Ubuntu!
Using doRNG might be the best option; it should work on any OS.
Oh I HATE it!
Could you please try (first installing package "doRNG" https://cran.r-project.org/web/packages/doRNG/index.html ):
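(The exact snippet was not carried over into this thread; a minimal sketch of the doRNG pattern, following the doRNG vignette, might look like this:)

```r
library(doParallel)
library(doRNG)

cl <- makeCluster(2)
registerDoParallel(cl)

# %dorng% is a drop-in replacement for %dopar% that gives each iteration
# its own reproducible L'Ecuyer-CMRG RNG stream, independent of the OS
# and of how the workers were created.
set.seed(123)
res1 <- foreach(i = 1:5) %dorng% rnorm(1)

set.seed(123)
res2 <- foreach(i = 1:5) %dorng% rnorm(1)

identical(res1, res2)  # TRUE
stopCluster(cl)
```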
Mar 10 2023
Could someone please try on Windows?
Oh, I hate it so much. I know the problem; only Windows is affected. Parallelization does not work with forking there, I keep forgetting this. I'll look for a solution on Windows.
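For context, an illustration (not the package code) of why this is Windows-specific: fork-based parallelism, as used by mclapply(), is unavailable on Windows, whereas socket (PSOCK) clusters work on every OS.

```r
library(parallel)

# Fork-based: fast and shares memory with the parent process, but on
# Windows mclapply() errors for mc.cores > 1 because fork() does not exist.
# res <- mclapply(1:4, function(i) i^2, mc.cores = 2)

# Socket-based (PSOCK): spawns fresh R processes, works on all OSes,
# but data and packages must be exported to the workers explicitly.
cl <- makeCluster(2)
res <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```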
I tested it with R 4.2.0 on a Windows system and got the same error. I stopped the test and looked at rankingBootstrapped1 and rankingBootstrapped2. Here are the screenshots:
Feb 23 2023
I implemented Manuel's suggestions in branch hotfix/T29361-EnsureReproducibilityWithParallelBootstrapping and added corresponding unit tests to test-bootstrap.R.
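A hedged sketch of the kind of check such a test performs (the actual contents of test-bootstrap.R are not reproduced here; the pattern is simply "same seed, two runs, identical results", with sample() standing in for the package's bootstrap routine):

```r
library(testthat)

test_that("bootstrapping is reproducible given the same seed", {
  # Stand-in for a bootstrap ranking run; the real test would call the
  # package's bootstrap routine instead of sample().
  set.seed(1)
  run1 <- replicate(10, sample(1:100, 50, replace = TRUE))
  set.seed(1)
  run2 <- replicate(10, sample(1:100, 50, replace = TRUE))
  expect_identical(run1, run2)
})
```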
Jun 17 2022
The warning messages that appear when there are missing values in the data were reviewed as below:
Jun 9 2022
I like the results when the scales library is used! However, once we find a way to bring back the confidence intervals, @wiesenfa's latest solution can also be used.
Jun 7 2022
I added tests in the current feature branch for checking the class of the "algorithm" column in the challenge object.
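A hedged sketch of such a test (the fixture and the as.challenge() call follow the package README; the exact names in the actual test file may differ):

```r
library(testthat)
library(challengeR)

test_that("algorithm column of the challenge object is a factor", {
  data <- data.frame(algorithm = c("A", "B", "A", "B"),
                     case = c("c1", "c1", "c2", "c2"),
                     value = c(0.9, 0.8, 0.7, 0.6))
  challenge <- as.challenge(data, algorithm = "algorithm",
                            case = "case", value = "value",
                            smallBetter = FALSE)
  expect_true(is.factor(challenge$algorithm))
})
```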
Jun 3 2022
First, I tried the fix with R 3.6 and can confirm that it does not break the functionality there.
May 30 2022
I have added object[[algorithm]] <- as.factor(object[[algorithm]]) to challengeR.R as you suggested. Now everything works without any problem. There is no need to set stringsAsFactors during CSV reading anymore.
May 23 2022
Thank you so much @aekavur! It helps a lot to finally understand the reason!
May 22 2022
Hi again :)
May 16 2022
If the output is NULL, object[[by]] is not a factor, i.e. class(object[[by]]) is "character". In this case you need to use unique() and probably your solution.
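In code, the suggested fallback might look like this (a sketch, with object and by as in the package internals):

```r
# levels() returns NULL for a character vector, so fall back to unique():
vals <- levels(object[[by]])
if (is.null(vals)) {
  vals <- unique(object[[by]])
}
```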
May 13 2022
That's a weird change. I didn't find any mention of it in the R changelog.
Probably, instead of
the following would be preferred:
Congrats on tracking this down!
Finally, I could find the source of the bug. 😊 It is caused by the changed output type of the unique() function in base R from R 3 to R 4.
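A small demonstration of the behavior difference, assuming the column originates from read.csv(): in R 4.0 the default of stringsAsFactors changed from TRUE to FALSE, so the same call now hands unique() a character vector instead of a factor, and unique() preserves the type of its input.

```r
x_factor <- factor(c("A", "B", "A"))   # what read.csv() produced under R 3.x
x_char   <- c("A", "B", "A")           # what it produces under R >= 4.0

class(unique(x_factor))  # "factor"    -> levels() etc. keep working
class(unique(x_char))    # "character" -> code expecting a factor breaks
```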
Feb 28 2022
I am sharing my current test code with artificial data. Since there can be 4-5 blob plots in the report (depending on the data and the number of tasks), I need to prepare a new test script for the blob plots only. Until then, you may use the code I am sharing.
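(The shared file itself is not included in this thread; a minimal sketch of artificial data in the shape challengeR expects — the as.challenge(), aggregateThenRank(), bootstrap(), and stabilityByTask() calls follow the package README and are an assumption here:)

```r
library(challengeR)

set.seed(42)
# Artificial challenge: 7 algorithms, 30 cases, one task.
data <- expand.grid(algorithm = paste0("Alg", 1:7),
                    case = paste0("case", 1:30))
data$value <- runif(nrow(data))

challenge <- as.challenge(data, algorithm = "algorithm",
                          case = "case", value = "value",
                          smallBetter = FALSE)
ranking <- challenge %>% aggregateThenRank(FUN = mean, na.treat = 0,
                                           ties.method = "min")

set.seed(1)
rankingBootstrapped <- ranking %>% bootstrap(nboot = 100)
stabilityByTask(rankingBootstrapped)  # renders the blob plot directly
```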
Thanks Emre. That's problematic: the confidence intervals are missing. Could you share a code file for testing with artificial data (ideally not with the report as output, but the plot itself)? Then I will try to look into it. Or is this difficult for you?
I have tried this approach. I just needed to remove the minor_breaks=NULL line, since there is no such config in R/scale-discrete-.r.
Feb 24 2022
I think the solution is to consider rank not as continuous but as a factor (essentially a string).
That means, first, something like the following:
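(The original snippet was not preserved; a sketch of the idea, assuming a data frame with algorithm and rank columns:)

```r
library(ggplot2)

df <- data.frame(algorithm = paste0("Alg", 1:18),
                 rank = sample(1:18))

# Treat rank as a factor: the y-axis becomes discrete, every level is
# labeled, and the scale naturally starts at 1 instead of 0.
df$rank <- factor(df$rank, levels = 1:18)

ggplot(df, aes(x = algorithm, y = rank)) +
  geom_point() +
  scale_y_discrete(drop = FALSE)  # keep unused rank levels on the axis
```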
Feb 21 2022
Thanks Emre! This sounds like a lot of effort. Please give me some time to have a look at it.
I have tried many configurations just to force ggplot2 to start the y-axis labels at "1" when using automatic scaling. However, it was not possible :/
Feb 14 2022
I guess overall it's a matter of taste.
The fully automatic one has several problems: in the case of 30 algorithms, the scale starts at 0, which is not sensible. I'm not sure what happens with something like 27 or 17 algorithms (a number that doesn't divide by 5). In the case of 7 algorithms it starts at 2, which I find a bit weird; I would expect a scale starting at 1. Thus, I would at least include the limits=c(1,max(...)) argument, which, as said before, may lead to sequences like 1,7,13,... but maybe this is not so much of a problem.
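(For reference, an illustrative sketch of the variant being discussed, a continuous scale with a fixed lower limit:)

```r
library(ggplot2)

n_alg <- 18
df <- data.frame(algorithm = paste0("Alg", 1:n_alg), rank = sample(1:n_alg))

ggplot(df, aes(x = algorithm, y = rank)) +
  geom_point() +
  # Start the scale at 1; breaks are still chosen automatically, which
  # can yield sequences like 1,7,13,...
  scale_y_continuous(limits = c(1, n_alg))
```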
Let's try the automatic config of ggplot :)
If I remember correctly, this didn't work layout-wise for a large number of algorithms. The numbers will either overlap or need to get very small, or the size of the figure will need to be increased.
Try to test with something like 20 algorithms; how does the report look then?
What's the problem with 1,5,10,15,18? The scale isn't affected, so for me it wouldn't matter that the intervals are not equal. In principle you could also omit the 18, i.e. use only 1,5,10,15. Instead of all integers, I would rather use the automatic choice.
I agree with you. On the other hand, putting breaks at a fixed integer interval can be tricky. For example, let's assume we have decided to place a break at every 5th element. The y-axis will then be 1,5,10,15,18 for a challenge with 18 algorithms; the last portion of the sequence has a different period. Therefore, I propose including all integer breaks for the [1, #algorithms] range. I am putting some examples here:
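(The two break strategies under discussion, side by side, as an illustrative sketch:)

```r
n_alg <- 18

# Every 5th element, plus the endpoint: 1, 5, 10, 15, 18 (uneven tail).
breaks_fifth <- unique(c(1, seq(5, n_alg, by = 5), n_alg))

# All integers in [1, #algorithms]: 1, 2, ..., 18 (can get crowded).
breaks_all <- seq_len(n_alg)

# Either vector can be passed to ggplot2, e.g.
# scale_y_continuous(breaks = breaks_all, limits = c(1, n_alg))
```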
Feb 11 2022
Not sure whether this is a good idea. Imagine a challenge with 18 algorithms: there will be only a 1 and an 18 and nothing in between, which may make it difficult to read. What do you think?