
[dash] NaN values cause error
Closed, Resolved, Public

Assigned To
None
Authored By
eisenman
Oct 15 2021, 2:44 PM
Referenced Files
F2546240: image.png
Jun 17 2022, 8:51 AM
F2502172: NaNs.R
Feb 8 2022, 2:15 PM
F2502170: image.png
Feb 8 2022, 2:15 PM
F2498740: NaNs.R
Jan 28 2022, 11:33 AM
F2498741: data_matrix_NaN.csv
Jan 28 2022, 11:33 AM
F2457675: visart_error_nan_handling.PNG
Oct 15 2021, 2:44 PM

Description

Currently, NaN values are not handled, so the generic fallback error message is propagated to the user.

Steps to reproduce:

  • Load file containing NaN values
  • Ranking: metric-based
  • No bootstrapping
  • Click "Generate Report"

--> The following error message appears:

visart_error_nan_handling.PNG (379×575 px, 15 KB)

Event Timeline

eisenman created this task.

One way to approach this issue is to check whether NaN values are present in the data. If they are, the user should be able to select a NaN handling strategy where necessary (e.g., not for case-based ranking). What about offering the NaN handling options once the ranking method has been chosen?
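A minimal Python sketch of this proposal (not VISSART's or challengeR's code): first detect missing values, then offer the NaN handling choice only for ranking methods that use it. The method names mirror challengeR's ranking functions; treating "case-based ranking" as rankThenAggregate, and the helper names `has_missing` and `show_na_options`, are assumptions for illustration.

```python
import math

# Assumption: ranking methods that use na.treat for the ranking itself.
# "Case-based" ranking (assumed here to be rankThenAggregate) is left out,
# following the proposal above.
METHODS_WITH_NA_TREAT = {"aggregateThenRank", "testThenRank"}


def has_missing(values):
    """Return True if any value is None or a float NaN (hypothetical helper)."""
    return any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)


def show_na_options(values, ranking_method):
    """Offer the NaN handling choice only if data has gaps and the method uses it."""
    return has_missing(values) and ranking_method in METHODS_WITH_NA_TREAT


data = [0.91, float("nan"), 0.78]
print(show_na_options(data, "aggregateThenRank"))  # True
print(show_na_options(data, "rankThenAggregate"))  # False
```

Deferring the check until the ranking method is known keeps the option hidden in cases where it would have no effect.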

Current status of the issue:

  1. aggregateThenRank and testThenRank have an na.treat option. Users can assign zero to NaN values (na.treat = 0), remove missing data (na.treat = "na.rm"), or specify another function. Therefore, missing/NaN values in the data are not an issue for these ranking methods. If na.treat is defined, aggregateThenRank and testThenRank work without any problem (both the normal and the bootstrapped versions). We have already added the na.treat option to webchallengeR/VISSART (it is enabled if the data contain missing/NaN values).
  2. However, the problem persists for rankThenAggregate. Although na.treat is not needed for this ranking approach, it is required during report generation. The error occurs because the ranking object generated by rankThenAggregate has no na.treat entry, so the report function cannot find it and raises an error:
     Error in report.bootstrap.list(object, consensus, file, title, colors,  : 
     Please specify na.treat in as.challenge().
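To make the na.treat options listed above concrete, here is a hedged Python sketch of the three behaviours (numeric replacement, "na.rm", or a function), plus a refusal when missing values exist and nothing is specified. It is not challengeR's implementation: `apply_na_treat` is a hypothetical helper, and the contract assumed for a function-valued na.treat (it receives the observed values and returns one replacement value) may differ from challengeR's.

```python
import math


def is_missing(v):
    """True for None or a float NaN."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def apply_na_treat(values, na_treat=None):
    """Hypothetical stand-in for na.treat handling on one list of metric values."""
    if not any(is_missing(v) for v in values):
        return list(values)  # nothing to treat
    if na_treat is None:
        # Mirrors the spirit of challengeR's complaint; wording is illustrative.
        raise ValueError("Please specify na.treat")
    if isinstance(na_treat, str):
        if na_treat == "na.rm":
            # Drop missing entries entirely.
            return [v for v in values if not is_missing(v)]
        raise ValueError(f"unknown na.treat option: {na_treat!r}")
    if callable(na_treat):
        # Assumption: the function maps the observed values to one replacement.
        fill = na_treat([v for v in values if not is_missing(v)])
    else:
        fill = float(na_treat)  # numeric replacement, e.g. na.treat = 0
    return [fill if is_missing(v) else v for v in values]


scores = [0.9, None, 0.7]
print(apply_na_treat(scores, 0))        # [0.9, 0.0, 0.7]
print(apply_na_treat(scores, "na.rm"))  # [0.9, 0.7]
print(apply_na_treat(scores, min))      # [0.9, 0.7, 0.7]
```

The refusal branch corresponds to the situation in item 2: a downstream step (here, the report) needs a treatment that was never recorded.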

Example code and data demonstrating the situation are attached. When the NaNs.R script is run, AggregateThenRank.pdf, AggregateThenRank_bootstrap.pdf, TestThenRank.pdf and TestThenRank_bootstrap.pdf are generated successfully. However, report generation for the RankThenAggregate configurations fails.

Thank you for investigating this! In challengeR, this case is handled by emitting the message "na.treat obligatory if report is intended to be compiled". To resolve issue 2 above, the preferred way of handling it in VISSART needs to be defined. Should the user be guided to specify a NaN handling strategy? Or should the user be able to generate a report without the plots that require numeric values?

I guess na.treat is only needed for the line plot comparing against other ranking methods?
In that case, a message could be emitted when compiling the report, saying something like "line plot comparing ranking methods omitted since na.treat is not specified. Specify na.treat in as.challenge() if inclusion of line plot is desired", and the report could still be compiled (without the line plot).
(Note that na.treat can be defined both in as.challenge() and in the ranking functions.)
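The suggested fallback could look roughly like this Python sketch: when na.treat is not specified, the line plot is skipped with a message and the remaining sections are still compiled. The section names and the function `compile_report_sections` are hypothetical; the actual report is produced by challengeR's R code.

```python
import warnings

# Hypothetical report sections; only the line plot is assumed to need na.treat.
SECTIONS = ("podium_plot", "ranking_heatmap", "line_plot_ranking_methods")


def compile_report_sections(na_treat=None, sections=SECTIONS):
    """Return the sections to render, skipping the line plot without na.treat."""
    included = []
    for section in sections:
        if section == "line_plot_ranking_methods" and na_treat is None:
            warnings.warn(
                "line plot comparing ranking methods omitted since na.treat is "
                "not specified. Specify na.treat in as.challenge() if inclusion "
                "of line plot is desired"
            )
            continue
        included.append(section)
    return included


print(compile_report_sections())   # line plot skipped, warning emitted
print(compile_report_sections(0))  # all three sections
```

Skipping one section with a warning, rather than aborting, matches the suggestion that the rest of the report remains useful without na.treat.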

The problem is mostly fixed by passing the na.treat parameter both to as.challenge() and to the ranking methods (except rankThenAggregate). Reports can now be generated for all ranking methods.

The only remaining issue is that some warning messages appear in the "2.2 Podium plot" and "2.3 Ranking heatmap" sections if na.treat is set to "na.rm". The warnings themselves are expected, but they are long and overflow the report page, as shown below. If na.treat is set to a number (for example, na.treat = 0), no warnings appear in the report. Which strategy should we choose here?

image.png (710×911 px, 39 KB)

Could you please test this with the attached code?

The warning messages shown when the data contain missing values were revised as follows:

image.png (677×703 px, 38 KB)

Users can now handle missing data with their preferred na.treat strategy, and they are warned appropriately when that strategy is applied during a function call.