Page MenuHomePhabricator

Serialization of NAN values within mitk::DoubleVectorProperty is inconsistent
Closed, ResolvedPublic

Description

When serializing mitk::DoubleVectorProperty with NaN values the content of the related line in xml-output looks as follows:

<Value idx="74" value="1.#QNAN" />

When such values a read back into memory the serialized string "1.#QNAN" is interpreted as 1.0 which is not correct as it should be set to NAN again.

When addressing this back the behavior for other special values like INF and -INF (inifinity) should be checked as well. Further this misbehavior may also apply to
mitk::FloatProperty and mitk::DoubleProperty and should be corrected there as well.

Event Timeline

Stefan mentioned on the mailing list that this was a good candidate for bug squashing. In fact this might produce quite some detail work. Since I looked into it, I'd like to share what I found and provide at least the basis to improve on this.

String conversion of special float numbers (inf / nan) is broken in Microsoft's C/C++ library, no difference for printf or ostream [1,2]. TinyXML and our code for DoubleVectorProperty uses ostream for conversion.

Since one of the StackOverflow discussions cited mentioned that Qt's text streams handle those values correctly, I had a look into how Qt and Poco handle those numbers. Both handle conversion themselves and have produced quite some code that does all the numeric details. Both also do specially check for and treat infinity and not-a-number. Both generate "inf" or "nan" (Qt hard coded, Poco configurable) for those special values.

A word on the option of fixing TinyXML: I think this is not a viable idea for multiple reasons:

  • when evolving the Serialization again, XML output might be done using a different library that is better tested / has a larger user base
  • float to string conversion is not something we want to implement ourselves, it is much too complex. TinyXML is self-contained however, so there are no libraries that we could use

My proposal for a fix is to add MITK functions FloatToString and similar that will just hand over to Poco's conversion in a first version. This could be changed to Qt or another library when required. Those conversion functions should then be used everywhere where our serialization wants to store float/double values via TinyXML. The XML library shall then only handle strings.

[1] http://stackoverflow.com/questions/7477978/nan-ascii-i-o-with-visual-c
[2] http://stackoverflow.com/questions/27149246/how-to-implement-serialization-and-de-serialization-of-a-double?lq=1

I implemented conversion functions as suggested. Unit tests confirm that they work well on Linux and Windows. This came to a little surprise as after my previous comment I found the following still open pull request for Poco: https://github.com/pocoproject/poco/issues/286

Either I was lucky with the specific methods I used, or the pull request is actually solved. Conversion works fine for me, for float, double, regular, and a number of special values.

Adapting the VectorProperty has already been done to test if everything works. Adapting all the other serializers (around 12) will be straight forward but repetitive. Only error handling needs to be considered. So far I returned NaN for strings that don't parse correctly. We need to see if this fits well with the existing serializers.

(also CC Keno, since he was active in serialization)

User maleike has pushed new remote branch:

bug-19492-inf-nan-serialization

I have changed all the serializers in MitkSceneSerialization(Base) and that seems to work fine. Changes were straight-forwards and tests are still passing.

Now there is something that I need to get your opinion on: TinyXML is also used in Core, for PointSetReaderService, Geometry3DToXML, and ProportionalTimeGeometryToXML. All of them andle double values as well and could be confronted with NaN or infinity values (which would probably mean some kind of error states). They would also not be able to reproduce those values after convertion to XML and back.

To apply the solution that I proposed for SceneSerialization, we would need to move the conversion functions to core and to add PocoFoundation as a private dependency to Core.

What is your opinion on moving FloatToString conversions to core?

Outside of Core, there are calls to the problematic TinyXML function in 10 other modules, 15 files total, mostly readers/writers judging from the names.

Also, I'd like to get an outside view opinion on how to name the conversion functions. The current functions are in a header mitkFloatToString.h and are called like mitk::FloatToString(), mitk::DoubleToString(), mitk::StringToDouble() etc. There are also array versions of conversion for use with Point3D and similar, called mitk::StringsToDoubles().

We have to discuss your suggestion in the next MITK meeting. As far as I know, we tried to keep or make the Core independent of Poco. We also added a CMake option for disabling Poco, which is used by Mint for example.

(In reply to Stefan Kislinskiy from comment #5)

We have to discuss your suggestion in the next MITK meeting. As far as I
know, we tried to keep or make the Core independent of Poco. We also added a
CMake option for disabling Poco, which is used by Mint for example.

Thank you. I already guessed that Poco was not too welcome. The relevant question for your discussion is: if we want to exclude Poco and Qt in Core, how can we provide correct double to string conversion without implementing it ourselves (and building up the necessary knowledge)?

I'll be looking for alternatives..

Seems to be an old and sad topic:

http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/
http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2006/n2022.pdf

An interesting discussion on StackOverflow[1] mentioned created some hope on std::strtod again, but without success.

From the same discussion, I came to trying Boost's lexical_cast [2] which would work fine for us. Regarding the required functions, this dependency would be header-only, but still the preference of Qt, Poco, or Boost needs to be discussed. Personally I would tend to Boost (nice license, no additional shared libraries).

[1] http://stackoverflow.com/questions/11420263/is-it-possible-to-read-infinity-or-nan-values-using-input-streams
[2] http://www.boost.org/doc/libs/1_55_0/doc/html/boost_lexical_cast/examples.html

Ralf also mentioned Boost lexical_cast and that would be my favorite as well.

Since my last comment I enhanced the testing to cover some more cases. I also found another good article on the topic (http://www.gotw.ca/publications/mill19.htm) and I added a precision parameter to to number-to-string conversion as this is used in some of the serialization classes.

The current branch is sucessfully passing on my Windows and Linux machines. Only thing to worry about are extreme values (numeric_limits<>::min/max) which still provide surprises (see test cases).

User maleike has pushed new remote branch:

bug-19492-inf-nan-serialization-core

Good news. We talked about a Boost header-only dependency for the core and concluded, that we should do it, but start with a minimal set like only being dependent on the lexical_cast part of the Boost library - not on everything.

(In reply to Stefan Kislinskiy from comment #12)

Good news. We talked about a Boost header-only dependency for the core and
concluded, that we should do it, but start with a minimal set like only
being dependent on the lexical_cast part of the Boost library - not on
everything.

Nice, thanks! Do you have somebody who could have a look at the changes in bug-19492-inf-nan-serialization-core and tell me how to modify the CMakeLists.txt of Core to achieve this? Otherwise I'll have a look into whether boost support a components syntax as VTK or Poco.

Also: is there a preferred way / preferred example on how to force the "mitk use boost" option on the superbuild level?

I just had a look and there is no such thing as components for Boost regarding CMake's FindBoost script, as each Boost library has the same top-level include directory. Even the binary libraries are installed to the same lib directory.

Hence, IMHO adding the Boost package dependency to the Core is okay. However, it's not just adding Boost to the CMakeLists.txt of the Core here as we also have to remove the Boost option from our build script and so on.

Today we have some guests at our department and therefore the bug squashing party will be shorter than usual but let me try to accomplish the hard boost dependency today. :)

(In reply to Stefan Kislinskiy from comment #14)

Today we have some guests at our department and therefore the bug squashing
party will be shorter than usual but let me try to accomplish the hard boost
dependency today. :)

Thanks for this, I just saw that you finished integrating the dependency. Did you also already decide on this bug here? Could we integrate my solution?

Is it really necessary to still provide a wrapper function for boost::lexical_cast? At least in one direction it is one line and for the other direction I am not sure if we really need the extra case handling as long as we can deserialize what we serialized?

If we would provide a wrapper function, we should provide it at least for mitk::ScalarType (which is double), double itself, and float. Template? In that case we would prefer another naming for the function like ScalarToString to be more general.

[1] also shows a nice way to support negative infinity.

But again, thinking practical, isn't using boost::lexical_cast directly not sufficient? :-)

[1] http://stackoverflow.com/questions/20016600/negative-infinity

(In reply to Stefan Kislinskiy from comment #16)

Is it really necessary to still provide a wrapper function for
boost::lexical_cast? At least in one direction it is one line and for the
other direction I am not sure if we really need the extra case handling as
long as we can deserialize what we serialized?

I agree, now that we made Boost a fix part of core, we could lexical_cast directly. This would oblige every calling function to think about exception handling itself. This is a good thing.

I understand that you would still accept the wrapper for the direction "value to string"?

If we would provide a wrapper function, we should provide it at least for
mitk::ScalarType (which is double), double itself, and float. Template? In
that case we would prefer another naming for the function like
ScalarToString to be more general.

If you are fine with a header-based implementation, we can go for templates. That is fine and simplifies some things. I mainly chose the other option in order to hide the used library. I did not look into your changes of T19503 yet. Do you confirm that Boost is now a "public" dependency? I.e. can projects that include MITK via MITKConfig.cmake also include Boost headers?

[1] also shows a nice way to support negative infinity.

I'll look into that when rewriting my code.

In summary: I will

  • rename FloatToString.h to ScalarToString
  • remove the string to scalar functions, replace calls to them by boost::lecical_cast
  • move the code to core
  • check your link to improve our negative infinity handling

We would prefer the direct call to boost::lexical_cast for both directions, except there is a good reason for not doing so? For example, we think that handling more string variants of NaNs and INFs than the ones Boost is already using wouldn't be enough to justify extra conversion functions. So we would prefer the practical approach of using Boost directly.

Yes, Boost is a public dependency now and it should work with external projects (we tested it with MITK-ProjectTemplate). :)

(In reply to Stefan Kislinskiy from comment #18)

We would prefer the direct call to boost::lexical_cast for both directions,
except there is a good reason for not doing so?

Ok, I checked this again. Thanks for insisting on this, because to my surprise this works. I thought I had seen precision problems using only lexical_cast, but it works fine for all test cases on Windows/Linux. So I'll adapt our serializers and geometry-to-xml classes to use boost directly.

User maleike has pushed new remote branch:

bug-19492-inf-nan-serialization-all-boost

(In reply to Git Admin from comment #20)

User maleike has pushed new remote branch:

bug-19492-inf-nan-serialization-all-boost

This one does all conversions via boost::lecical_cast. Needs update from master regarding Boost dependency.

Otherwise the branch is just waiting for Needs_Core_Modification+

[107872]: Merge branch 'bug-19492-inf-nan-serialization-all-boost'

Merged commits:

2016-02-03 14:27:35 Daniel Maleike [2e055f]
Use boost::lexical_cast for serialization of float and double