Page MenuHomePhabricator

Migrate from TinyXML to TinyXML-2
Closed, ResolvedPublic

Description

We currently use the latest (and patched) version 2.6.2 of TinyXML from 2011. As it is abandoned for nearly a decade now, we should migrate to the successor TinyXML-2 v8.0.0.

Event Timeline

kislinsk triaged this task as Normal priority.Nov 22 2020, 1:55 AM
kislinsk created this task.

Deleted branch feature/T27985-TinyXML-2.

Deleted branch feature/T27985-TinyXML-2.

kislinsk added a project: Noteworthy.

Changes in header files: git diff -U0 ff0027ee 8e328dab *.h > output.txt

TinyXML to TinyXML-2 migration guide

WARNING: While the APIs of TinyXML and TinyXML-2 are similar enough for a straight forward migration except for a few edge cases, it usually affects critical code. If not done thoroughly, migration may result in silently writing valid XML yet corrupt file formats (for example unexpected values). Beware of the few pitfalls! ⚠️

TinyXML package dependencies of MITK modules

  mitk_create_module(
-   PACKAGE_DEPENDS PRIVATE tinyxml
+   PACKAGE_DEPENDS PRIVATE tinyxml2
  )

TinyXML header file

- #include <tinyxml.h>
+ #include <tinyxml2.h>

TinyXML forward declarations

- class TiXmlDocument;
- class TiXmlElement;
- class TiXmlNode;

+ namespace tinyxml2
+ {
+   class XMLDocument;
+   class XMLElement;
+   class XMLNode;
+ }

TinyXML class names

See the forward declarations above to get an idea of the mapping between TinyXML class names and TinyXML-2 class names.

Use auto

In most cases when traversing or iterating through an XML document, a simple type replacement with the auto keyword is enough. You don't have to care if it is an element or a node - it just works.

- TiXmlElement* elem = parentElem->FirstChildElement();
+ auto* elem = parentElem->FirstChildElement();

TinyXML enumeration values

- TIXML_SUCCESS
+ tinyxml2::XML_SUCCESS

TinyXML class instantiation and object lifetime

TinyXML-2 objects like elements or nodes are always owned by a document instance. Do not use new or delete anymore. TinyXML-2 objects that are not part of a document are usually created on the stack like tinyxml2::Printer printer;.

- TiXmlElement* elem = new TiXmlElement("MyElement");
- parentElem->LinkEndChild(elem); // now owned by TinyXML
-
- TiXmlElement* anotherElem = new TiXmlElement("MyOtherElement");
- delete anotherElem;

+ {
+   tinyxml2::XMLDocument doc;
+
+   auto* elem = doc.NewElement("MyElement");
+   parentElem->InsertEndChild(elem);
+
+   auto* anotherElem = doc.NewElement("MyOtherElement");
+ } // doc is running out of scope and deletes all associated objects

To be clear, don't do:

Incorrect 🚫
tinyxml2::XMLElement* CreateMyElement()
{
  tinyxml2::XMLDocument doc;
  auto* elem = doc.NewElement("MyElement");
  return elem;
}

Instead, do:

Correct ✔️
tinyxml2::XMLElement* CreateMyElement(const tinyxml2::XMLDocument& doc)
{
  auto* elem = doc.NewElement("MyElement");
  return elem;
}

TinyXML methods

Search and replace

Most commonly used method names are equal. The most prominent method names to replace are:

- LinkEndChild
+ InsertEndChild

- SetDoubleAttribute
+ SetAttribute
⚠️ std::string vs const char*

While TinyXML supports std::string parameters, TinyXML-2 only supports plain C-strings.

- doc.LoadFile(filename); // std::string
+ doc.LoadFile(filename.c_str());

Don't initialize a std::string with a const char* return type. This is already true for TinyXML but during migration we found several counterexamples. const char* can be a nullptr but initializing a std::string with nullptr provokes undefined behavior.

- std::string attr = elem->Attribute("MyAttribute"); // undefined behavior

+ const char* c_attr = elem->Attribute("MyAttribute");
+ if (c_attr != nullptr)
+ {
+   std::string attr = c_attr;
+   // ...
+ }
⚠️ Implicit bool conversion of XMLError return types (and other return type changes)

tinyxml2::XMLError is now returned by some methods instead of bool. It is an enumeration and its tinyxml2::XML_SUCCESS value equals zero. Casting a zero integer to bool evaluates to false, resulting in inverted logic. Typical candidates during the migration were the <Load|Save>File() and Parse() methods of TiXmlDocument/tinyxml2::XMLDocument.

- TiXmlDocument doc(filename);
- bool success = doc.LoadFile();

// or

- TiXmlDocument doc;
- bool success = doc.LoadFile(filename); // Dangerous! May still compile but logic is inverted.

+ tinyxml2::XMLDocument doc;
+ bool success = tinyxml2::XML_SUCCESS == doc.LoadFile(filename.c_str());

⚠️ Parsing XML strings and streams

You cannot stream into an XML document anymore. Instead, parse explicitly. Also notice the completely changed signature of tinyxml2::XMLDocument::Parse(). The method doesn't return the parsed contents or a nullptr anymore.

  std::string xmlString(std::istreambuf_iterator<char>(xmlStream), {}); // read everything into string

- TiXmlDocument doc;
- xmlStream >> doc;

  // or

- if (doc.Parse(xmlString.c_str())) // Dangerous! Still compiles but logic is inverted.
- {
-   // ...
- }

+ tinyxml2::XMLDocument doc;
+ if (tinyxml2::XML_SUCCESS == doc.Parse(xmlString.c_str()))
+ {
+   // ...
+ }

XML document to string/stream conversion

There were several ways to print or stream an XML document but now it boils down to:

const char* PrintXMLDocument(const tinyxml2::XMLDocument& doc)
{
  tinyxml2::XMLPrinter printer;
  doc.Print(&printer);
  return printer.CStr();
}

XML declarations

- TiXmlDocument doc;
- doc.LinkEndChild(new TiXmlDeclaration("1.0", "", ""));

+ tinyxml2::Document doc;
+ doc.InsertEndChild(doc.NewDeclaration());

⚠️ Reading and writing boolean attributes

TinyXML only supported reading boolean attributes, accepting values like 0, 1, "false", "true", "yes", and "no". Boolean values were either written implicitly or explicitly as integers or explicitly as strings ("true" or "false"). The implicit case is the dangerous one as it continues to compile with different results at runtime.

To stay compatible to current file formats, we usually want to emulate the old behavior.

- elem->SetAttribute("MyAttribute", booleanValue); // Implicitly casted to integer
+ elem->SetAttribute("MyAttribute", static_cast<int>(booleanValue)); // Important! Keep old behavior.

  elem->QueryBoolAttribute("MyAttribute", &booleanValue);

The string case mentioned above can be optimizied now as TinyXML-2 produces the same strings for boolean values.

- elem->SetAttribute("MyAttribute", booleanValue ? "true" : "false");
+ elem->SetAttribute("MyAttribute", booleanValue);

Reading and writing size_t and similar types

One some platforms, size_t cannot be unambiguously implicitly casted as second parameter of SetAttribute(). Check how the value is read in (as integer in most cases) and explicitly cast accordingly.

- elem->SetAttribute("MyAttribute", vector.size());
+ elem->SetAttribute("MyAttribute", static_cast<int>(vector.size()));
  QueryIntAttribute("MyAttribute", &value);

Public MITK API changes

Basically replace TinyXML parameter types with corresponding TinyXML-2 types. We used the opportuniy to improve const-correctness. Methods that create new instances of XML classes now have an additional document input parameter, which is used to manage the lifetime of newly created instances.

The tinyxml2.h header file is only included in implementation files. MITK header files contain forward declarations when necessary.

A few examples are:

- TiXmlElement* Serialize() override;
+ tinyxml2::XMLElement* Serialize(tinyxml2::XMLDocument& doc) override;

- BaseProperty::Pointer Deserialize(TiXmlElement* element) override;
+ BaseProperty::Pointer Deserialize(const tinyxml2::XMLElement* element) override;

- static TiXmlElement* GetLabelAsTiXmlElement(Label* label);
+ static tinyxml2::XMLElement* GetLabelAsXMLElement(tinyxml2::XMLDocument& doc, Label* label);