Page MenuHomePhabricator

Cannot use special characters (umlauts, accents) in Data Manager's tree view
Closed, ResolvedPublic

Description

Since an update to MITK 2016.11 I cannot use special characters such as 'ä' or 'ß' when editing the name of a DataNode in Data Manager's tree view (via a double-click on the item or by pressing F2 when an item is selected). The in-place editor displays the correct characters, but they are not correctly transfered to the real node name (std::string). When the editor is closed, the characters are replaced by "missing character" symbols.

The reason seems to be a conversion of std::string to QString via QFile::encodeName() which is uses at least since https://phabricator.mitk.org/rMITK5d6355234de9bfddd61b8e9d7aa1159c53a69e99
When I replace QFile::encodeName() by the more current QString::fromStdString() everything works fine for me.

Unless there is a good reason why QFile::encodeName() needs to be used, I suggest replacing it by QString::fromStdString(). See pull request https://github.com/MITK/MITK/pull/179

Event Timeline

kislinsk triaged this task as Normal priority.Mar 7 2017, 10:15 AM
kislinsk added a subscriber: kislinsk.

Thanks, @maleike. 👍

@goch As you put some effort in these things as far as I know, is there something against the PR?

I just fixed the original description, the encodeName() originates from a commit by @zelzer, not @floca (the commit I originally cited just reused the pattern in multiple other lines)

Wait a little, there's more to this, I'll probably add to the first commit!

I do not see an issue with it. EncodeName is really the wrong function to use at this point, as it was designed for encoding file names into platform/locale specific 8 bit.

However as modern systems tend to support unicode filenames as well we might even want to consider changing the behaviour in QmitkIOUtil

In T22602#96049, @goch wrote:

However as modern systems tend to support unicode filenames as well we might even want to consider changing the behaviour in QmitkIOUtil

Exactly :-( This is what I am currently trying locally. However, when I fix QmitKIOUtil.cpp I run into problems with itksys::SystemTools::FileExists() which decides that "C:/bröd.nrrd" does not exist (though it does!) when the file name string is UTF-8 encoded (as is the case when we convert QString to std::string via .toStdString()). I suspect that there are more of those kind of incompatibilities around..

While I will still investigate a little on why itksys::SystemTools does not like UTF-8 and whether we can change that, the question for the MITK core team is: what encoding _shall_ be used for character strings inside MITK? With (some / a lot of) effort we could normalize all conversions to other string types (like Qt or other libraries), but we absolutely need to be consistent here.

@goch, @kislinsk: have you already had a discussion on this topic? Are there any guidelines?

@maleike I think the current policy could be best described as "free for all"
However I am fully in favor of using a consistent conversion and policy and am open for suggestions. UTF 8 as internal string encoding seems reasonable to me and should harmonize reasonably well with Qt.
The most trouble would probably be the interplay with other, especially file system related, toolkits.
itksys is a problem you already encountered. I believe boost filesystem is currently not used anywhere, Poco might be another issue. I believe tinyxml supports UTF8 as well.

It would probably be a good idea to write a utf8 test which tries to read/write sample files with umlauts/russian/chinese characters for each registered mimetype. However we currently do not have a way to create a sample file for each mime type, even those registered by a third party.

@maleike Did you try to use QString::toLatin1() instead of QString::toStdString()? AFAIK this is used rather often in MITK.

I spent quite some time now on understanding what is happening. Attached to this task are my detailed notes (see two comments further down).
The overall situation is complicated, so I need to meditate on it for some time before any further changes. Discussions are welcome.

The executive summary is: we have two incompatible assumptions on the encoding of DataNode's name in our code (locale-dependent vs. utf-8). Mostly it still works fine. Loading ö.nrrd works but Data Manager displays it as ? when the system's locale is Latin1 (for example).

I think my pull request does not worsen the current state but slightly improves on name editing in Data Manager. So it could be integrated regardless of further encoding cleanups.

@maleike Regarding attachments, simply drag & drop them into the comment text box. 💡

Ohhhh, now I also discovered the cloud symbol :-) I havn't yet made the connection between "cloud" and "storage", still attached to disks obviously :-)

@maleike What's the status. Should we merge the PR for now or wait for more fixes?

I think my pull request does not worsen the current state but slightly improves on name editing in Data Manager. So it could be integrated regardless of further encoding cleanups.

Unchanged status, above comment is still valid.

If somebody on your side wants to dive deeper into this subject I remain at your disposal for any discussions.

kislinsk claimed this task.

Okay, thank you! We keep that in mind.