02-07-2015 08:17 PM - edited 02-07-2015 08:32 PM
Hello,
LabVIEW 2013 introduced some JSON conversion functions. The documentation says that these functions work with "UTF-8 JSON strings". I tested this with a simple JSON array:
["Hello", "你好"]
The hex values of this JSON string in UTF-8 are:
5B22 4865 6C6C 6F22 2C22 E4BD A0E5 A5BD 225D
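For anyone who wants to reproduce the byte dump outside LabVIEW, here is a minimal Python sketch. Note that the hex above corresponds to the array written without a space after the comma. The helper name `hex_pairs` is made up for this illustration.

```python
# Encode the JSON text as UTF-8 and print it in the same 2-byte grouping
# used in the hex dump above.
json_text = '["Hello","你好"]'  # no space after the comma, matching the dump
utf8_bytes = json_text.encode("utf-8")


def hex_pairs(data: bytes) -> str:
    """Return upper-case hex, grouped four hex digits (two bytes) at a time."""
    digits = data.hex().upper()
    return " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))


print(hex_pairs(utf8_bytes))
```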
The JSON functions handled "Hello" (4865 6C6C 6F) with no problems, but "你好" (E4BD A0E5 A5BD) got corrupted. On my English Windows PC:
It looks like these functions don't actually work with plain UTF-8 strings as advertised. Instead, LabVIEW performs double-encoding, as described in detail in this StackOverflow page. How can I avoid this? How can I ask LabVIEW to treat my input string as already-encoded UTF-8 and not to perform any conversions on it?
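To illustrate what double-encoding generally means, here is a rough Python sketch. It assumes the local ANSI code page is Windows-1252 (a common default on English Windows); that is an assumption, not something confirmed in this thread, and the function names are made up. The UTF-8 bytes are misread as code-page text and then encoded to UTF-8 a second time.

```python
# Assumption: the machine's ANSI code page is Windows-1252.
ANSI_CODE_PAGE = "cp1252"


def double_encode(utf8_bytes: bytes, code_page: str = ANSI_CODE_PAGE) -> bytes:
    """Misread UTF-8 bytes as code-page text, then encode that text as UTF-8."""
    return utf8_bytes.decode(code_page).encode("utf-8")


def undo_double_encode(mangled: bytes, code_page: str = ANSI_CODE_PAGE) -> bytes:
    """Reverse the mistake: decode UTF-8, then map the text back to code-page bytes."""
    return mangled.decode("utf-8").encode(code_page)


original = "你好".encode("utf-8")
mangled = double_encode(original)

print(original.hex(" "))  # 6 bytes
print(mangled.hex(" "))   # 12 bytes: every original byte >= 0x80 became two bytes
```

Pure ASCII text such as "Hello" survives this round trip unchanged, which would be consistent with only the Chinese string getting corrupted.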
Thanks!
(FWIW, the JSON LabVIEW toolkit from LavaG.org parses my UTF-8 string correctly)
Test code
Output (String indicators set to "Hex Display")
02-07-2015 09:24 PM
02-07-2015 11:29 PM - edited 02-07-2015 11:33 PM
@mikeporter wrote:
I'm not sure how you can accomplish a direct encoding because LabVIEW doesn't support Unicode.
What do you suppose the documentation is trying to say?
I have read about the challenges that NI faces in implementing Unicode support without breaking existing code, but they seem to be slowly working on it. Full support is still far away, I know, but I was under the impression that the new JSON functions are one of the Unicode-ready parts.
02-08-2015 08:57 AM
02-17-2015 08:00 AM
Are you familiar with how encodings work?
An ASCII string, a SHIFT-JIS string, and a UTF-8 string have different encodings, but they're all of the same native LabVIEW datatype. That is, all of them are native LabVIEW strings.
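A loose analogy in Python, in case it helps: the same text encoded in different ways yields different byte sequences, but every result has the same type (`bytes`), much like a LabVIEW string that simply carries bytes. The sample text and variable names are only for illustration.

```python
# The same characters, encoded two different ways.
text = "日本"
sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")
ascii_bytes = "Hello".encode("ascii")

# All three are the same datatype; only the byte contents differ.
for label, data in (("ASCII", ascii_bytes),
                    ("Shift-JIS", sjis_bytes),
                    ("UTF-8", utf8_bytes)):
    print(label, type(data).__name__, data.hex(" "))

# Nothing in the bytes themselves records which encoding produced them;
# the reader has to know that from context.
```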
@mikeporter wrote:
They might be ready, but in the end they are still converting to and from native LabVIEW datatype, and LabVIEW strings don't support Unicode.
In terms of your quibbles about the documentation, anything (obviously) means anything that LabVIEW can create -- which doesn't include Unicode.
I'm guessing you're referring to the fact that LabVIEW controls/indicators only display the local encoding correctly (please correct me if I'm wrong), but that's a different issue. Controls/indicators weren't designed with UTF-8 in mind so they can't interpret UTF-8. However, (Un)Flatten to/from JSON can interpret (and produce) UTF-8.
In my original example,
I simply asked for "Unflatten From JSON" to take a native LabVIEW string (of UTF-8 encoding) and output an array of native LabVIEW strings (of UTF-8 encoding) -- these two actions are already possible in LabVIEW today.
02-17-2015 09:15 AM - edited 02-17-2015 09:17 AM
JKSH wrote:
I simply asked for "Unflatten From JSON" to take a native LabVIEW string (of UTF-8 encoding) and output an array of native LabVIEW strings (of UTF-8 encoding) -- these two actions are already possible in LabVIEW today.
Unless you enable the unreleased (and not fully functional) UTF-8 mode for LabVIEW through an unofficial LabVIEW INI file setting, LabVIEW native strings are always only encoded in whatever is the current MBCS encoding for Windows. This is NEVER UTF-8 (Windows doesn't support UTF-8 as user MBCS) but one of the standard ANSI code pages such as Windows 1253 etc. So there can't be currently a function that could convert a native LabVIEW string (of UTF-8 encoding) and do anything with it, since there doesn't exist a "native LabVIEW string (of UTF-8 encoding)". You could create a byte stream sequence that contains UTF-8 encoded characters but when you convert/cast it into a LabVIEW string LabVIEW will interpret it as a string in the current ANSI code page.
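The point about interpretation can be sketched in Python: one byte sequence reads as different text depending on which code page the reader assumes. The two code pages below (Windows-1252 and Windows-1253) are just examples, and `interpret` is a made-up helper.

```python
# One fixed byte sequence: the UTF-8 encoding of "你好".
raw = "你好".encode("utf-8")


def interpret(data: bytes, code_page: str) -> str:
    """Decode the same bytes under a given assumption about their encoding."""
    return data.decode(code_page)


for code_page in ("cp1252", "cp1253", "utf-8"):
    # repr() makes non-printing characters such as U+00A0 visible.
    print(code_page, repr(interpret(raw, code_page)))
```

Only the last interpretation recovers the intended two characters; the other two each produce six unrelated characters, and they differ from one another.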
In order to support such a function, LabVIEW needs to have at least full support for UTF-8 strings and a possibility to distinguish those strings from non UTF-8 strings (basically requiring a new datatype) and that is a real beast to tackle, since every LabVIEW platform (Windows, Mac, Linux) has pretty different ideas how this should be done and implementing a full anything-to-UTF8-to-anything all in LabVIEW is not really an option either.
02-23-2015 11:14 AM
Thanks for your detailed explanations, Rolf.
Ok, we've defined things differently, so let's clarify them first. By "native LabVIEW string":
I think your conceptual distinction is a good one to make, and I think LabVIEW code would be more readable and less ambiguous if LabVIEW used different datatypes to represent "text strings" vs "binary strings". But that's a topic for another day.
@rolfk wrote:
So there can't be currently a function that could convert a native LabVIEW string (of UTF-8 encoding) and do anything with it
That's ok, because at the core of it, all I've asked for was for LabVIEW to not convert my input.
Something like NI_LVConfig.lvlib:Read Key.vi. The following gives me the same output as cases #3 and #5 in my original post:
[Greetings]
English="Hello"
Chinese="你好"
It's ironic that the "non UTF-8 aware" INI VI preserves my UTF-8 input, while the "UTF-8 aware" JSON function mangles it. 😛
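To show what I mean by "not converting", here is a small Python sketch of a byte-preserving INI lookup. It is not how Read Key.vi is implemented (I don't know its internals); `read_key_bytes` is a made-up function that never decodes the value, so whatever encoding went in comes out untouched.

```python
def read_key_bytes(ini_bytes: bytes, section: bytes, key: bytes) -> bytes:
    """Return the raw bytes of a key's value without decoding them."""
    current_section = None
    for line in ini_bytes.splitlines():
        line = line.strip()  # strips ASCII whitespace only
        if line.startswith(b"[") and line.endswith(b"]"):
            current_section = line[1:-1]
        elif current_section == section and b"=" in line:
            name, _, value = line.partition(b"=")
            if name.strip() == key:
                # Drop the surrounding double quotes, keep everything else as-is.
                return value.strip().strip(b'"')
    raise KeyError(key)


ini_bytes = '[Greetings]\nEnglish="Hello"\nChinese="你好"\n'.encode("utf-8")
chinese = read_key_bytes(ini_bytes, b"Greetings", b"Chinese")
print(chinese.hex(" "))
```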
@rolfk wrote:
In order to support such a function, LabVIEW needs to have at least full support for UTF-8 strings and a possibility to distinguish those strings from non UTF-8 strings (basically requiring a new datatype) and that is a real beast to tackle
Yes, I understand that getting existing features to support UTF-8 is a herculean task, and I agree that the ideal way forward is to have "UTF-8 string" available as a separate datatype.
But, again, I wasn't asking for LabVIEW to convert from one encoding to another. I was simply asking for LabVIEW to not convert my input.
Perhaps a "preserve source encoding" flag as an additional input to (Un)Flatten to/from JSON, defaulted to "False" to maintain compatibility with current behaviour.
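Roughly what I have in mind, sketched in Python rather than LabVIEW. Everything here is hypothetical: the function name, the flag, and especially the default branch, which only stands in for "convert to the local code page" (assumed to be Windows-1252) and does not claim to reproduce what LabVIEW actually does today.

```python
import json

# Assumption: stand-in for the machine's local ANSI code page.
LOCAL_CODE_PAGE = "cp1252"


def unflatten_string_array(json_bytes: bytes,
                           preserve_source_encoding: bool = False) -> list:
    """Parse a UTF-8 JSON array of strings into a list of byte strings."""
    values = json.loads(json_bytes.decode("utf-8"))
    if preserve_source_encoding:
        # Hand each string back as UTF-8 bytes. For input without \u escapes
        # these are the same bytes that appeared between the quotes.
        return [value.encode("utf-8") for value in values]
    # Default: convert to the local code page; unmappable characters become "?".
    return [value.encode(LOCAL_CODE_PAGE, errors="replace") for value in values]


data = '["Hello","你好"]'.encode("utf-8")
print(unflatten_string_array(data, preserve_source_encoding=True))
print(unflatten_string_array(data))
```

With the flag set, the caller gets bytes it can pass along to anything that expects UTF-8; with the default, existing code keeps seeing local-code-page strings.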
@rolfk wrote:
every LabVIEW platform (Windows, Mac, Linux) has pretty different ideas how this should be done and implementing a full anything-to-UTF8-to-anything all in LabVIEW is not really an option either.
This problem can be tackled using a library that has already done the legwork. ICU provides conversions to/from a large collection of codepages consistently across all 3 platforms (and more): http://site.icu-project.org/ (But again, it's not what I'm asking for: I just wanted LabVIEW to not convert my input)
02-23-2015 03:40 PM
02-23-2015 03:42 PM