[JSON] How to avoid double encoding of UTF-8 strings?

Hello,

LabVIEW 2013 introduced some JSON conversion functions. The documentation says that these functions work with "UTF-8 JSON strings". I tested this with a simple JSON array:

["Hello", "你好"]

Encoded in UTF-8 (with no space after the comma), this JSON string has the following hex bytes:

5B22 4865 6C6C 6F22 2C22 E4BD A0E5 A5BD 225D 
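For reference, the same bytes can be reproduced outside LabVIEW. The following is a minimal Python 3.8+ sketch (Python is only a neutral reference here, not what LabVIEW does internally); `separators=(",", ":")` omits the space after the comma so that the output matches the hex dump above.

```python
import json

# Compact JSON (no space after the comma), with non-ASCII characters kept
# literally instead of being escaped as \uXXXX.
text = json.dumps(["Hello", "你好"], ensure_ascii=False, separators=(",", ":"))

# Encode as UTF-8 and show the bytes as upper-case hex in 2-byte groups.
data = text.encode("utf-8")
print(data.hex(" ", -2).upper())  # 5B22 4865 6C6C 6F22 2C22 E4BD A0E5 A5BD 225D
```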

The JSON functions handled "Hello" (4865 6C6C 6F) with no problems, but "你好" (E4BD A0E5 A5BD) got corrupted. On my English Windows PC:

  • Unflattening "E4BD A0E5 A5BD" produces "3F3F" (i.e. "??")
  • Flattening "E4BD A0E5 A5BD" produces "C3A4 C2BD C2A0 C3A5 C2A5 C2BD"
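Both results are consistent with LabVIEW converting between UTF-8 and the system ANSI code page, which on an English Windows PC is normally Windows-1252. The Python sketch below reproduces both byte sequences under that assumption; it models the observed behaviour and is not NI's actual implementation.

```python
# Assumption: the system ANSI code page is Windows-1252 ("cp1252"),
# the usual default on English-language Windows.
SYSTEM_CODEPAGE = "cp1252"

utf8_bytes = "你好".encode("utf-8")  # E4BD A0E5 A5BD

# Unflatten: decode the JSON text as UTF-8 (correct), then convert the result
# to the system code page. Chinese characters have no cp1252 mapping, so each
# one is replaced by "?" (0x3F).
unflattened = utf8_bytes.decode("utf-8").encode(SYSTEM_CODEPAGE, errors="replace")

# Flatten: treat the input bytes as cp1252 text (wrong, they are already UTF-8),
# then encode that text to UTF-8 again, i.e. double encoding.
flattened = utf8_bytes.decode(SYSTEM_CODEPAGE).encode("utf-8")

print(unflattened.hex(" ", -2).upper())  # 3F3F
print(flattened.hex(" ", -2).upper())    # C3A4 C2BD C2A0 C3A5 C2A5 C2BD
```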

It looks like these functions don't actually work with plain UTF-8 strings as advertised. Instead, LabVIEW performs double encoding, as described in detail on this StackOverflow page. How can I avoid this? How can I ask LabVIEW to treat my input string as already-encoded UTF-8 and not to perform any conversions on it?

Thanks! 

(FWIW, the JSON LabVIEW toolkit from LavaG.org parses my UTF-8 string correctly)

Test code:

[Image: UTF-8 JSON.png]

Output (String indicators set to "Hex Display"):

[Image: UTF-8 JSON Output.png]

Certified LabVIEW Developer
Message 1 of 9
I'm not sure how you can accomplish a direct encoding, because LabVIEW doesn't support Unicode.

Mike...

Certified Professional Instructor
Certified LabVIEW Architect
LabVIEW Champion

"... after all, He's not a tame lion..."

For help with grief and grieving.
Message 2 of 9

@mikeporter wrote:
I'm not sure how you can accomplish a direct encoding, because LabVIEW doesn't support Unicode.

What do you suppose the documentation is trying to say?

  • Flatten to JSON:
    • "Converts data you wire to the anything input to a UTF-8 JSON string."
    • "anything contains the data you want to convert to a UTF-8 JSON string."
    • "JSON string is the flattened data encoded in UTF-8. UTF-8 encoded strings may not display correctly in LabVIEW controls."
  • Unflatten from JSON:
    • "Converts a UTF-8 JSON string to the LabVIEW data type you wire to type/defaults."
    • "JSON string is the flattened UTF-8 string that you want to unflatten."

I have read about the challenges that NI faces in implementing Unicode support without breaking existing code, but they seem to be slowly working on it. Full support is still far away, I know, but I was under the impression that the new JSON functions are one of the Unicode-ready parts.

Message 3 of 9
They might be ready, but in the end they are still converting to and from native LabVIEW datatypes, and LabVIEW strings don't support Unicode.

In terms of your quibbles about the documentation, anything (obviously) means anything that LabVIEW can create -- which doesn't include Unicode.

Mike...

Message 4 of 9

Are you familiar with how encodings work?

An ASCII string, a Shift-JIS string, and a UTF-8 string have different encodings, but they all have the same native LabVIEW datatype. That is, all of them are native LabVIEW strings.
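To illustrate with a language that has an explicit byte-string type, here is a small Python sketch in which `bytes` plays the role of the pink wire. The sample strings are arbitrary ("日本" is used for the Japanese-capable encodings since ASCII cannot represent it). Each encoding produces a different byte sequence, yet all three values have the same type, and nothing in the value records which encoding was used.

```python
samples = {
    "ASCII": "Hello".encode("ascii"),
    "Shift-JIS": "日本".encode("shift_jis"),
    "UTF-8": "日本".encode("utf-8"),
}

for name, raw in samples.items():
    # Every value is a plain byte string; the encoding is not stored with it.
    print(f"{name:9} {type(raw).__name__} {len(raw)} bytes: {raw.hex(' ').upper()}")

# Decoding only gives the right text if the reader knows which encoding was used.
print(samples["Shift-JIS"].decode("shift_jis"), samples["UTF-8"].decode("utf-8"))
```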

@mikeporter wrote:
They might be ready, but in the end they are still converting to and from native LabVIEW datatype, and LabVIEW strings don't support Unicode.

In terms of your quibbles about the documentation, anything (obviously) means anything that LabVIEW can create -- which doesn't include Unicode.

I'm guessing you're referring to the fact that LabVIEW controls/indicators only display the local encoding correctly (please correct me if I'm wrong), but that's a different issue. Controls/indicators weren't designed with UTF-8 in mind so they can't interpret UTF-8. However, (Un)Flatten to/from JSON can interpret (and produce) UTF-8.

In my original example,

  • "Flatten to JSON" takes an array of native LabVIEW strings (of a locale-specific encoding) and outputs a native LabVIEW string (of UTF-8 encoding).
    • (So yes, LabVIEW can create Unicode strings)
  • "Unflatten from JSON" takes a native LabVIEW string (of UTF-8 encoding) and outputs an array of native LabVIEW strings (of a locale-specific encoding).

I simply asked for "Unflatten from JSON" to take a native LabVIEW string (of UTF-8 encoding) and output an array of native LabVIEW strings (of UTF-8 encoding) -- both of these are already possible in LabVIEW today.

Message 5 of 9

JKSH wrote:

I simply asked for "Unflatten from JSON" to take a native LabVIEW string (of UTF-8 encoding) and output an array of native LabVIEW strings (of UTF-8 encoding) -- both of these are already possible in LabVIEW today.


Unless you enable the unreleased (and not fully functional) UTF-8 mode for LabVIEW through an unofficial LabVIEW INI file setting, LabVIEW native strings are always encoded in whatever the current Windows MBCS encoding is. This is NEVER UTF-8 (Windows doesn't support UTF-8 as the user MBCS) but one of the standard ANSI code pages, such as Windows-1253. So there currently can't be a function that takes a native LabVIEW string (of UTF-8 encoding) and does anything with it, since no "native LabVIEW string (of UTF-8 encoding)" exists. You can create a byte stream that contains UTF-8 encoded characters, but when you convert or cast it into a LabVIEW string, LabVIEW will interpret it as a string in the current ANSI code page.

To support such a function, LabVIEW needs at least full support for UTF-8 strings and a way to distinguish those strings from non-UTF-8 strings (basically requiring a new datatype). That is a real beast to tackle, since every LabVIEW platform (Windows, Mac, Linux) has quite different ideas about how this should be done, and implementing a full anything-to-UTF-8-to-anything conversion entirely in LabVIEW is not really an option either.

Rolf Kalbermatter
My Blog
Message 6 of 9

Thanks for your detailed explanations, Rolf.

OK, we've defined things differently, so let's clarify them first. By "native LabVIEW string":

  • I meant "the pink wire", which includes "byte streams"/"binary strings"
  • You meant "sequence of bytes which represent human-readable text, encoded in a LabVIEW-supported codepage", which excludes "byte streams"/"binary strings" (see below)

[Image: String Philosophy.png]

I think your conceptual distinction is a good one to make, and I think LabVIEW code would be more readable and less ambiguous if LabVIEW used different datatypes to represent "text strings" vs "binary strings". But that's a topic for another day.
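Python 3 is one example of a language that draws this line in its type system: `str` holds text, `bytes` holds raw data, and moving between them requires an explicit `encode`/`decode` that names the encoding. A short sketch of what that buys:

```python
raw = b"\xe4\xbd\xa0\xe5\xa5\xbd"  # "binary string": the UTF-8 bytes for 你好
text = "Hello, "                   # "text string"

try:
    text + raw                     # mixing the two types is rejected outright
except TypeError as exc:
    print("TypeError:", exc)

# The conversion has to name its encoding, so nothing is transcoded silently.
greeting = text + raw.decode("utf-8")
print(greeting)                    # Hello, 你好
```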

@rolfk wrote:

So there can't be currently a function that could convert a native LabVIEW string (of UTF-8 encoding) and do anything with it


That's ok, because at the core of it, all I've asked for was for LabVIEW to not convert my input.

Something like what NI_LVConfig.lvlib:Read Key.vi does. The following gives me the same output as cases #3 and #5 in my original post:

[Image: INI Greetings.png]

[Greetings]
English="Hello"
Chinese="你好"

It's ironic that the "non UTF-8 aware" INI VI preserves my UTF-8 input, while the "UTF-8 aware" JSON function mangles it. 😛
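The INI result is what any reader produces if it splits lines on ASCII delimiters (`[`, `]`, `=`, `"`) and never decodes the value bytes: UTF-8 goes in and the same UTF-8 comes out. A minimal Python sketch of such a byte-transparent lookup follows; `read_key` is a hypothetical helper for illustration, not the NI_LVConfig implementation, and it ignores comments, escapes, and other INI details.

```python
from typing import Optional


def read_key(ini_bytes: bytes, section: bytes, key: bytes) -> Optional[bytes]:
    """Hypothetical byte-transparent INI lookup: the value is returned as raw bytes."""
    current = None
    for line in ini_bytes.splitlines():
        line = line.strip()
        if line.startswith(b"[") and line.endswith(b"]"):
            current = line[1:-1]                 # entering a new [section]
        elif current == section and b"=" in line:
            name, _, value = line.partition(b"=")
            if name.strip() == key:
                # Strip surrounding quotes but leave the value bytes untouched.
                return value.strip().strip(b'"')
    return None


ini = '[Greetings]\nEnglish="Hello"\nChinese="你好"\n'.encode("utf-8")
value = read_key(ini, b"Greetings", b"Chinese")
print(value.hex(" ", -2).upper())  # E4BD A0E5 A5BD, the original UTF-8 bytes
```

Splitting on ASCII delimiters is safe for UTF-8 input because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be mistaken for `=`, `"`, or a bracket.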

@rolfk wrote:

In order to support such a function, LabVIEW needs to have at least full support for UTF-8 strings and a possibility to distinguish those strings from non UTF-8 strings (basically requiring a new datatype) and that is a real beast to tackle


Yes, I understand that getting existing features to support UTF-8 is a herculean task, and I agree that the ideal way forward is to have "UTF-8 string" available as a separate datatype.

But, again, I wasn't asking for LabVIEW to convert from one encoding to another. I was simply asking for LabVIEW to not convert my input.

Perhaps a "preserve source encoding" flag as an additional input to (Un)Flatten to/from JSON, defaulted to "False" to maintain compatibility with current behaviour.
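A sketch of how such a flag could behave, modelled in Python. `unflatten_string_array` and `preserve_source_encoding` are hypothetical names, not an NI API, and the default branch assumes the Windows-1252 system code page from the original example.

```python
import json
from typing import List


def unflatten_string_array(json_bytes: bytes,
                           preserve_source_encoding: bool = False,
                           system_codepage: str = "cp1252") -> List[bytes]:
    """Hypothetical model of Unflatten from JSON for an array of strings."""
    values = json.loads(json_bytes.decode("utf-8"))
    if preserve_source_encoding:
        # Proposed behaviour: return each value as UTF-8 bytes, no code-page conversion.
        return [v.encode("utf-8") for v in values]
    # Current (observed) behaviour: convert to the system code page,
    # replacing characters it cannot represent with "?".
    return [v.encode(system_codepage, errors="replace") for v in values]


data = '["Hello","你好"]'.encode("utf-8")
print(unflatten_string_array(data))                                 # [b'Hello', b'??']
print(unflatten_string_array(data, preserve_source_encoding=True))  # [b'Hello', b'\xe4\xbd\xa0\xe5\xa5\xbd']
```

Defaulting the flag to `False` keeps existing callers on the current behaviour, which matches the compatibility concern above.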

@rolfk wrote:

every LabVIEW platform (Windows, Mac, Linux) has pretty different ideas how this should be done and implementing a full anything-to-UTF8-to-anything all in LabVIEW is not really an option either.


This problem can be tackled using a library that has already done the legwork. ICU provides conversions to/from a large collection of codepages consistently across all 3 platforms (and more): http://site.icu-project.org/ (But again, it's not what I'm asking for: I just wanted LabVIEW to not convert my input)

Message 7 of 9
I understand your frustration, but you still ignore the main point. A LabVIEW string is currently always considered to be in the current system MBCS encoding. Many functions also don't care and really treat it as a binary byte stream (VISA, TCP/UDP, file I/O), but that is a legacy the LabVIEW developers wish they could get rid of.

For now, the only option, short of throwing everything away and starting LabVIEW 3000 from a completely clean slate, would indeed be to give every function that can handle UTF-8 an option input telling it not to re-encode the incoming string.

As for using ICU, that is nice for desktops but at least problematic for embedded systems, including cRIO and myRIO.
Message 9 of 9