Mixed JSON and UNICODE trouble

Universaldilletant · ‎08-05-2022

Hi Community,

I have looked through the existing topics but didn't find a suitable solution, so here is my problem:

A REST request returns a JSON payload that contains German characters such as "ä", "ü" and "ö". When I pretty-print it in Postman or Insomnia I get:

Looking at the raw data, that turns out to be:

"parentCustomerNameASC":"Bl\u00fctenhaus GmbH"

So, in short: is there a way to convert the JSON Unicode escape "\u00FC" to "ü" (see link here)?
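For comparison, this is what a standard JSON parser does with that escape. It is a minimal Python sketch (Python being one of the languages asked about), using only the standard `json` module and the field copied from the raw data above:

```python
import json

# The field exactly as it appears in the raw response (note the \u00fc escape).
raw = '{"parentCustomerNameASC":"Bl\\u00fctenhaus GmbH"}'

# A JSON decoder resolves \uXXXX escapes while it parses string values.
record = json.loads(raw)
name = record["parentCustomerNameASC"]
print(name)  # Blütenhaus GmbH
```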

As far as I can see, this is a mixed problem of JSON and encoding. The only viable solution I have found so far is to go through such a string manually, as suggested here.
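A rough Python sketch of that manual approach, assuming the only goal is to resolve \uXXXX escapes (including the surrogate pairs used for characters above U+FFFF). The helper name `unescape_u` is hypothetical, and the linked suggestion may well do it differently:

```python
import re

# Either a UTF-16 surrogate pair (high D800-DBFF followed by low DC00-DFFF)
# or a single \uXXXX escape. The pair alternative is tried first.
_U_ESCAPE = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\u([0-9a-fA-F]{4})"
)


def unescape_u(text: str) -> str:
    """Replace backslash-u escapes in text with the characters they stand for.

    Hypothetical helper, not a full JSON decoder: other escapes such as
    backslash-n are left untouched, and an escaped backslash that happens
    to precede a 'u' is not handled.
    """
    def replace(m: re.Match) -> str:
        if m.group(1) is not None:
            high = int(m.group(1), 16)
            low = int(m.group(2), 16)
            # Combine the surrogate pair into one code point above U+FFFF.
            return chr(0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00))
        return chr(int(m.group(3), 16))

    return _U_ESCAPE.sub(replace, text)


print(unescape_u(r"Bl\u00fctenhaus GmbH"))  # Blütenhaus GmbH
```

In practice a real JSON parser is the safer choice; a hand-rolled replacement like this only makes sense when the text is not otherwise valid JSON.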

I am using the JKI REST Client and JDP's JSONtext library to obtain the initial string.

If anyone can suggest any (elegant) solution in LabVIEW/Python/.NET, that would be much appreciated.

Cheers,

Niko

drjdpowell · ‎08-05-2022

My apologies. It seems that, being English, I neglected to implement full Unicode support for the \uXXXX format. I have just created an issue: https://bitbucket.org/drjdpowell/jsontext/issues/111/implement-full-unicode-in-uxxxx-format

MilanR · ‎08-05-2022

The built-in Unflatten from JSON nodes should support this case.

Milan

drjdpowell · ‎08-05-2022

Try this. It is the same as the Tool Network version, plus this one fix.

rolfk · ‎08-05-2022

@MilanR wrote:

The built-in Unflatten from JSON nodes should support this case

Only if your Windows locale is set to German, or more precisely to the Western European codepage 1252, or to the Turkish codepage 1254. It wouldn't work with most other codepages.
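To illustrate the codepage point: suppose the \u00fc escape is turned into a single byte in the system's ANSI codepage. That is my reading of the remark above, not something verified inside LabVIEW. The resulting byte only means "ü" where the codepage agrees. The Python below merely mimics this with explicit codecs:

```python
ch = "\u00fc"  # ü

# Western European (1252) and Turkish (1254) both place ü at byte 0xFC.
cp1252_byte = ch.encode("cp1252")
cp1254_byte = ch.encode("cp1254")
print(cp1252_byte, cp1254_byte)  # b'\xfc' b'\xfc'

# Read under a Cyrillic codepage (1251), the same byte is a different letter.
print(cp1252_byte.decode("cp1251"))  # ь

# Characters missing from the codepage cannot be represented at all.
try:
    "\u0416".encode("cp1252")  # Cyrillic Ж
except UnicodeEncodeError as exc:
    print("not representable in cp1252:", exc.reason)
```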

Rolf Kalbermatter

DEMO, Electronic and Mechanical Support department, room 36.LB00.390

Universaldilletant · ‎08-08-2022

There is really no need to apologize!
You provided an awesome toolkit; it is definitely not your responsibility to handle all the annoying characters in foreign languages 🙂

I installed your update, but I am still not getting the correct output, and I am not sure where I am making the mistake.

The string that I am using as input is the following:

[{"@odata.etag":"W/\"Jn\"","status":"Released","no":"101010","description":"ECO","startingDate":"2021-03-31","startingTime":"08:00:00","endingDate":"2021-03-31","endingTime":"23:00:00","dueDate":"2021-04-01","quantity":2,"m365SalesOrderNo":"1596","parentCustomerNameASC":"Blütenhaus GmbH","m365SalesOrderLineNo":10000,"salesOrderNoASC":"","customerNameASC":"","salesShipmentPositionASC":0,"itemNoASC":"","priorityCodeASC":"","startMachineCenterASC":"","productionAreaASC":""}]

I attached the code that I use:

But the result still seems to be the same:

Thank you for your help in advance!

Cheers,

Niko

drjdpowell · ‎08-08-2022

I can see in your screenshot that your JSON string is "Bl\u00fctenhaus GmbH", so that is what that function returns. Perhaps you want to convert it to the ordinary string 'Blütenhaus GmbH'? In that case you need a function that converts from JSON format to regular LabVIEW types, such as "From JSON". The difference between a JSON string and an ordinary string can be confusing.
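The same distinction in Python terms, used here only as an analogy for what "From JSON" does in JSONtext (the standard `json` module is not related to that library):

```python
import json

# A "JSON string" is encoded text: the quotes and the escape are part of the data.
json_text = '"Bl\\u00fctenhaus GmbH"'

# Converting it to an ordinary string strips the quotes and resolves the escape.
plain = json.loads(json_text)
print(plain)  # Blütenhaus GmbH

# Encoding the ordinary string again gives back JSON text.
print(json.dumps(plain))  # "Bl\u00fctenhaus GmbH"
```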

Universaldilletant · ‎08-08-2022

I don't know what happened in the post above, but the input string would contain:

"parentCustomerNameASC":"Bl\u00fctenhaus GmbH"

But if I understand you correctly, this is not an encoding problem but simply a JSON string. That means I was looking at it from the wrong direction, thinking it was a Unicode/LabVIEW problem, when it is actually just the JSON representation of an "ü"?

Do I understand correctly that I simply need to apply a conversion function wherever this might occur?

If so, this is really embarrassing 🙈
Thanks a lot anyway for helping me understand the issue.

Yamaeda · ‎08-08-2022

@Universaldilletant wrote:

I don't know what happened in the post above, but the input string would contain:
"parentCustomerNameASC":"Bl\u00fctenhaus GmbH"
But if I understand you correctly, this is not an encoding problem but simply a JSON string. That means I was looking at it from the wrong direction, thinking it was a Unicode/LabVIEW problem, when it is actually just the JSON representation of an "ü"?

Yes. Non-ASCII characters can be written as their Unicode code point: \u followed by the four-digit hex value.
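Strictly speaking, JSON (RFC 8259) only requires escaping the quotation mark, the backslash and control characters. Any other character may be escaped but need not be. When it is, a character above U+FFFF is written as two \u escapes (a UTF-16 surrogate pair). A small Python sketch of that rule, where `json_u_escape` is a hypothetical name:

```python
import json


def json_u_escape(ch: str) -> str:
    """Return the backslash-u form of one character, lowercase hex like Python's json."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return "\\u%04x" % cp
    # Outside the Basic Multilingual Plane: split into a UTF-16 surrogate pair.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return "\\u%04x\\u%04x" % (high, low)


print(json_u_escape("ü"))           # \u00fc
print(json_u_escape("\U0001F600"))  # \ud83d\ude00
print(json.dumps("\U0001F600"))     # "\ud83d\ude00"
```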

Qestit Systems

drjdpowell · ‎08-08-2022

Encodings are confusing, especially when you use a variety of software tools that differ in how well they support different encodings (and some of which may "helpfully" convert encodings silently). It is strange, for example, that your REST request returns '\u00fc' for that character, when it could more simply be represented in UTF-8 as the two-byte sequence 0xC3 0xBC. Normally I would only expect the \uXXXX format to be used for control characters or \u0000, which is why I never implemented it fully before.
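For reference, this is how U+00FC becomes the two bytes 0xC3 0xBC in UTF-8. The snippet also shows one plausible reason a server emits '\u00fc' instead: some JSON serializers escape all non-ASCII characters by default, as Python's `json.dumps` does unless `ensure_ascii=False`. What the REST service is actually built on is unknown, so treat that as an assumption:

```python
import json

cp = ord("ü")  # U+00FC

# Two-byte UTF-8 form for code points U+0080..U+07FF: 110xxxxx 10xxxxxx
b1 = 0xC0 | (cp >> 6)
b2 = 0x80 | (cp & 0x3F)
print(hex(b1), hex(b2))  # 0xc3 0xbc
assert bytes([b1, b2]) == "ü".encode("utf-8")

# Default: non-ASCII is escaped. With ensure_ascii=False the character is kept.
print(json.dumps({"name": "Blütenhaus GmbH"}))                      # {"name": "Bl\u00fctenhaus GmbH"}
print(json.dumps({"name": "Blütenhaus GmbH"}, ensure_ascii=False))  # {"name": "Blütenhaus GmbH"}
```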
