LabVIEW


Question about the default encoding of characters typed on the keyboard (Windows-1252 (CP1252) / ISO-8859-1 (Latin-1) / UTF-16LE)

Hello everyone,

 

I come here because I have a misunderstanding about the character encoding used in controls, indicators and constants of type "string".

 

It seemed to me at first that LabVIEW handled the Latin-1 (or ISO-8859-1) character set by default.

It seems instead that, for Europeans, it is actually the Windows-1252 character set (or CP1252, sometimes incorrectly called ANSI), because Windows is installed with this encoding, which covers most (but not all) European accented characters.

 

This is easily verified by converting a series of bytes 0-255 into a string: you get the Windows-1252 character table perfectly.
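
For anyone who wants to check this outside LabVIEW, here is a small Python sketch of the same verification (Python is only used for illustration; 'cp1252' and 'latin-1' are its codec names for Windows-1252 and ISO-8859-1):

    # Compare the upper halves of the two character sets; only 0x80-0x9F differs.
    for b in range(0x80, 0xA0):
        raw = bytes([b])
        cp1252 = raw.decode('cp1252', errors='replace')  # Windows-1252 (a few bytes are unassigned)
        latin1 = raw.decode('latin-1')                   # ISO-8859-1 maps these bytes to C1 control codes
        print(f"0x{b:02X}  cp1252={cp1252!r}  latin-1={latin1!r}")
    # e.g. byte 0x80 is '€' in Windows-1252 but the invisible control character U+0080 in ISO-8859-1.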

 

I could almost have lived with that, until I noticed that the "€" sign was not encoded according to Windows-1252 when typed into a string control from the keyboard.

It shows up as the hex code 0xAC20 (so on 2 bytes), which corresponds to the UTF-16LE encoding!!

LabVIEW should have encoded it as 0x80 from the Windows-1252 character set, but no... 🤔

 

Why does typing use the UTF-16LE character set instead of the Windows-1252 set?

 

Moreover, if you take a control with the € symbol entered from the keyboard, switch the string to hex display and then back to normal display, the € is no longer shown; instead you get 2 characters from the Windows-1252 set...

 

I know it's possible to tell LabVIEW to work with Unicode via the UseUnicode=TRUE key in the LabVIEW.ini file, but I haven't done that, so technically it shouldn't insert the € sign as two bytes... Any idea?

 

Otherwise, why am I bothering with this? Because I'm working on QR codes, whose default encoding is ISO-8859-1; if that's not possible I'll have to encode in UTF, but that takes up more bytes, so less data fits in the same QR code size.

But if LabVIEW encodes certain characters on 2 bytes when the same characters exist on just one... that's a bit silly!
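
As a rough illustration of the size point for the QR payload, here is a small sketch in Python rather than LabVIEW (the sample text is made up, for illustration only):

    # Byte count of the same accented text in the two candidate encodings.
    payload = "Référence n°42, Zürich"
    print(len(payload.encode('latin-1')))   # ISO-8859-1: 1 byte per character
    print(len(payload.encode('utf-8')))     # UTF-8: é, °, ü each take 2 bytes, so the payload grows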

I created a little VI to test all this:

[image: KaleckFR_0-1705673218536.png]

Here the "€" symbol, alt+0128 (0x80) return the code "0xAC20" for UTF16-LE encoding.

The "™" symbol, alt+0153 (0x99) return the code "0x2221" for UTF16-LE encoding.

0 Kudos
Message 1 of 7
(1,481 Views)

Hello,
I cannot open your VI because it is saved in LabVIEW 2023 and I still use 2021.
As you can see in the attached picture, you can write the Euro sign in two ways: one consists of 2 bytes, the other of one byte. I was struggling with this before. I have attached a VI with the "good" € sign.
Use UTF-8 instead of UTF-16 to save space?

0 Kudos
Message 2 of 7
(1,333 Views)

For string handling, LabVIEW uses the platform's 8-bit APIs. This means that under Windows it uses whatever codepage is associated with your current locale. For pretty much all Western countries this is a single-byte encoding, often called ANSI in the Windows world, although that name is not quite correct. All these codepages have the first 128 characters in common but fill in different characters in the upper half of the code points depending on the region.
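
A quick way to see what "different characters in the upper half" means in practice is this Python sketch (just an illustration, not anything LabVIEW does itself):

    # The same byte value decoded with three regional Windows codepages.
    raw = bytes([0xE4])
    for codepage in ('cp1252', 'cp1251', 'cp1253'):   # Western, Cyrillic, Greek
        print(codepage, raw.decode(codepage))
    # Byte 0xE4 comes out as 'ä' (Western), 'д' (Cyrillic) or 'δ' (Greek).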

For some Asian regions codepages can contain multibyte characters, meaning that a single character can consist of more than one byte. As long as you don’t send strings from a computer using one codepage to one using a different one, things are mostly peachy for users.

UTF-16LE is the encoding used internally in Windows, but when LabVIEW was ported to Windows that did not yet exist. LabVIEW interfaced to the old Windows APIs that were codepage based, and Microsoft gradually replaced the internals to use UTF-16 "almost" everywhere but provided a compatibility layer to let the codepage-based APIs keep working. Unfortunately, the assumption in LabVIEW that a character string is equal to a byte array is so deeply ingrained in everything that it proved very difficult to disassociate the two later without breaking tens of thousands of existing LabVIEW applications when they were upgraded to such a new version. So that project lingered and was never finished, not least because the existing situation worked in fact too well for most users.

 

On non-Windows platforms LabVIEW uses whatever encoding the underlying C runtime is using. This is also user configurable, but on Linux and Mac it has been UTF-8 by default for many years.

 

The setting in Windows to make the codepage use UTF-8 as well has existed since Windows 10, but was, and still is, marked as a beta feature only.

 

But are you letting the barcode scanner insert characters into the LabVIEW string through a keyboard wedge (likely in software, as it is probably a USB scanner installing itself as a virtual keyboard)? Or how do you get the characters from the scanner into the LabVIEW string? It would seem that somewhere between the scanner and the LabVIEW string something is incorrectly translating the text to UTF-16, not realizing that the target application is not UTF-16.

Rolf Kalbermatter
My Blog
0 Kudos
Message 3 of 7
(1,327 Views)

Hello rolfk,

 

" least because the existing situation worked in fact to well for most users."
That does noch fit for me.
I am from Germany. In Germany there are special characters linke ä,ö,ü,ß. So if you want to paste text from the internet oder other application (for example SQL-Code) in the Blockdiagramm oder in Controls, it turns into UTF16 if one of the characters is part of that text. If you have bad luck, you get chinese characters. It works if you delete these special characters before you paste it to LabVIEW.
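
For illustration, here is one way (sketched in Python, not LabVIEW, and only a guess at the mechanism) that ordinary 8-bit text can end up looking Chinese: when pairs of single-byte characters are re-interpreted as one UTF-16 code unit each, the result usually lands in the CJK range of Unicode:

    # German text as Windows-1252 bytes, then wrongly read back as UTF-16LE.
    ansi = "Straße".encode('cp1252')   # 6 bytes: S t r a ß e
    print(ansi.decode('utf-16-le'))    # 3 code units, all in the CJK ideograph range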

0 Kudos
Message 4 of 7
(1,315 Views)

LabVIEW invokes the ANSI Windows APIs to retrieve text from the clipboard. If it does not get ANSI (8-bit, current-codepage encoded) text, something is going wrong, and it's not certain the fault is with LabVIEW.

If your internet-originated text contains characters that can't be represented in your current codepage, then the default character substitution should happen, meaning the Windows WideCharToMultiByte() function that is called by Windows to convert possible Unicode text to 8-bit encoded text will replace those characters with a question mark. If you see Unicode code points in LabVIEW instead, then I can only think of two possibilities:

Either you have Unicode=True in your LabVIEW.ini file, or the application placing that text on the clipboard is trying to outsmart Windows by doing something non-standard with the clipboard.
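
For illustration, the same kind of substitution can be reproduced in Python (this is only an analogue of the Windows behaviour, not the actual API call):

    # Characters that have no slot in the target codepage get replaced, here with '?'.
    text = "price: 10 € – ∑"                          # '∑' does not exist in Windows-1252
    print(text.encode('cp1252', errors='replace').decode('cp1252'))
    # -> price: 10 € – ?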

Rolf Kalbermatter
My Blog
Message 5 of 7
(1,306 Views)

Hello rolfk,
Yes, I have Unicode=True in my LabVIEW.ini file and in the ini file of the application.
I have just disabled it and will see if that works better for me.

Thank you

sletrab

0 Kudos
Message 6 of 7
(1,274 Views)

Hello everyone,

 

Rolfk, I don't think LabVIEW uses the ANSI Windows APIs when pasting from the clipboard, because it is possible to paste non-ANSI characters (without having activated the Unicode option in the ini file).

 

On the contrary, and this is the problem: it seems that LabVIEW reads the Windows clipboard as UTF-16LE (since version 2022) and that afterwards, for string processing, it uses the ANSI libraries.

 

The same goes for direct keyboard entry: special characters such as €, which are part of the 8-bit codepage, are no longer kept as is but are automatically converted to UTF-16LE, which LabVIEW can no longer handle afterwards...
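
This would also explain the 2-character effect from my first post; a small Python sketch, for illustration only:

    # The € arrives as UTF-16LE bytes, which the 8-bit string routines then
    # treat as two separate Windows-1252 characters.
    utf16 = "€".encode('utf-16-le')    # b'\xac\x20' -> shown as 0xAC20 in hex display
    print(utf16.decode('cp1252'))      # '¬ ' : the two Windows-1252 characters you get back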

It seems that the problem appeared with LabVIEW 2022. LabVIEW 2021 still works fine on the same PC.

0 Kudos
Message 7 of 7
(1,252 Views)