
How to determine string is ASCII or Unicode?

Solved!

Not really.  Unicode just defines the codepoints.  The presence of a Byte Order Marker (BOM) signifies a Unicode Transformation Format (UTF).  UTFs are methods of implementing Unicode encoding (i.e. they are wrappers for Unicode, not part of Unicode).
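
To make the BOM point concrete, here is a minimal sketch of BOM sniffing, written in Python only because a LabVIEW block diagram can't be shown as text; the helper name sniff_bom is made up for illustration, and in G you would do the same comparison on the first bytes of the string/byte array.

def sniff_bom(data):
    """Return the UTF encoding implied by a leading BOM, or None if there is none."""
    if data.startswith(b'\xef\xbb\xbf'):
        return 'utf-8'      # UTF-8 BOM: EF BB BF
    if data.startswith(b'\xff\xfe'):
        return 'utf-16-le'  # UTF-16 little-endian BOM: FF FE
    if data.startswith(b'\xfe\xff'):
        return 'utf-16-be'  # UTF-16 big-endian BOM: FE FF
    return None             # no BOM: the encoding cannot be inferred this way

Note that a BOM is optional, so its absence proves nothing; that is exactly the limitation discussed in the rest of this thread.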

"If you weren't supposed to push it, it wouldn't be a button."
Message 11 of 18

I agree with what Bill said: Unicode strings in LabVIEW are just arrays of bytes.

That's why I can't just look at the first two bytes to recognize one.

As Paul said, UTF-16 LE is used to represent Unicode, so for ASCII text every second byte is a null.

Basically, I can check those null bytes and the string length to tell whether it's Unicode or not.

However, the biggest problem is that my application has to handle multi-language exchange and will run in English, Traditional Chinese, and even Japanese.

For pure English, the null-byte method may work.

But for Chinese or Japanese, every byte of a Unicode string has a value.

Each Chinese character takes two bytes.

So now I have an additional problem: how to determine whether a Unicode string is English or Chinese.

If there's a way to tell, I can use the null-byte method to figure out the English parts, and maybe the first-two-bytes method for Chinese.
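
For illustration, the null-byte heuristic described above looks roughly like this (Python is used only as pseudo-text for a LabVIEW diagram; the function name looks_like_ascii_utf16le is made up). It also shows why the check works for English but fails for Chinese or Japanese text:

def looks_like_ascii_utf16le(data):
    if len(data) == 0 or len(data) % 2 != 0:
        return False
    # In UTF-16LE the high byte of every ASCII character sits at the odd offsets.
    return all(data[i] == 0 for i in range(1, len(data), 2))

# looks_like_ascii_utf16le('Hello'.encode('utf-16-le'))   -> True
# looks_like_ascii_utf16le('中文'.encode('utf-16-le'))     -> False, both bytes are non-zero
# looks_like_ascii_utf16le(b'Hello')                       -> False, plain ASCII/MBCS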

Message 12 of 18

This whole thing sounds a bit backwards to me.  I think the language issue should be settled at install time.  You shouldn't be trying to determine it on the fly.

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
Message 13 of 18

Maybe I've caused some confusion.

I'm writing a project that includes a multi-language function for the user to switch between.

And I'm using the Unicode support in my LabVIEW development environment.

So I can't decide in advance which language the user is going to use.

In other words, my program should work for every kind of language.

Of course I could add an extra terminal to my Unicode VIs to define the data type and language format.

But that would take lots of unnecessary case structures and make the code feel hard-coded.

I prefer not to use that kind of solution.

That's why I'm trying to detect Unicode automatically.

Message 14 of 18

You could make identically formatted and named Unicode ini files with keys matching the names of the controls you need to change.  The values will be the Unicode representations of what you want to show there.  The ini files will be in different folders, one for each language.  Have the user select a language, go to that folder, and pull the contents of the ini file into the captions, text or whatever for each control you need to modify.  Tedious.
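
As a rough sketch of that idea (Python for brevity; the folder layout, the file name ui.ini, the [Captions] section and the apply_caption step are all hypothetical; in LabVIEW you would use the Configuration File VIs plus Caption.Text property nodes):

import configparser
import os

def load_captions(language_folder):
    # Hypothetical per-language file <language_folder>/ui.ini, e.g.
    #   [Captions]
    #   StartButton = 開始
    #   StopButton  = 停止
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the control names case-sensitive
    with open(os.path.join(language_folder, 'ui.ini'), encoding='utf-8') as f:
        cfg.read_file(f)
    return dict(cfg['Captions'])

# captions = load_captions('lang/zh-TW')
# for control_name, caption in captions.items():
#     apply_caption(control_name, caption)  # in LabVIEW: a Caption.Text property node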

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
Message 15 of 18
Solution
Accepted by harry0725

So you have a user selection for a language, and based on that you select some language file, read in the strings and apply them to the controls. Why would you then need to determine the type of string that is applied? I'm still not really understanding the problem here. The LabVIEW user interface will either be using the (unsupported) Unicode setting or MBCS, but never both. If you need to support multiple languages, and have determined that you can live with the many restrictions of Unicode support in LabVIEW when using the unsupported ini key, make all the necessary controls Unicode and be done with it.

 

Since you know the language you want to apply, sort the strings accordingly. If they come from language files, do as Bill has suggested and put them in different files (or, as I have done in the past, in different columns of a tab-separated file) and load them accordingly. Have these files correctly encoded, matching the controls' encoding. Each file (or column) then defines a default encoding to use, and you can decide whether you need to convert the string from the file into the UTF-16LE that the LabVIEW Unicode controls need. Et voilà!
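
A sketch of the tab-separated variant, with Python standing in for the LabVIEW file I/O (the file name strings.tsv and the column names are made up for the example):

import csv

# Hypothetical strings.tsv, UTF-8 encoded, one row per UI string, one column per language:
#   key         en      zh-TW   ja
#   start_btn   Start   開始    開始
#   stop_btn    Stop    停止    停止
def load_strings(path, language):
    with open(path, encoding='utf-8', newline='') as f:
        rows = csv.DictReader(f, delimiter='\t')
        # Convert each value to the UTF-16LE bytes that the Unicode
        # controls expect, so nothing has to be guessed at display time.
        return {row['key']: row[language].encode('utf-16-le') for row in rows}

# strings = load_strings('strings.tsv', 'zh-TW')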

 

There simply is no reliable way to detect if a random byte stream is an ASCII string, a Unicode string of any specific type, or just some garbage. Sure, you can try to parse the byte stream with the various Unicode encodings and, if you end up with an invalid codepoint, assume that it can't be that encoding, but that is not only unreliable, it is also extremely performance-intensive. So don't do that!!!
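
A small illustration of why that guessing fails: the same two bytes decode without any error under several encodings, so a try-every-encoding loop has no way to pick the right one.

data = b'ab'                      # two ASCII bytes, 0x61 0x62
print(data.decode('ascii'))       # 'ab'
print(data.decode('utf-8'))       # 'ab'
print(data.decode('utf-16-le'))   # '扡' - one perfectly valid CJK character
print(data.decode('latin-1'))     # 'ab' - Latin-1 never raises, whatever the bytes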

 

You know the language setting your application is currently in. Use that to determine if you need to do special handling rather than trying to guess from a random byte stream what it might be.

 

 

Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 16 of 18