
How to determine string is ASCII or Unicode?

Solved!

Not really.  Unicode just defines the codepoints.  The presence of a Byte Order Marker (BOM) signifies a Unicode Transformation Format (UTF).  UTFs are methods of implementing Unicode encoding (i.e. they are wrappers for Unicode, not part of Unicode).
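
To make the BOM point concrete, here is a minimal sketch of BOM sniffing, written in Python only because a LabVIEW block diagram can't be shown as text; the helper name sniff_bom is made up for illustration, and in G you would do the same comparison on the first bytes of the string/byte array.

def sniff_bom(data):
    """Return the UTF encoding implied by a leading BOM, or None if there is none."""
    if data.startswith(b'\xef\xbb\xbf'):
        return 'utf-8'      # UTF-8 BOM: EF BB BF
    if data.startswith(b'\xff\xfe'):
        return 'utf-16-le'  # UTF-16 little-endian BOM: FF FE
    if data.startswith(b'\xfe\xff'):
        return 'utf-16-be'  # UTF-16 big-endian BOM: FE FF
    return None             # no BOM: the encoding cannot be inferred this way

Note that a BOM is optional, so its absence proves nothing; that is exactly the limitation discussed in the rest of this thread.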

"If you weren't supposed to push it, it wouldn't be a button."
Message 11 of 18

I agree with what Bill said: Unicode strings in LabVIEW are just arrays of bytes.

That's why I can't just look at the first two bytes to recognize one.

As Paul said, UTF-16 LE is used to represent Unicode, so for ASCII text every second byte is a null.

Basically, I can check those null bytes and the string length to tell whether it's Unicode or not.

However, the biggest problem is that my application has to handle multi-language exchange and will run in English, Traditional Chinese, and even Japanese.

For pure English, the null-byte method may work.

But for Chinese or Japanese, every byte of a Unicode string has a value.

Each Chinese character takes two bytes.

So now I have an additional problem: how to determine whether a Unicode string is English or Chinese.

If there's a way to tell, I can use the null-byte method to figure out the English parts, and maybe the first-two-bytes method for Chinese.
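
For illustration, the null-byte heuristic described above looks roughly like this (Python is used only as pseudo-text for a LabVIEW diagram; the function name looks_like_ascii_utf16le is made up). It also shows why the check works for English but fails for Chinese or Japanese text:

def looks_like_ascii_utf16le(data):
    if len(data) == 0 or len(data) % 2 != 0:
        return False
    # In UTF-16LE the high byte of every ASCII character sits at the odd offsets.
    return all(data[i] == 0 for i in range(1, len(data), 2))

# looks_like_ascii_utf16le('Hello'.encode('utf-16-le'))   -> True
# looks_like_ascii_utf16le('中文'.encode('utf-16-le'))     -> False, both bytes are non-zero
# looks_like_ascii_utf16le(b'Hello')                       -> False, plain ASCII/MBCS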

Message 12 of 18

This whole thing sounds a bit backwards to me.  I think the language issue should be settled at install time.  You shouldn't be trying to determine it on the fly.

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
Message 13 of 18

Maybe I've caused some confusion.

I'm writing a project that includes a multi-language function for the user to switch between.

And I'm using the Unicode support in my LabVIEW development environment.

So I can't decide in advance which language the user is going to use.

In other words, my program should work for every kind of language.

Of course I could add an extra terminal to my Unicode VIs to define the data type and language format.

But that would take lots of unnecessary case structures and make the code feel hard-coded.

I prefer not to use that kind of solution.

That's why I'm trying to detect Unicode automatically.

Message 14 of 18

You could make identically formatted and named Unicode ini files with keys matching the names of the controls you need to change.  The values will be the Unicode representations of what you want to show there.  The ini files will be in different folders, one for each language.  Have the user select a language, go to that folder, and pull the contents of the ini file into the captions, text or whatever for each control you need to modify.  Tedious.
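
As a rough sketch of that idea (Python for brevity; the folder layout, the file name ui.ini, the [Captions] section and the apply_caption step are all hypothetical; in LabVIEW you would use the Configuration File VIs plus Caption.Text property nodes):

import configparser
import os

def load_captions(language_folder):
    # Hypothetical per-language file <language_folder>/ui.ini, e.g.
    #   [Captions]
    #   StartButton = 開始
    #   StopButton  = 停止
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep the control names case-sensitive
    with open(os.path.join(language_folder, 'ui.ini'), encoding='utf-8') as f:
        cfg.read_file(f)
    return dict(cfg['Captions'])

# captions = load_captions('lang/zh-TW')
# for control_name, caption in captions.items():
#     apply_caption(control_name, caption)  # in LabVIEW: a Caption.Text property node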

Bill
CLD
(Mid-Level minion.)
My support system ensures that I don't look totally incompetent.
Proud to say that I've progressed beyond knowing just enough to be dangerous. I now know enough to know that I have no clue about anything at all.
Humble author of the CLAD Nugget.
Message 15 of 18
Solution
Accepted by harry0725

So you have a user selection for a language, and based on that you select some language file, read in the strings and apply them to the controls. Why would you then need to determine the type of string that is applied? I'm still not really understanding the problem here. The LabVIEW user interface will either be using the (unsupported) Unicode setting or MBCS, but never both. If you need to support multiple languages, and have determined that you can live with the many restrictions of Unicode support in LabVIEW when using the unsupported ini key, make all the necessary controls Unicode and be done with it.

 

Since you know the language you want to apply, sort the strings accordingly. If they come from language files, do as Bill has suggested and put them in different files (or, as I have done in the past, in different columns of a tab-separated file) and load them accordingly. Have these files correctly encoded, matching the controls' encoding. Each file (or column) then defines a default encoding to use, and you can decide whether you need to convert the string from the file into the UTF-16LE that the LabVIEW Unicode controls need. Et voilà!
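
A sketch of the tab-separated variant, with Python standing in for the LabVIEW file I/O (the file name strings.tsv and the column names are made up for the example):

import csv

# Hypothetical strings.tsv, UTF-8 encoded, one row per UI string, one column per language:
#   key         en      zh-TW   ja
#   start_btn   Start   開始    開始
#   stop_btn    Stop    停止    停止
def load_strings(path, language):
    with open(path, encoding='utf-8', newline='') as f:
        rows = csv.DictReader(f, delimiter='\t')
        # Convert each value to the UTF-16LE bytes that the Unicode
        # controls expect, so nothing has to be guessed at display time.
        return {row['key']: row[language].encode('utf-16-le') for row in rows}

# strings = load_strings('strings.tsv', 'zh-TW')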

 

There simply is no reliable way to detect if a random byte stream is an ASCII string, a Unicode string of any specific type, or just some garbage. Sure, you can try to parse the byte stream with the various Unicode encodings and, if you end up with an invalid codepoint, assume that it can't be that encoding, but that is not only unreliable, it is also extremely performance-intensive. So don't do that!!!
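
A small illustration of why that guessing fails: the same two bytes decode without any error under several encodings, so a try-every-encoding loop has no way to pick the right one.

data = b'ab'                      # two ASCII bytes, 0x61 0x62
print(data.decode('ascii'))       # 'ab'
print(data.decode('utf-8'))       # 'ab'
print(data.decode('utf-16-le'))   # '扡' - one perfectly valid CJK character
print(data.decode('latin-1'))     # 'ab' - Latin-1 never raises, whatever the bytes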

 

You know the language setting your application is currently in. Use that to determine if you need to do special handling rather than trying to guess from a random byte stream what it might be.

 

 

Rolf Kalbermatter  My Blog
DEMO, Electronic and Mechanical Support department, room 36.LB00.390
Message 16 of 18