08-23-2023 01:46 AM
@rolfk wrote:
You are aware that the IsTextUnicode() API has always been badly broken? It will return the wrong status for certain patterns of text.
Yeah, the context help for the encoding detection VI states that it isn't perfect and will fail on certain byte orders. I'll probably replace it with a more heavyweight encoding detection such as uchardet in a future version.
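For readers wondering what a deterministic alternative to IsTextUnicode() looks like: the sketch below is a minimal, hypothetical Python illustration (not the VI's actual implementation, and far simpler than uchardet) that checks for a BOM first and only then falls back on a zero-byte heuristic. Like any heuristic, the fallback can still guess wrong on short or unusual input.

```python
def guess_encoding(data: bytes) -> str:
    """Minimal encoding guess: BOM check first, then a crude heuristic.
    This is an illustrative sketch, not a replacement for uchardet."""
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    # Fallback heuristic: UTF-16-LE text made of mostly-Latin characters
    # has a NUL in nearly every odd-indexed (high) byte.
    if data and data[1::2].count(0) > len(data) // 4:
        return "utf-16-le"
    return "utf-8"

print(guess_encoding(b"\xff\xfeh\x00i\x00"))        # utf-16-le (BOM)
print(guess_encoding("hello".encode("utf-16-le")))  # utf-16-le (heuristic)
print(guess_encoding(b"hello"))                     # utf-8
```

The point of the BOM-first ordering is that a BOM is unambiguous, whereas heuristics (including IsTextUnicode's) are exactly where the false positives come from.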
08-23-2023 02:05 AM
Thanks again.
Here is some additional information. The attached files show that the “č” character (\10D) creates a new line. Without using ‘Convert EOL’, the resulting string is displayed correctly. But since I have to pinpoint the correct {line,column} element of the resulting table (the String-to-Array-of-Strings conversion detects too many \n), I miss the target 😉 The other possibility would be to skip the Array of Strings conversion and count the \t instead…
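The likely mechanism behind the phantom line break: “č” is U+010D, and in UTF-16-LE its two bytes are 0x0D 0x01, where 0x0D is the carriage-return byte. Any tool that converts or splits EOLs byte-wise (rather than character-wise) will mistake that byte for a CR. A small Python sketch of the effect, using made-up sample text:

```python
# "č" is U+010D; encoded as UTF-16-LE it becomes the bytes 0x0D 0x01,
# and 0x0D is exactly the carriage-return byte.
text = "Praha\tčas\tkonec"          # sample text, contains no real CR
raw = text.encode("utf-16-le")
print(b"\r" in raw)                 # True, even though the text has no CR
pieces = raw.split(b"\r")           # a byte-wise split sees a phantom EOL
print(len(pieces))                  # 2: the string is cut inside "č"
```

Splitting after decoding (i.e. on characters, not bytes) avoids the problem entirely.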
08-23-2023 03:35 AM
08-23-2023 04:04 AM
Yes, I know that different delimiters can be configured, but in the case I presented above, strange characters like “č” ‘break’ the string right in the middle and generate a new line that should never exist. And whatever delimiter is used, the result is a 2D string array that is unusable for retrieving specific {line,column} elements...
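One way to make the {line,column} lookup robust, sketched here in Python under the assumption that the file is UTF-16-LE: decode first, then do the EOL conversion and the splitting on characters, so a multi-byte character such as “č” can never be mistaken for a delimiter. The function name and sample data are illustrative only.

```python
def table_cell(raw: bytes, line: int, column: int) -> str:
    """Return table element {line, column}. Decoding before splitting
    keeps multi-byte characters like "č" from posing as delimiters."""
    text = raw.decode("utf-16-le")                     # assumed input encoding
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # EOL conversion on chars
    rows = [row.split("\t") for row in text.split("\n")]
    return rows[line][column]

raw = "a\tčb\tc\nd\te\tf".encode("utf-16-le")
print(table_cell(raw, 0, 1))   # "čb" — the "č" no longer breaks the row
print(table_cell(raw, 1, 2))   # "f"
```

This is the character-level equivalent of what a byte-oriented Convert EOL plus Spreadsheet-String-to-Array pipeline cannot guarantee.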
08-23-2023 04:38 AM
08-23-2023 04:45 AM
Try to run your code with some “č” in your strings...
On my side the original problem is still there...
08-23-2023 05:09 AM
08-23-2023 05:34 AM
Your example works well.
Mine does not.
I am attaching my Unicode file...
08-23-2023 06:03 AM
08-23-2023 10:29 AM - edited 08-23-2023 10:32 AM
In the meantime I have built the attached patch, which seems to work, at least in my situation.
It shows that "element from 2D" does not work correctly, whereas "element from 1D" does the job. In other words, the "Reshape Array" function also creates a problem for the further handling of Unicode...
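For readers who want the gist of the workaround without the attached VI: the idea is to avoid reshaping to 2D and instead compute the flat index directly into the 1D array. A hypothetical Python analogue (names are illustrative, not from the patch):

```python
def element_from_1d(flat, n_cols, row, col):
    """Index a flattened row-major table directly instead of
    reshaping it to 2D first, mirroring the patch's 1D workaround."""
    return flat[row * n_cols + col]

flat = ["a", "č", "c", "d", "e", "f"]     # 2 rows x 3 columns, row-major
print(element_from_1d(flat, 3, 0, 1))     # "č"
print(element_from_1d(flat, 3, 1, 2))     # "f"
```

Since the 1D array is never reshaped, whatever the 2D path does wrong with multi-byte strings simply never happens.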