problems with read / ReadLine and large textfiles

DocEye · 10-02-2008

Hi,

I am trying to write a very simple little app that removes unwanted lines from a text file. My approach is:

fHdle = OpenFile (fName, VAL_READ_ONLY, VAL_OPEN_AS_IS, VAL_ASCII);
fOutHdle = OpenFile (fOutName, VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_ASCII);

while (ReadLine (fHdle, buf, -1) != -2) {
    if (FindPattern (buf, 0, -1, "SequenceID", 1, 0) == -1)
        WriteLine (fOutHdle, buf, -1);
}

CloseFile (fHdle);
CloseFile (fOutHdle);

This works perfectly fine when the file is fairly small (say, when I copy the first ~1000 lines of my original text file into a new one). When I try it with the original big file (~10 MB / 200,000 lines), it doesn't: the buffer is empty and 0 bytes are read, but no FmtIOErr is raised. Any help is much appreciated!

thanks tons in advance

S_Hong · 10-02-2008

Hi DocEye,

At the moment I cannot see anything wrong with your code. Using the original file, could you tell me what is written to the output file? Is it completely empty? Also, could we try setting breakpoints in your while loop to see whether the ReadLine function returns anything into your buffer?

S_Hong
National Instruments
Applications Engineer

DocEye · 10-06-2008

Hi and thanks for looking into that,

The output is garbage (ഊഊഊഊഊഊഊഊഊഊഊഊഊഊഊഊഊഊഊഊ), I'd say one character per line, and ReadLine returns garbage into the buffer too. I now also get a fatal run-time error, a 'General Protection' fault, after reading a few tens of thousands of lines.

Once again, using just the first ~10K lines of the same file works perfectly. If you want, I can send you the files by email; the original one is ~10 MB.

thanks heaps

RobertoBozzolo · 10-06-2008

Just a simple suggestion: I would try adding some error checking to see whether an error is raised while reading. Something like this:

while ((error = ReadLine (fHdle, buf, -1)) != -2) {
    if (error == -1) {
        error = GetFmtIOError ();
        MessagePopup ("Error", GetFmtIOErrorString (error));
        break;    /* stop at the first I/O error */
    }
    /* ... FindPattern () / WriteLine () as before ... */
}

Proud to use LW/CVI from 3.1 on.

My contributions to the Developer Community
________________________________________
If I have helped you, why not giving me a kudos?

DocEye · 10-07-2008

sorry not solved, just accidentally clicked!

I've put the error checking in.

It goes through the error-checking loop without an error; I had already checked for Fmt errors before, and none is raised.

I've had another, closer look at the buffer: it actually contains my line, just with 0s between the characters. With the big file, the "bytes read" value returned by ReadLine is twice the number of bytes in the corresponding line of the input file, plus 1, and the buffer contains [0,60,0,77,0,105,...] instead of [60,77,105] for the string '<Mi' in the input file. I really appreciate your help here!

thanks!

RobertoBozzolo · 10-07-2008

I see no Fmt () functions in the code you originally posted: can you show us the complete reading / writing code? Attaching a small data file (a few lines, just to test the Scan / Fmt functions) could be useful too.


dummy_decoy · 10-07-2008

ascii characters interleaved with zero bytes are what the UTF-16 encoding of those ascii characters looks like. you are now having a UNICODE problem.

are you sure your original file is not UTF-16 encoded? you would not have noticed when editing it, then you copied a subset of the data to a new file and saved it normally as ASCII...

it would explain why it works with your subset and not with the original file.
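One quick way to check this suspicion is to look at the first two bytes of the file: Notepad normally writes a byte-order mark (BOM) at the start of a file it saves as "Unicode", FF FE for UTF-16 little-endian (FE FF would mean big-endian). The sketch below uses only standard C stdio; DetectUtf16Bom is a hypothetical name, and a UTF-16 file saved without a BOM would simply be reported as "no BOM".

```c
#include <stdio.h>

/* Hypothetical helper: inspects the first two bytes of a file.
   Returns 1 for a UTF-16 little-endian BOM (FF FE),
           2 for a UTF-16 big-endian BOM (FE FF),
           0 if neither is present (which does NOT prove the file is ASCII),
          -1 if the file cannot be opened. */
int DetectUtf16Bom (const char *path)
{
    unsigned char bom[2];
    size_t n;
    FILE *fp = fopen (path, "rb");   /* binary mode: no newline translation */

    if (fp == NULL)
        return -1;
    n = fread (bom, 1, 2, fp);
    fclose (fp);

    if (n == 2 && bom[0] == 0xFF && bom[1] == 0xFE)
        return 1;
    if (n == 2 && bom[0] == 0xFE && bom[1] == 0xFF)
        return 2;
    return 0;
}
```

Calling it on fName before the OpenFile/ReadLine loop would tell you which kind of file you are about to read.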

DocEye · 10-07-2008

That is it mate!

Thanks to Roberto too: there is no Fmt, just ReadLine, a comparison of the content, and WriteLine if the condition is satisfied.

But yes, the text file is UTF-16 encoded. It opens in Notepad just normally, so I hadn't thought about that.

Now, if the first one wasn't already a stupid question: how do I process those? I guess writing a function that returns only every other char of a string as a char * would work, or are there other ways?
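The "every other char" idea can be sketched as a small buffer function. Instead of strictly taking every other byte, this version drops every zero byte: for pure ASCII text the result is the same, and it does not matter whether the buffer starts on a zero byte, as in the [0,60,0,77,0,105,...] buffer above. StripZeroBytes is a hypothetical name, and the sketch assumes the text contains only ASCII characters.

```c
#include <stddef.h>

/* Hypothetical helper for the naive approach: compacts nBytes of
   UTF-16 data in place into a NUL-terminated char string by dropping
   the zero bytes. Only correct for ASCII-only text, where each UTF-16
   code unit is one ASCII byte plus one 0x00 byte. buf must have room
   for one byte past nBytes (for the terminating NUL).
   Returns the length of the resulting string. */
size_t StripZeroBytes (char *buf, size_t nBytes)
{
    size_t in;
    size_t out = 0;

    for (in = 0; in < nBytes; in++) {
        if (buf[in] != '\0')
            buf[out++] = buf[in];   /* keep the non-zero (ASCII) byte */
    }
    buf[out] = '\0';
    return out;
}
```

In the original loop it would be called right after ReadLine, with the byte count ReadLine returned, and before FindPattern.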

Thanks so much for the, as always, prompt and professional support from the community and NI staff!

dummy_decoy · 10-07-2008

i don't think CVI supports anything multibyte. but the C standard library defines a whole lot of functions taking wide-character strings as input (type wchar_t, which is 16 bits on Windows). you have to use type wchar_t in place of type char, then use the wcs*() functions. those functions are the wide-character versions of their str*() equivalents (for example, strlen() works on char strings, wcslen() works on wchar_t strings). here, the wcsstr() function would replace the FindPattern() function.
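As a sketch of the wcsstr() idea in standard C (not CVI-specific), the FindPattern() test from the original loop could become a wide-character check. KeepLineW is a hypothetical name. Note that wchar_t is 16 bits on Windows but 32 bits on other platforms (e.g. Linux with glibc), so the file data has to be converted into wchar_t values rather than cast from the raw bytes.

```c
#include <wchar.h>

/* Hypothetical wide-character version of the FindPattern() test:
   returns nonzero if the line should be kept, i.e. it does NOT
   contain L"SequenceID". wcsstr() is the <wchar.h> counterpart of
   strstr(); whether a particular C runtime provides it is a separate
   question. */
int KeepLineW (const wchar_t *line)
{
    return wcsstr (line, L"SequenceID") == NULL;
}
```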

unfortunately:

- the C standard library does not provide any equivalent to the ReadLine() function. but you can easily roll your own.

- it seems wcsstr() is not part of the CVI C runtime library. the only solution i see is using the Windows SDK functions, but i can't find any equivalent function in the Windows API either.
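The first point, rolling your own ReadLine(), can be sketched in standard C as follows. ReadLineUtf16Le is a hypothetical name; it assumes a UTF-16 little-endian file opened with fopen (..., "rb") and keeps ReadLine()'s convention of returning -2 at end of file, which the loops in this thread test for.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical ReadLine() counterpart for UTF-16 little-endian files.
   fp must be opened in binary mode ("rb"). Reads 16-bit code units
   into line[] until a line feed (U+000A), skipping carriage returns
   (U+000D) and U+FEFF (normally only the byte-order mark at the start).
   Stores at most maxChars characters plus a terminating L'\0', so line
   must hold maxChars + 1 elements; the rest of an over-long line is
   discarded. Surrogate pairs are not decoded: each code unit is stored
   as-is. Returns the number of characters stored, or -2 at end of file. */
int ReadLineUtf16Le (FILE *fp, wchar_t line[], int maxChars)
{
    int count = 0;
    int gotAny = 0;      /* did this call consume at least one code unit? */

    for (;;) {
        int lo = fgetc (fp);
        int hi;
        unsigned int unit;

        if (lo == EOF)
            break;
        hi = fgetc (fp);
        if (hi == EOF)
            break;       /* odd trailing byte: ignore it */
        gotAny = 1;

        unit = (unsigned int) lo | ((unsigned int) hi << 8);
        if (unit == 0x000A)
            break;       /* end of line */
        if (unit == 0x000D || unit == 0xFEFF)
            continue;    /* skip CR and the BOM */
        if (count < maxChars)
            line[count++] = (wchar_t) unit;
    }
    line[count] = L'\0';
    return gotAny ? count : -2;
}
```

Combined with a wcsstr() check, a loop such as while (ReadLineUtf16Le (fp, line, 1023) != -2) would mirror the original ReadLine() loop, with line declared as wchar_t line[1024].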

a naive and crude implementation would, as you suggest, read the file 2 bytes at a time and drop the zero byte of each pair. be aware that it will surely not work if the log file contains non-English characters (French, Arabic, Chinese, Japanese...), but that may not be a problem, depending on your application.
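A sketch of this naive approach, in plain standard C (fopen/fgetc/strstr) rather than the CVI Formatting and I/O functions, so it can also be tried outside CVI. It assumes the input is UTF-16 little-endian, which is what Notepad writes when saving as "Unicode", so the zero byte is the second byte of each pair. FilterUtf16LeFile and LINE_MAX_CHARS are hypothetical names, and the 4096-character line limit is an arbitrary choice.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_CHARS 4096   /* arbitrary limit; longer lines are truncated */

/* Hypothetical naive filter: reads a UTF-16LE file two bytes at a time,
   keeps the low byte of each pair (the ASCII code) and drops the high
   byte, which is 0 for ASCII text. Lines containing "SequenceID" are
   skipped; the others are written to outPath as plain ASCII. Non-ASCII
   characters are NOT handled and come out as garbage.
   Returns the number of lines written, or -1 if a file cannot be opened. */
long FilterUtf16LeFile (const char *inPath, const char *outPath)
{
    char line[LINE_MAX_CHARS];
    size_t len = 0;
    long written = 0;
    int lo, hi, done = 0;
    FILE *in = fopen (inPath, "rb");
    FILE *out;

    if (in == NULL)
        return -1;
    out = fopen (outPath, "w");
    if (out == NULL) {
        fclose (in);
        return -1;
    }

    while (!done) {
        lo = fgetc (in);
        hi = (lo == EOF) ? EOF : fgetc (in);

        if (hi == EOF) {
            done = 1;                      /* end of file: flush the last line */
        } else if (lo == 0xFF && hi == 0xFE) {
            continue;                      /* byte-order mark: skip it */
        } else if (lo == '\r' && hi == 0) {
            continue;                      /* drop carriage returns */
        } else if (!(lo == '\n' && hi == 0)) {
            if (len < sizeof line - 1)
                line[len++] = (char) lo;   /* keep the low (ASCII) byte */
            continue;
        }

        /* reached only at a line feed or at end of file */
        line[len] = '\0';
        if ((len > 0 || !done) && strstr (line, "SequenceID") == NULL) {
            fputs (line, out);
            fputc ('\n', out);
            written++;
        }
        len = 0;
    }

    fclose (in);
    fclose (out);
    return written;
}
```

Called as FilterUtf16LeFile (fName, fOutName), it would take the place of the whole OpenFile/ReadLine/WriteLine loop from the first post; as warned above, anything outside the ASCII range will come out wrong.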
