09-15-2010 11:12 AM
I was wondering if anyone had advice on how to speed up operations using the FmtFile and ScanFile functions. Generally, I'm using them to build up text files with columns of data, like:
for (i = 0; i < numitems; i++)
    FmtFile(file_handle, "%s<%f%c%f%c%f%c", dataarray1[i], 9, dataarray2[i], 9, dataarray3[i], 10);
This produces a very nicely formatted file that can be read with Excel and other applications. The file can be read back by the application in a similar fashion:
for (i = 0; i < numitems; i++)
    ScanFile(file_handle, "%s>%f[d]%f[d]%f[d]", &dataarray1[i], &dataarray2[i], &dataarray3[i]);
This works great for small data files. However, if the number of items gets large, or if the file is on a network share or USB drive, this is rather slow. It seems that this method causes lots of small file I/O operations that slow things down, especially if the file is not on the local hard drive. I've thought of a few ways to speed things up, including:
1. Writing the arrays all at once with one call to FmtFile instead of a loop, such as
FmtFile(file_handle, "%s<%*f%*f%*f", numitems, dataarray1, numitems, dataarray2, numitems, dataarray3);
The problem with this is that the file wouldn't have the nice columns anymore, which would make it a lot harder to import into other applications like Excel.
2. Using a temporary interleaved array. Reading in the whole array at once could be done like this:
ScanFile(file_handle, "%s>%*f", 3*numitems, tempdataarray);
Then separate tempdataarray into the respective arrays in a loop, as sketched below. This seems like a good solution for reading, as I don't think the different separators would confuse the ScanFile function, but I can't think of a way to write the file in a similar way and still get the nice columns.
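The separation step itself is simple enough - since the file stores one row of the three columns per line, the scanned values come back in row order. Something like this (a quick sketch):
// tempdataarray holds a0 b0 c0 a1 b1 c1 ... after the ScanFile call
for (i = 0; i < numitems; i++)
{
    dataarray1[i] = tempdataarray[3*i];
    dataarray2[i] = tempdataarray[3*i + 1];
    dataarray3[i] = tempdataarray[3*i + 2];
}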
Any other ideas? I think what would be ideal would be a version of Fmt and Scan for operating on large buffers, with an internal pointer that moves with each call the same way the file pointer moves with FmtFile and ScanFile. Then a large temporary buffer could hold the file contents in memory, and the buffer could be formatted and scanned in loops exactly as with FmtFile and ScanFile.
Thanks.
09-16-2010 12:10 AM
I have no actual figures in favour of one function vs. the other, but I would try using a single ArrayToFile call instead of FmtFile in a loop: some simple tests should be enough to compare its speed with the other solution.
09-16-2010 01:47 AM
Actually I must correct myself: ArrayToFile is considerably more time-expensive than FmtFile.
I ran some small tests on 3 x 50000-element arrays and found that a slight speed increase can be obtained by excluding unnecessary formatting from FmtFile, i.e. changing the line to
FmtFile(fH, "%f\t%f\t%f\n", array1[i], array2[i], array3[i]);
that is, embedding the separators as literal constants instead of generating them with %c conversions.
Here are the results of 10 iterations of the following code (averaging the times after excluding the best and worst result for each test type):
Original FmtFile: 1.765 secs
FmtFile with constants: 1.575 secs
ArrayToFile: 3.395 secs
Here is the code used (executed in the Interactive Execution window; once compiled, the times should be slightly lower):
#include <formatio.h>
#include <utility.h>
#include <analysis.h>   // for Copy1D
#include "toolbox.h"
static int i, fH;
static double array1[50000], array2[50000], array3[50000];
static double tini;
static double totarray[150000];
// Generating some random data
SetRandomSeed ((int)Timer ());
for (i = 0; i < 50000; i++) {
    array1[i] = Random (0.0, 100.0);
    array2[i] = Random (0.0, 100.0);
    array3[i] = Random (0.0, 100.0);
}
DebugPrintf ("\n");
// Original code
tini = Timer ();
fH = OpenFile ("c:\\temp\\test1.txt", VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_ASCII);
for (i = 0; i < 50000; i++)
    FmtFile(fH, "%f%c%f%c%f%c", array1[i], 9, array2[i], 9, array3[i], 10);
CloseFile (fH);
DebugPrintf ("Saving 50000 * 3 items on a file takes %.3f secs (original)\n", Timer () - tini);
// FmtFile optimized with literal constants
tini = Timer ();
fH = OpenFile ("c:\\temp\\test2.txt", VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_ASCII);
for (i = 0; i < 50000; i++)
    FmtFile(fH, "%f\t%f\t%f\n", array1[i], array2[i], array3[i]);
CloseFile (fH);
DebugPrintf ("Saving 50000 * 3 items on a file takes %.3f secs (optimized)\n", Timer () - tini);
// ArrayToFile
tini = Timer ();
Copy1D (array1, 50000, totarray);
Copy1D (array2, 50000, totarray + 50000);
Copy1D (array3, 50000, totarray + 100000);
ArrayToFile ("c:\\temp\\test3.txt", totarray, VAL_DOUBLE, 150000, 3, VAL_GROUPS_TOGETHER, VAL_GROUPS_AS_COLUMNS, VAL_SEP_BY_TAB, 10, VAL_ASCII, VAL_TRUNCATE);
DebugPrintf ("Saving 150000 items on a file takes %.3f secs\n", Timer () - tini);
09-16-2010 02:06 AM - edited 09-16-2010 02:07 AM
Another option, in case saving time is crucial at runtime, is to save the data in binary format and let the operator convert the results to text in a post-processing step. In this case the saving process takes only 0.043 secs.
Code used for the test:
#include <formatio.h>
#include <utility.h>
#include "toolbox.h"
static int i;
static double array1[50000], array2[50000], array3[50000];
static double tini;
static FILE *bH;
// Generating some random data
SetRandomSeed ((int)Timer ());
for (i = 0; i < 50000; i++) {
    array1[i] = Random (0.0, 100.0);
    array2[i] = Random (0.0, 100.0);
    array3[i] = Random (0.0, 100.0);
}
DebugPrintf ("\n");
// Saving in binary format
tini = Timer ();
bH = fopen ("c:\\temp\\test1.dat", "wb");
fwrite (array1, sizeof(double), 50000, bH);
fwrite (array2, sizeof(double), 50000, bH);
fwrite (array3, sizeof(double), 50000, bH);
fclose (bH);
DebugPrintf ("Saving 50000 * 3 items on a binary file takes %.3f secs\n", Timer () - tini);
09-16-2010 12:56 PM - edited 09-16-2010 12:57 PM
Thanks for the reply. I didn't realize that the way things are formatted actually makes a difference. I'll definitely use the tip about removing the unnecessary formatting, as that's an easy way to save about 10% in time. I'm also surprised about ArrayToFile(..): I had assumed it used FmtFile(..) internally in some fashion, since it seemed to be no faster, but I'm surprised it's actually twice as slow.
I'd still like a way to reduce the number of file I/O operations. I don't mind a few seconds, which is what it takes on the local hard drive, but running the code you posted against a network share yields much longer times:
Unoptimized: 288.767 seconds
Optimized: 291.414 seconds
ArrayToFile: 885.628 seconds
That's a long time for a file that ends up being 1,483 kB. For comparison, I got 2.088, 1.921, and 4.613 seconds on my PC writing to the hard drive, so my computer is not quite as fast as yours but isn't terrible.
One idea I did have is using the append [a] modifier to build up a temporary buffer, then doing the write all at once with FmtFile:
static char tempbuf[2000000];

tini = Timer ();
tempbuf[0] = '\0';   // write a null for the first character so Fmt starts appending at the beginning
for (i = 0; i < 50000; i++)
    Fmt(tempbuf, "%s[a]<%f\t%f\t%f\n", array1[i], array2[i], array3[i]);
fH = OpenFile ("s:\\temp\\test4.txt", VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_ASCII);
FmtFile(fH, "%s<%s", tempbuf);
DebugPrintf ("Saving 150000 items on a file using temp buffer takes %.3f secs\n", Timer () - tini);
CloseFile (fH);   // release the handle
This works, but yields a time of 34.511 seconds when writing to my hard drive, of which 0.757 seconds is the call to FmtFile; the rest is spent in the for loop. If I had to guess, each time Fmt is called with the [a] modifier it has to loop through the string looking for the null at the end, which eats up lots of time. And running this test on the network yields 343.297 seconds, showing that the one call to FmtFile still doesn't save time when saving over the network.

However, I found that simply changing VAL_ASCII to VAL_BINARY on the OpenFile line changes the network time to 54.477 s, which is a considerable time savings. Changing the original unoptimized and optimized loops so that OpenFile uses VAL_BINARY also sped those loops up, to 1.684 and 1.425 seconds (I had to add a carriage return to both lines to achieve the same output). So it appears that the way the file is opened also has a big effect on the speed. The documentation suggests that the only difference between these two options is the way CVI handles carriage returns and line feeds - it must be that CVI has to loop through any buffer being written to scan for carriage returns and line feeds when VAL_ASCII is used?
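To make the equivalence concrete, my understanding is that these two snippets should produce byte-identical files, with VAL_ASCII inserting the carriage returns behind the scenes (a sketch; a, b, and c stand for doubles, and the file names are just examples):
// ASCII mode: CVI translates each "\n" into CR+LF on write, which seems to cost a scan of the buffer
fH = OpenFile ("c:\\temp\\ascii.txt", VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_ASCII);
FmtFile (fH, "%s<%f\t%f\t%f\n", a, b, c);
CloseFile (fH);
// Binary mode: no translation, so the carriage return is supplied explicitly
fH = OpenFile ("c:\\temp\\binary.txt", VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_BINARY);
FmtFile (fH, "%s<%f\t%f\t%f\r\n", a, b, c);
CloseFile (fH);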
09-16-2010 04:03 PM
Hi tstanley,
I didn't realize that you need to write your files across a network. Based on your figures and my experience, I would try building the file on the local disk and then executing a single CopyFile to the network drive, thus minimizing network access.
09-16-2010 04:31 PM
Hi Roberto,
Actually, we develop software in CVI that is provided to customers, and the customers basically expect to be able to do what they want with it, including saving files to the hard drive, network, USB drives, and even floppies (ugh). Other than the floppy disks, that doesn't seem unreasonable. On top of that, some of the data sets do get quite large, which means they take a while to load and save no matter where they are stored. So I am just seeking various ideas on how to speed things up in general, and so far the discussion has been helpful.
Copying the file to the local disk isn't a bad idea, though. I imagine there is some trickery involved in determining whether the file is on a local disk (versus a network or removable disk), and in deciding where to store the temporary file (you can't always count on the hard drive being C:), but I'm sure I can figure something out through the SDK if I decide to go down that route.
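For instance, the SDK's GetDriveType looks like it could do the drive classification from the root of the path - an untested sketch (UNC paths like \\server\share would need separate handling):
#include <windows.h>
#include <ctype.h>

// Returns nonzero if a drive-letter path points at a network or removable drive.
static int IsSlowMedium (const char *path)
{
    char root[4] = "?:\\";
    UINT type;

    if (!path || !isalpha ((unsigned char)path[0]) || path[1] != ':')
        return 0;   // not a simple drive-letter path
    root[0] = path[0];
    type = GetDriveType (root);
    return (type == DRIVE_REMOTE || type == DRIVE_REMOVABLE);
}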
Thanks!
09-17-2010 06:20 AM
If you don't want to store the file in the application directory, you could look up the system temporary directory (the GetTempPath SDK function should be helpful for this) or use the default appdata folder, as you can read in this discussion.
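Something along these lines should work (an untested sketch; the file names are only examples, and note that windows.h declares its own 3-argument CopyFile, which clashes with the Utility Library CopyFile if you include both headers, so I call CopyFileA explicitly):
#include <windows.h>

static char tempDir[MAX_PATH], tempFile[MAX_PATH];

GetTempPathA (sizeof(tempDir), tempDir);                 // e.g. "C:\Users\...\Temp\"
sprintf (tempFile, "%sresults.txt", tempDir);            // example file name
// ... build the file quickly on the local disk at tempFile ...
CopyFileA (tempFile, "s:\\temp\\results.txt", FALSE);    // then one bulk copy to the share
DeleteFileA (tempFile);                                  // clean up the temporary copy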
09-17-2010 08:04 AM
I was thinking how much more convenient it would be if the Fmt function returned the number of bytes formatted instead of the number of items, then discovered that there is a function called NumFmtdBytes() that does exactly what I want. So using NumFmtdBytes() to track the buffer position instead of the [a] modifier in the Fmt string, together with a temp buffer and a binary file type, I can get the write time down to under a second (0.967 s) when writing to my hard drive, using this code:
tini = Timer ();
n = 0;
fH = OpenFile ("c:\\temp\\test4.txt", VAL_WRITE_ONLY, VAL_TRUNCATE, VAL_BINARY);
for (i = 0; i < 50000; i++)
{
    Fmt(&tempbuf[n], "%f\t%f\t%f\r\n", array1[i], array2[i], array3[i]);
    n += NumFmtdBytes ();
}
FmtFile(fH, "%s<%s", tempbuf);
CloseFile (fH);
DebugPrintf ("Saving 50000 * 3 items on a file using temp buffer takes %.3f secs\n", Timer () - tini);
If I write to the network it still takes about 20 seconds, which, while still much improved, shows that there is something still going on behind the scenes. So it looks like I may still try the temporary file location as you suggested.
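For reading the file back, I expect the same trick to work in reverse, since the help says NumFmtdBytes() also reports the bytes consumed by the Scan family. I haven't timed this yet - an untested sketch (nbytes is an int; tempbuf, the arrays, and the [d] modifier as in my original ScanFile call):
fH = OpenFile ("c:\\temp\\test4.txt", VAL_READ_ONLY, VAL_OPEN_AS_IS, VAL_BINARY);
nbytes = ReadFile (fH, tempbuf, sizeof(tempbuf) - 1);   // slurp the whole file in one read
CloseFile (fH);
tempbuf[nbytes] = '\0';
n = 0;
for (i = 0; i < 50000; i++)
{
    Scan (&tempbuf[n], "%s>%f[d]%f[d]%f[d]", &array1[i], &array2[i], &array3[i]);
    n += NumFmtdBytes ();   // advance past the row just parsed
}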
09-17-2010 09:24 AM - edited 09-17-2010 09:28 AM
Some more enhancements on this code (all tests saving to the local hard disk):
Your observations on opening the file in binary mode and writing to a buffer in memory got me thinking, so I tried some alternatives:
Alternative 1: using fprintf instead of FmtFile:
tini = Timer ();
fH = fopen ("c:\\temp\\test2.txt", "wb");   // note: fH must be a FILE * here, not a CVI file handle
for (i = 0; i < 50000; i++) {
    fprintf (fH, "%f\t%f\t%f\r\n", array1[i], array2[i], array3[i]);
}
fclose (fH);
DebugPrintf ("Saving 50000 * 3 items on a file takes %.3f secs (fprintf)\n", Timer () - tini);
Alternative 2: writing each row into a small buffer with sprintf and appending it to the file with fwrite:
tini = Timer ();
fH = fopen ("c:\\temp\\test1.txt", "wb");
for (i = 0; i < 50000; i++) {
    n = sprintf (a, "%f\t%f\t%f\r\n", array1[i], array2[i], array3[i]);   // a: a small char buffer
    fwrite (a, 1, n, fH);
}
fclose (fH);
DebugPrintf ("Saving 50000 * 3 items on a file takes %.3f secs (fwrite)\n", Timer () - tini);
Alternative 3 - the best of all: sprintf-ing into one big buffer and writing to disk only once:
tini = Timer ();
n = 0;
fH = fopen ("c:\\temp\\test5.txt", "wb");
for (i = 0; i < 50000; i++) {
    n += sprintf (&tempbuf[n], "%f\t%f\t%f\r\n", array1[i], array2[i], array3[i]);
}
fwrite (tempbuf, 1, n, fH);
fclose (fH);
DebugPrintf ("Saving 50000 * 3 items on a file takes %.3f secs (sprintf + fwrite)\n", Timer () - tini);
The code in A3 consistently gave me significantly better performance than all the other alternatives: on my machine I get ~1 sec with your last code and <0.5 sec with A3. The other alternatives fall in between.
As far as I can see, minimizing disk access reduces program time (which is to be expected, as disk I/O has always been a bottleneck), and using the C functions is faster than using the CVI Formatting library (which was not obvious to me from experience; presumably this is the price of the easier usability and slightly greater flexibility of the CVI library, IMHO).
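A related stdio knob that I did not test above (so, just a sketch): setvbuf can enlarge the FILE buffer, so that even the row-by-row fprintf variant hits the disk far less often:
static char iobuf[1048576];   // 1 MB user-supplied stdio buffer

tini = Timer ();
fH = fopen ("c:\\temp\\test6.txt", "wb");
setvbuf (fH, iobuf, _IOFBF, sizeof(iobuf));   // must be called before the first write
for (i = 0; i < 50000; i++) {
    fprintf (fH, "%f\t%f\t%f\r\n", array1[i], array2[i], array3[i]);
}
fclose (fH);   // flushes whatever remains in the buffer
DebugPrintf ("Saving 50000 * 3 items on a file takes %.3f secs (setvbuf + fprintf)\n", Timer () - tini);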