03-31-2021 08:31 PM
Parsing the formatted DBLs into an array of DBLs is probably the most expensive part (that's why large data files should be in binary!):
I can read all three files and split into headers and a ragged 2D array (1D array of clusters with a 1D array each) in under a second (~800ms). I am sure there are plenty of ways to speed it up but this should give you some ideas.
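For anyone who wants to see the same effect outside LabVIEW, here is a rough Python sketch (file names and sizes are placeholders, not the actual data files) that times parsing formatted text against reading the same values as raw binary:

```python
# Minimal sketch (not the LabVIEW VI above) of why text parsing dominates:
# the same DBL values read as formatted text vs. raw binary.
import time
import numpy as np

values = np.random.rand(1_000_000)           # one million DBLs
np.savetxt("data.txt", values)               # formatted text, one value per line
values.tofile("data.bin")                    # raw float64 in native byte order

t0 = time.perf_counter()
text_data = np.loadtxt("data.txt")           # scan + parse every formatted number
t1 = time.perf_counter()
bin_data = np.fromfile("data.bin", dtype=np.float64)  # essentially a memory copy
t2 = time.perf_counter()

print(f"text parse: {t1 - t0:.3f} s, binary read: {t2 - t1:.3f} s")
```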
03-31-2021 08:42 PM
@altenbach wrote:
I am sure there are plenty of ways to speed it up but this should give you some ideas.
I would start with changing most of your "Type Cast" functions into "String To Byte Array". I know for a fact that String To Byte Array and Byte Array To String are actual no-ops. I doubt the Type Cast has much, if any, processing, but it is hard to beat doing nothing.
@altenbach wrote:
(that's why large data files should be in binary!)
Absolutely! Text files are (relatively) slow to write and slow to read. Due to all of the header data, I would actually lean toward using TDMS in this situation.
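To show what I mean about the headers travelling with the data, here is a hedged sketch using the third-party npTDMS Python package (group/channel names and properties are invented for illustration; in LabVIEW you would use the native TDMS VIs instead):

```python
# Sketch only: write channel data plus header-like properties to a TDMS file,
# then read both back. Requires the npTDMS package (pip install npTDMS).
import numpy as np
from nptdms import TdmsWriter, ChannelObject, TdmsFile

data = np.random.rand(100_000)
channel = ChannelObject("Measurements", "Voltage", data,
                        properties={"unit": "V", "sample_rate": 1000.0})

with TdmsWriter("example.tdms") as writer:
    writer.write_segment([channel])

# The metadata stays attached to the channel, no separate header parsing needed.
tdms = TdmsFile.read("example.tdms")
ch = tdms["Measurements"]["Voltage"]
print(ch.properties["unit"], len(ch[:]))
```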
03-31-2021 08:45 PM
How odd...
Here's my quick attempt, but I may be committing some benchmarking failure (beyond not averaging)...
Times are in seconds, so roughly 20 ms and 60 ms respectively to read the file and split the strings.
Hopefully the Array Size is sufficient to avoid the compiler eliding the string memory, but I wouldn't be certain...
No significant difference here, but when I put the indicator outside the Flat Sequence Structure (FSS) it seemingly took longer; perhaps the array indicator was populated before proceeding to the second High Resolution Relative Seconds node. Here I ensure it is populated afterwards.
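For readers following along without the snippet, the pattern is roughly this (a Python analogue, not the LabVIEW benchmark itself; the file name is a placeholder). Python will not elide the work anyway, but the structure mirrors the diagram, with the result "used" only after the last timestamp:

```python
# Time the file read and the line split separately, then consume the result
# afterwards (the analogue of wiring Array Size after the final timestamp).
import time

t0 = time.perf_counter()
with open("data.txt", "r") as f:
    raw = f.read()                      # read the whole file in one go
t1 = time.perf_counter()
lines = raw.split("\n")                 # split into an array of lines
t2 = time.perf_counter()

print(len(lines), f"read: {t1 - t0:.3f} s, split: {t2 - t1:.3f} s")
```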
03-31-2021 09:44 PM
Your inner While Loop can probably be replaced by "Spreadsheet String To Array" with a 1D array of strings as the type, %s as the format, and linefeed as the delimiter. I have not tried whether it gives an advantage.
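Conceptually it is the same swap as in this small Python sketch (input text is a placeholder): replace a loop that peels off one line at a time with a single call that splits everything at once.

```python
text = "1.0\n2.0\n3.0\n"

# Loop-style: peel off one line per iteration (analogue of the inner While Loop).
lines_loop = []
rest = text
while rest:
    line, _, rest = rest.partition("\n")
    lines_loop.append(line)

# Single-call style: let one primitive do the whole split.
lines_split = text.splitlines()

assert lines_loop == lines_split
```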
03-31-2021 11:06 PM - edited 03-31-2021 11:26 PM
04-01-2021 11:53 AM
Since the above two tools (read the file, then make an array of lines) do exactly the same thing as Read Lines with a count of -1, but are significantly faster, my guess is that Read Lines is doing many incremental reads of smaller chunks. This probably could be optimized internally for this specific case. It is always faster to read once and do everything else in memory, as long as we have sufficient memory. 🙂
Of course, if you only want to read the first N lines (small N), it would be extremely wasteful to read the entire file into memory. It seems like the tool makes a compromise for the average use case.
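The trade-off in Python terms (a hedged illustration; "data.txt" and N are placeholders): one large read plus an in-memory split is fastest for the whole file, while stopping after N lines avoids touching the rest of the file at all.

```python
N = 10

# Whole-file approach: one large read, then split entirely in memory.
with open("data.txt", "r") as f:
    all_lines = f.read().split("\n")

# First-N approach: stop after N lines instead of reading the whole file.
first_n = []
with open("data.txt", "r") as f:
    for _ in range(N):
        line = f.readline()
        if not line:
            break
        first_n.append(line.rstrip("\n"))
```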