 LukasSc
		
			LukasSc
		
		
		
		
		
		
		
		
	
			11-06-2019 09:44 AM - edited 11-06-2019 09:51 AM
Hi,
I am trying to merge several large files into a single file. Each of the files contains several hundred Megabytes of Data in several columns of Tap deliminated txt files.
If I just try to open them, merge them in a concatenating for loop and writing a new file I constantly get the error not enough memory.
What is an easy way to merge several files into one, while avoiding the memory overload issue? Each file contains 4 columns with millions of rows of data, the final file should have the same amount of rows as previously but each column added at the end.
Thank you for your help
Solved! Go to Solution.
 mcduff
		
			mcduff
		
		
		
		
		
		
		
		
	
			11-06-2019 09:55 AM
Adding columns to a text file is difficult to impossible.
If you must use a text file then I suggest(it may not work),
open 1 file convert to binary, close file, open second file convert to binary append to binary array, repeat for third file.
Now you have a binary array of points, convert it to text in chunks, say 100k rows at a time. Append each chunk to the new text file.
mcduff
 RTSLVU
		
			RTSLVU
		
		
		
		
		
		
		
		
	
			11-06-2019 10:09 AM - edited 11-06-2019 10:12 AM
@LukasSc wrote:
Hi,
I am trying to merge several large files into a single file. Each of the files contains several hundred Megabytes of Data in several columns of Tap deliminated txt files.
If I just try to open them, merge them in a concatenating for loop and writing a new file I constantly get the error not enough memory.
What is an easy way to merge several files into one, while avoiding the memory overload issue? Each file contains 4 columns with millions of rows of data, the final file should have the same amount of rows as previously but each column added at the end.
Thank you for your help
This task would be better performed in Excel itself.
If you were clever you could:
 crossrulz
		
			crossrulz
		
		
		 
		
		
		
		
		
	
			11-06-2019 10:10 AM
I would read all of the files 1 line at a time. You then merge everything into a line and write that line. Repeat until you have read all of at least one of the files.
 Yamaeda
		
			Yamaeda
		
		
		
		
		
		
		
		
	
			11-06-2019 10:17 AM
Command window (or system exec)
Copy [file 1] [file 2] .. [file n] [destination file]
/Y
 cbutcher
		
			cbutcher
		
		
		 
		
		
		
		
		
	
			11-06-2019 10:49 AM
I think copy will append them row by row (add rows, not increase column count).
You may be able to hold all of your file references open and scan blocks from each file, use the Spreadsheet String to array (of strings) and appropriate Transpose operations and build array to go through faster than line by line, but it seems likely you cannot open and read all of any one file and make progress.
 thols
		
			thols
		
		
		
		
		
		
		
		
	
			
			
    
	
		
		
		11-07-2019
	
		
		02:16 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
 - last edited on 
    
	
		
		
		11-21-2024
	
		
		08:43 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
 by 
				
		 Content Cleaner
		
			Content Cleaner
		
		
		
		
		
		
		
		
	
			
		
OK, I think you have some ideas for a solution. But having hundreds of megabytes of data stored as text files is not very good. Merging several of them to create a text file of gigabyte-size is worse. What do you need it for? It will be extremely slow to use and extract data from. Consider using tdms for storing that amount of data instead.
11-07-2019 03:35 AM
Wow TDMS files seem indeed promising. I might have to rewrite some of my Data Acquisition procedures but it seems worth it. Thank you for pointing that out, I wasn't even aware of the format.
 cbutcher
		
			cbutcher
		
		
		 
		
		
		
		
		
	
			11-07-2019 03:52 AM
@LukasSc wrote:
Wow TDMS files seem indeed promising. I might have to rewrite some of my Data Acquisition procedures but it seems worth it. Thank you for pointing that out, I wasn't even aware of the format.
If you're willing to switch to TDMS files, you have the significant advantage of being able to write new columns separately.
That would allow you to parse entire files at a time, and just keep adding new columns (perhaps based on a header row in the existing files?) to the TDMS file.
For TDMS, these would be Channels probably (although you can consider using Groups to group similar measurements, or perhaps to separate by source file, or however you wish).
 thols
		
			thols
		
		
		
		
		
		
		
		
	
			11-07-2019 04:29 AM