I have a very large comma-delimited file coming out of a database report, something like this:
field1,field2,field3,metricA,value1
field1,field2,field3,metricB,value2
I want the new file to have the lines combined, something like this:
field1,field2,field3,value1,value2
I can do this using a hash. In this example, the first three fields are the key, and I combine value1 and value2 in a specific order as the value. After I read in the file, I just print the hash keys and values to another file. It works great.
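For reference, the hash approach described above looks roughly like the sketch below. It is a simplified illustration, not my exact script: the sub name `combine_rows` is made up, it assumes exactly five fields per line with no quoted or embedded commas, it assumes the metric names are literally `metricA` and `metricB`, and it reads from an in-memory sample instead of the real report file. Sorting the keys is only there to make the output order predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: fold "key fields, metric, value" rows into one row per key.
# Assumes five comma-separated fields per line and no quoted/embedded commas.
sub combine_rows {
    my ($in_fh, $out_fh) = @_;
    my %combined;    # "f1,f2,f3" => { metricA => value, metricB => value }

    while (my $line = <$in_fh>) {
        chomp $line;
        $line =~ s/\r\z//;    # tolerate Windows line endings
        my ($f1, $f2, $f3, $metric, $value) = split /,/, $line, 5;
        next unless defined $value;    # skip short or blank lines
        $combined{"$f1,$f2,$f3"}{$metric} = $value;
    }

    # Everything is held in memory until this point, which is the concern
    # with very large inputs.
    for my $key (sort keys %combined) {
        my $row = $combined{$key};
        # Emit the values in a fixed order; a missing metric becomes an empty field.
        print {$out_fh} join(',', $key, map { $row->{$_} // '' } qw(metricA metricB)), "\n";
    }
}

# Small demo using the sample rows above, read from an in-memory filehandle.
my $sample = "field1,field2,field3,metricA,value1\n"
           . "field1,field2,field3,metricB,value2\n";
open my $in, '<', \$sample or die "Cannot open sample: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open result buffer: $!";
combine_rows($in, $out);
close $out;
print $result;
```

With the two sample lines, this prints the single combined line `field1,field2,field3,value1,value2`.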
However, I have some concerns, as my files will be very large: about 8 GB per file.
Is there a more efficient way to do this? I am not thinking about speed, but about memory footprint. I am concerned that the process may die because it runs out of memory. I am drawing a blank on a solution that will work without cramming everything into what will ultimately be a very large hash.
For full disclosure, I use ActiveState Perl on Windows.