Here's a fairly time- and memory-efficient approach that reads the values and calculates their averages for all files in parallel, while reading only one line from each file at a time. However, it temporarily reads the entire first .dat file into memory in order to determine how many rows and columns of numbers each file will contain.
You did not say whether your "numbers" are integers, floats, or something else, so they are read as floating point values (which works even if they are actually integers). Regardless, the averages are calculated and displayed as floating point numbers.
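For illustration only, the dimension-probing step described above could look roughly like the following sketch; the file name "fileA.dat" and the whitespace-separated layout are assumptions, not part of the original answer:

    # Read the first data file completely to learn the grid dimensions.
    with open('fileA.dat', 'rt') as first_file:
        rows = [[float(value) for value in line.split()] for line in first_file]

    num_rows = len(rows)
    num_cols = len(rows[0])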
Update
I have modified my original answer so that it also calculates the population standard deviation (sigma) of the values in each row and column, as requested in your comment. It does this immediately after computing their average, so a second pass over all the data is not required. In addition, in response to a suggestion made in the comments, a context manager was added to ensure that all the input files are closed.
Note that the standard deviations are only printed and are not written to the output file, but writing them to the same or a separate file should be simple enough to add.
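For reference, the population standard deviation can be derived from the same running totals used for the average, which is why no second pass is needed. A minimal sketch of that calculation (the function name and variables are illustrative, not taken from the full answer):

    from math import sqrt

    def mean_and_sigma(values):
        # Accumulate the count, sum, and sum of squares in a single pass.
        n = len(values)
        total = sum(values)
        total_sq = sum(v * v for v in values)
        mean = total / n
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        sigma = sqrt(total_sq / n - mean * mean)
        return mean, sigma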
from contextlib import contextmanager
from itertools import izip  # Python 2; use the built-in zip on Python 3
from glob import iglob
from math import sqrt
from sys import exit

@contextmanager
def multi_file_manager(files, mode='rt'):
    """ Open multiple files and make sure they all get closed. """
    files = [open(file, mode) for file in files]
    try:
        yield files
    finally:
        for file in files:
            file.close()
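Since only the context manager itself is shown above, here is a rough sketch of how it might be combined with izip to read one line from every file at a time and average the corresponding values; the glob pattern, column handling, and output are assumptions rather than the original answer's code:

    file_names = list(iglob('*.dat'))  # hypothetical file pattern
    if not file_names:
        exit('No .dat files found')

    with multi_file_manager(file_names) as files:
        # izip yields a tuple with one line from each open file per iteration,
        # so only a single line per file is held in memory at a time.
        for row, lines in enumerate(izip(*files)):
            columns = [[float(value) for value in line.split()] for line in lines]
            # Average each column position across all the files for this row.
            averages = [sum(col) / len(col) for col in izip(*columns)]
            print(row, averages)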