Reading data blocks from a file in Python

Question


I am new to python and trying to read "blocks" of data from a file. The file is written something like this:

 # Some comment
 # 4 cols of data --x,vx,vy,vz
 # nsp, nskip = 2 10

 # 0 0.0000000

 # 1 4
 0.5056E+03 0.8687E-03 -0.1202E-02 0.4652E-02
 0.3776E+03 0.8687E-03 0.1975E-04 0.9741E-03
 0.2496E+03 0.8687E-03 0.7894E-04 0.8334E-03
 0.1216E+03 0.8687E-03 0.1439E-03 0.6816E-03

 # 2 4
 0.5056E+03 0.8687E-03 -0.1202E-02 0.4652E-02
 0.3776E+03 0.8687E-03 0.1975E-04 0.9741E-03
 0.2496E+03 0.8687E-03 0.7894E-04 0.8334E-03
 0.1216E+03 0.8687E-03 0.1439E-03 0.6816E-03

 # 500 0.99999422

 # 1 4
 0.5057E+03 0.7392E-03 -0.6891E-03 0.4700E-02
 0.3777E+03 0.9129E-03 0.2653E-04 0.9641E-03
 0.2497E+03 0.9131E-03 0.7970E-04 0.8173E-03
 0.1217E+03 0.9131E-03 0.1378E-03 0.6586E-03

and so on

Now I want to be able to pick out and read only one data block from these many blocks. I use numpy.loadtxt('filename',comments='#') to read the data, but it loads the whole file in one go. I searched on the Internet and found that someone created a patch for the numpy io routine that lets you specify which blocks to read, but it is not in mainline numpy.

It is much easier to select data blocks in gnuplot, but then I would have to write a routine to build the distribution functions. If I can read specific blocks, that part is much easier to do in python. Also, I am moving all my visualization codes to python from IDL and gnuplot, so it would be nice to have everything in python instead of scattered across multiple packages.

I thought of calling gnuplot from inside python, plotting a block to a table and assigning the output to some array in python. But I am still a beginner, and I could not figure out the syntax to do this.

Any ideas or pointers for solving this problem would be very helpful.

+6

python numpy block

toylas May 09 '12 at 7:56


2 answers

The following snippets should help. You will probably need the re module.

You can open the file for reading using:

 f = open("file_name_here")

You can read the file one line at a time using:

 line = f.readline()

To go to the next line starting with "#", you can use:

 # Stop at the next "#" line, or at end of file (readline() then returns "")
 while line and not line.startswith("#"):
     line = f.readline()

To parse a string that looks like "# i j", you can use the following regular expression:

 import re
 # The trailing \s*$ keeps lines such as "# 0 0.0000000" from matching
 is_match = re.match(r"#\s+(\d+)\s+(\d+)\s*$", line)
 if is_match:
     i = int(is_match.group(1))
     j = int(is_match.group(2))

See the documentation for the re module for more information.
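One caveat with the file shown in the question: the time-step header lines (for example "# 0 0.0000000") also begin with two numbers, and an unanchored pattern matches them because \d+ is satisfied by the 0 in front of the decimal point. A minimal sketch of the difference follows; the names LOOSE, STRICT and parse_header are only illustrative and do not come from the original answer.

```python
import re

# Unanchored pattern: two integers anywhere after the '#'.
LOOSE = re.compile(r"#\s+(\d+)\s+(\d+)")
# Anchored variant: only whitespace may follow the second integer.
STRICT = re.compile(r"#\s+(\d+)\s+(\d+)\s*$")


def parse_header(line, pattern=STRICT):
    """Return (i, j) as ints if line is a '# i j' header, else None."""
    m = pattern.match(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_header("#            1           4\n"))         # (1, 4)
print(parse_header("#            0   0.0000000\n"))         # None
print(parse_header("#            0   0.0000000\n", LOOSE))  # (0, 0)
```

Whether the anchored form is the right choice depends on how the real file is written, so it is worth checking it against a few actual header lines.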

To parse a block, you can use the following bit of code:

 block = []  # block[i][j] will contain element i,j of your block
 while line.strip():  # read until the next blank line (or end of file)
     # split on any run of whitespace and convert every field to float
     block.append([float(x) for x in line.split()])
     line = f.readline()

Then you can turn your block into a numpy array:

 block = np.array(block)

This assumes you imported numpy as np. If you want to read several blocks between i and j, just wrap the code above for reading one block in a function and call it several times.
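Putting the steps above together, a function for reading a single block might look like the sketch below. It is only one way to combine them: the name read_block, the 0-based block index n and the anchored header pattern are assumptions made for illustration, and it treats either a blank line or a "#" line as the end of a block.

```python
import re

# Matches '# i j' header lines (two integers, nothing else after them).
HEADER = re.compile(r"#\s+(\d+)\s+(\d+)\s*$")


def read_block(path, n):
    """Return the n-th (0-based) '# i j' block as a list of float rows.

    Reading stops as soon as the requested block is complete.
    """
    seen = -1   # index of the most recent '# i j' header encountered
    rows = []
    with open(path) as f:
        for line in f:
            if HEADER.match(line):
                if seen == n:       # next header: requested block is done
                    break
                seen += 1
            elif seen == n:
                stripped = line.strip()
                if not stripped or stripped.startswith("#"):
                    if rows:        # blank/comment line after data ends the block
                        break
                    continue        # tolerate blank lines right after the header
                rows.append([float(x) for x in stripped.split()])
    if seen < n or not rows:
        raise IndexError("block %d not found" % n)
    return rows
```

The returned rows can be passed straight to np.array(...) to obtain a 2-D array. Because the loop breaks once the block ends, the rest of the file is never read.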

Hope this helps!

+1

Pascal Bugnion May 09 '12 at 17:05


Emmanuel · Accepted Answer · 2012-05-09T17:23:57+0000

A quick, basic reader (in the session below, s holds the path to the data file):

 >>> def read_blocks(input_file, i, j):
 ...     empty_lines = 0
 ...     blocks = []
 ...     with open(input_file) as f:
 ...         for line in f:
 ...             # Blank or comment line: separator between blocks
 ...             if not line.strip() or line.startswith('#'):
 ...                 # First separator in a run: start a new block
 ...                 if empty_lines == 0:
 ...                     blocks.append([])
 ...                 empty_lines += 1
 ...             # Data line: add it to the current (last) block
 ...             else:
 ...                 empty_lines = 0
 ...                 blocks[-1].append(line.strip())
 ...     return blocks[i:j + 1]
 ...
 >>> for block in read_blocks(s, 1, 2):
 ...     print('-> block')
 ...     for line in block:
 ...         print(line)
 ...
 -> block
 0.5056E+03 0.8687E-03 -0.1202E-02 0.4652E-02
 0.3776E+03 0.8687E-03 0.1975E-04 0.9741E-03
 0.2496E+03 0.8687E-03 0.7894E-04 0.8334E-03
 0.1216E+03 0.8687E-03 0.1439E-03 0.6816E-03
 -> block
 0.5057E+03 0.7392E-03 -0.6891E-03 0.4700E-02
 0.3777E+03 0.9129E-03 0.2653E-04 0.9641E-03
 0.2497E+03 0.9131E-03 0.7970E-04 0.8173E-03
 0.1217E+03 0.9131E-03 0.1378E-03 0.6586E-03

From here, I think you can use numpy to parse the lines of each block.
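Since the question was about not loading the whole file, here is a hedged variant of the same idea written as a generator, so that reading stops shortly after the last requested block. The names iter_blocks and read_blocks_lazy are made up for this sketch; it assumes, as above, that blank lines and lines starting with "#" separate blocks, and it converts each row to floats along the way.

```python
from itertools import islice


def iter_blocks(lines):
    """Yield each run of consecutive data lines as a list of float rows.

    Blank lines and lines starting with '#' act as separators.
    """
    block = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            if block:           # a separator closes the current block
                yield block
                block = []
        else:
            block.append([float(x) for x in stripped.split()])
    if block:                   # last block when the file ends without a separator
        yield block


def read_blocks_lazy(path, i, j):
    """Return blocks i..j (inclusive, 0-based) without reading past block j's end."""
    with open(path) as f:
        # islice stops pulling from the generator once block j has been yielded
        return list(islice(iter_blocks(f), i, j + 1))
```

Each returned block is a plain list of rows, so np.array(block) (with numpy imported as np) gives a 2-D array with one row per data line.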
