Python: reading whitespace-separated tokens from a file, readline-style

Question

In Python, f.readline() returns the next line from the file f. That is, it starts at the current position in f, reads until it encounters a line break, returns everything in between, and updates the position in f.

Now I want to do the same thing, but with tokens separated by any whitespace (not just newlines). For example, consider a file f with the contents

 token1 token2 token3 token4 token5

So I'm looking for some readtoken() function, such that after opening f, the first call to f.readtoken() returns token1, the second call returns token2, etc.

For efficiency, and to avoid problems with very long lines or very large files, there should be no buffering.

I was pretty sure that this should be possible out of the box with the standard library. However, I did not find a suitable function or way to override the delimiters for readline() .

+4

python file-io

azimut May 06 '13 at 15:58

1 answer

Martijn Pieters · Accepted Answer · 2013-05-06T15:59:51+0000

You need to create a wrapper function; it's simple enough:

    def read_by_tokens(fileobj):
        for line in fileobj:
            for token in line.split():
                yield token

Note that .readline() does not just read the file character by character until a newline is encountered; the file is read in blocks (a buffer) to improve performance.

The above function reads the file line by line, but yields the result of splitting each line on whitespace. Use it as:

    with open('somefilename') as f:
        for token in read_by_tokens(f):
            print(token)

Since read_by_tokens() is a generator, you need to either iterate over the result of the function directly, or use the next() function to get the tokens one by one:

    with open('somefilename') as f:
        tokenized = read_by_tokens(f)

        # read first two tokens separately
        first_token = next(tokenized)
        second_token = next(tokenized)

        for token in tokenized:
            # loops over all tokens *except the first two*
            print(token)
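
Because the generator above iterates line by line, a single very long line is still loaded into memory in one piece, which is one of the concerns raised in the question. A possible variation, sketched here only as an illustration, reads fixed-size chunks instead and carries an unfinished token over to the next chunk. The name read_tokens_chunked and the chunk_size parameter are made up for this example and are not part of the standard library; the sketch also assumes the file is opened in text mode.

```python
import io


def read_tokens_chunked(fileobj, chunk_size=8192):
    """Yield whitespace-separated tokens, reading fixed-size chunks.

    Hypothetical helper (not a standard-library function); assumes
    fileobj is a text-mode file whose read() returns str.
    """
    leftover = ""  # partial token carried over from the previous chunk
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of file
        chunk = leftover + chunk
        parts = chunk.split()
        if parts and not chunk[-1].isspace():
            # The chunk ended mid-token: keep the tail for the next round.
            leftover = parts.pop()
        else:
            leftover = ""
        for token in parts:
            yield token
    if leftover:
        # The file did not end with whitespace: emit the final token.
        yield leftover


# Usage with an in-memory file; a tiny chunk size forces tokens to span chunks.
tokens = read_tokens_chunked(io.StringIO(" token1 token2 token3"), chunk_size=4)
first_token = next(tokens, None)  # the default None would signal end of input
for token in tokens:
    print(token)
```

Passing a default to next(), as in the usage lines, returns that default at the end of the file instead of raising StopIteration, which is somewhat closer to how readline() returns an empty string at end of file. Memory use is bounded by the chunk size plus the longest single token.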
