Change the delimiter of a "for each line" loop over a file in Python

I need to read an input text file in Python, streaming it line by line. That is, the file should be loaded one line at a time rather than all at once into memory. But my lines are not delimited by newlines; they are delimited by an arbitrary character.

Here is the method from Stack Overflow for reading a file line by line:

    with open("log.txt") as infile:
        for line in infile:
            do_something_with(line)

The above works, but I need to change the delimiter from a newline to another character.

How can I do that? Thanks.

+4
2 answers
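    import re

    def open_delimited(filename, delimiter, chunksize=1024, *args, **kwargs):
        # Extra positional and keyword arguments are passed through to open().
        with open(filename, *args, **kwargs) as infile:
            remainder = ''
            # Read fixed-size chunks until read() returns '' at end of file.
            for chunk in iter(lambda: infile.read(chunksize), ''):
                # Note: delimiter is interpreted as a regular expression.
                pieces = re.split(delimiter, remainder + chunk)
                for piece in pieces[:-1]:
                    yield piece
                # The last piece may be incomplete; carry it into the next chunk.
                remainder = pieces[-1]
            if remainder:
                yield remainder

    for line in open_delimited("log.txt", delimiter='/'):
        print(repr(line))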
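One caveat with the function above: `re.split` treats `delimiter` as a regular expression, so a delimiter such as `.` or `|` has a special meaning unless it is escaped. A small sketch of the difference, using an in-memory string instead of a file (the sample text is made up for illustration):

```python
import re

text = "a.b.c"

# Unescaped, "." is a regex that matches any character, so every
# character is treated as a delimiter and only empty strings remain.
unescaped = re.split(".", text)

# re.escape() makes the delimiter match literally.
escaped = re.split(re.escape("."), text)

print(unescaped)
print(escaped)
```

So for a literal delimiter, passing something like `delimiter=re.escape('.')` to `open_delimited` should give the expected pieces.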
+5

Python has no built-in construct for this. You can write a generator that reads the file one character at a time and accumulates characters until it has a complete delimited item.

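    def items(infile, delim):
        item = []
        c = infile.read(1)
        while c:
            if c == delim:
                # Delimiter reached: emit the accumulated item and start a new one.
                yield "".join(item)
                item = []
            else:
                item.append(c)
            # Always advance to the next character; read() returns '' at end of file.
            c = infile.read(1)
        # Emit whatever follows the last delimiter.
        yield "".join(item)

    with open("log.txt") as infile:
        for item in items(infile, ","):  # comma delimited
            do_something_with(item)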

You will get better performance if you read the file in chunks (say, 64K or so) and split those. However, the logic for that is more complicated, since an item can be split across chunks, so I won't go into it here, as I'm not sure I'd get it right. :-)
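For what it's worth, here is a minimal sketch of that chunked approach, assuming a text-mode file object and a plain-string (not regex) delimiter. The name `iter_delimited` and the 64 KiB default are illustrative choices, not part of the original answer. The trick is to keep the last, possibly incomplete piece of each chunk and prepend it to the next one:

```python
import io

def iter_delimited(infile, delim, chunksize=64 * 1024):
    """Yield delim-separated items from a text-mode file, reading in chunks."""
    remainder = ""
    while True:
        chunk = infile.read(chunksize)
        if not chunk:  # read() returns '' at end of file
            break
        pieces = (remainder + chunk).split(delim)
        # The last piece may be cut off mid-item (or mid-delimiter),
        # so hold it back and prepend it to the next chunk.
        remainder = pieces.pop()
        yield from pieces
    # Design choice in this sketch: skip a trailing empty item.
    if remainder:
        yield remainder

# Usage with an in-memory file; a tiny chunksize forces items to span chunks.
sample = io.StringIO("alpha,beta,,gamma")
for item in iter_delimited(sample, ",", chunksize=4):
    print(repr(item))
```

Because the held-back remainder is joined to the next chunk before splitting, a multi-character delimiter that straddles a chunk boundary should still be recognized.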

+1

Source: https://habr.com/ru/post/1490089/

