How to iterate over files and replace text

Question


I'm a Python novice. How can I iterate over the CSV files in one directory and replace strings, for example:

ww into vv, and .. into --

I don't want to replace whole lines that contain ww with vv; I only want to replace those strings within each line. I tried something like

    #!/Python26/
    # -*- coding: utf-8 -*-
    import os, sys

    for f in os.listdir(path):
        lines = f.readlines()

But how do I proceed from here?


python python-2.6

atricapilla Feb 07 '11 at 9:26


3 answers

Since you want to replace strings with strings of the same length, the replacements can be made in place: only the bytes that need to change are rewritten, without having to write a whole new modified file.

With a regex this is very easy to do. The fact that the files are CSV files does not matter at all with this method:

    from os import listdir
    from os.path import join
    import re

    pat = re.compile(r'ww|\.\.')
    dicrepl = {'ww':'vv' , '..':'--'}

    for filename in listdir(path):
        with open(join(path,filename),'rb+') as f:
            ch = f.read()
            f.seek(0,0)    # reading moved the pointer to the end; go back
            pos = 0        # pointer position after the previous write
            for match in pat.finditer(ch):
                f.seek(match.start()-pos, 1)    # relative move to the match
                f.write(dicrepl[match.group()])
                pos = match.end()

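It is absolutely necessary to open the files in binary mode for this procedure: that is the "b" in the 'rb+' mode. In text mode, newline translation (on Windows, for example) can make the positions in the string ch differ from the byte offsets in the file.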

Opening the file in 'r+' mode lets you both read and write anywhere in it (had it been opened in 'a' mode, writes could only go at the end of the file).
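The code above is Python 2. Under Python 3, a file opened in 'rb+' mode returns bytes, so the pattern and the replacement dictionary must be bytes as well. A minimal Python 3 sketch of the same in-place technique, assuming every replacement has the same length as the string it replaces; `replace_in_place` and `replace_in_directory` are hypothetical names:

```python
import os
import re

# In Python 3, binary-mode files yield bytes, so pattern and replacements are bytes.
PATTERN = re.compile(rb'ww|\.\.')
REPLACEMENTS = {b'ww': b'vv', b'..': b'--'}


def replace_in_place(filepath, pattern=PATTERN, replacements=REPLACEMENTS):
    """Overwrite each match in place; assumes same-length replacements."""
    with open(filepath, 'rb+') as f:
        ch = f.read()
        f.seek(0, os.SEEK_SET)  # reading left the pointer at the end
        pos = 0                 # where the pointer sits after the last write
        for match in pattern.finditer(ch):
            new = replacements[match.group()]
            if len(new) != len(match.group()):
                raise ValueError('replacement must have the same length')
            f.seek(match.start() - pos, os.SEEK_CUR)  # relative move forward
            f.write(new)
            pos = match.end()


def replace_in_directory(path):
    """Apply replace_in_place to every regular file directly inside path."""
    for filename in os.listdir(path):
        full = os.path.join(path, filename)
        if os.path.isfile(full):  # os.listdir also returns subdirectories
            replace_in_place(full)
```

The `os.path.isfile` check is there because `os.listdir` returns subdirectory names too, and opening a directory would fail.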

However, if the files are so large that holding their whole content in ch consumes too much memory, the approach must be changed.
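One way to avoid reading a large file into memory, swapped in here rather than taken from the answer, is to memory-map it with the standard `mmap` module: `re` can search an mmap object directly, and a slice of the same length can be assigned in place. A minimal Python 3 sketch, assuming same-length replacements; `replace_in_place_mmap` is a hypothetical name:

```python
import mmap
import os
import re

PATTERN = re.compile(rb'ww|\.\.')
REPLACEMENTS = {b'ww': b'vv', b'..': b'--'}


def replace_in_place_mmap(filepath, pattern=PATTERN, replacements=REPLACEMENTS):
    """Rewrite matches through a memory map; assumes same-length replacements."""
    if os.path.getsize(filepath) == 0:
        return  # an empty file cannot be memory-mapped
    with open(filepath, 'r+b') as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # map the whole file read/write
            # Collect positions first (plus the short matched strings), so the
            # bytes are not modified while the regex is still scanning them.
            spans = [(m.start(), m.end(), m.group())
                     for m in pattern.finditer(mm)]
            for start, end, old in spans:
                new = replacements[old]
                if len(new) != end - start:
                    raise ValueError('replacement must have the same length')
                mm[start:end] = new
            mm.flush()  # push the changes to the file before unmapping
```

The operating system pages the file in and out as needed, so memory use stays bounded by the match list rather than the file size.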

If the replacements have a different length than the strings they replace, a new file containing the changes must be written. (If the replacement strings are always shorter than the strings they replace, that is a special case which can still be handled without writing a new file; this may be worthwhile for a large file.)
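For the different-length case, a common pattern (not from the answer itself) is to write the modified text to a temporary file in the same directory and then move it over the original. A minimal Python 3 sketch, assuming UTF-8 text and matches that never span a line break; the replacement values and the name `replace_via_new_file` are hypothetical:

```python
import os
import re
import shutil
import tempfile

# Hypothetical replacements whose lengths differ from the strings they replace.
PATTERN = re.compile(r'ww|\.\.')
REPLACEMENTS = {'ww': 'v', '..': '---'}


def replace_via_new_file(filepath, pattern=PATTERN, replacements=REPLACEMENTS,
                         encoding='utf-8'):
    """Write the modified text to a temporary file, then swap it in."""
    dirpath = os.path.dirname(os.path.abspath(filepath))
    # Same directory as the original, so os.replace stays on one filesystem.
    fd, tmppath = tempfile.mkstemp(dir=dirpath)
    try:
        with os.fdopen(fd, 'w', encoding=encoding, newline='') as dst:
            with open(filepath, 'r', encoding=encoding, newline='') as src:
                for line in src:  # one line at a time, not the whole file
                    dst.write(pattern.sub(lambda m: replacements[m.group()],
                                          line))
        shutil.copymode(filepath, tmppath)  # keep the original permissions
        os.replace(tmppath, filepath)       # the new file takes the old name
    except BaseException:
        if os.path.exists(tmppath):
            os.remove(tmppath)
        raise
```

`newline=''` on both files keeps the original line endings unchanged.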

The point of using f.seek(match.start() - pos, 1) instead of f.seek(match.start(), 0) is that it moves the pointer directly from pos to match.start(), rather than first moving it back to position 0 and then forward from 0 to match.start().
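Both calls land on the same offset; they differ only in the reference point. A small Python 3 sketch checks the arithmetic on an in-memory `io.BytesIO` buffer standing in for the file (the sample bytes are made up):

```python
import io
import os
import re

data = b'abc ww def .. ghi'
first, second = re.finditer(rb'ww|\.\.', data)

buf = io.BytesIO(data)
buf.seek(first.end(), os.SEEK_SET)  # where the pointer sits after rewriting 'ww'
pos = first.end()

buf.seek(second.start() - pos, os.SEEK_CUR)  # relative move, as in the answer
relative = buf.tell()

buf.seek(second.start(), os.SEEK_SET)        # absolute move from offset 0
absolute = buf.tell()

print(relative, absolute)  # 11 11
```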

Conversely, with f.seek(match.start(), 0), the pointer must first return to position 0 (the beginning of the file) and then move forward match.start() bytes to stop at the correct position, because seek(..., 0) measures the position from the beginning of the file, while seek(..., 1) moves relative to the current position.

EDIT:

If you want to replace only isolated ww strings, and not the ww inside longer strings such as wwwwwww, the regular expression should be:

    pat = re.compile(r'(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')
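The effect of the lookbehind `(?<!...)` and lookahead `(?!...)` guards can be checked on a made-up sample string. This Python 3 sketch uses `re.sub` with a lookup function instead of the in-place file writing:

```python
import re

pat = re.compile(r'(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')
dicrepl = {'ww': 'vv', '..': '--'}

sample = 'ww;wwwwwww;a..b;...'
# re.sub calls the function for each match and inserts the dictionary value.
result = pat.sub(lambda m: dicrepl[m.group()], sample)
print(result)  # vv;wwwwwww;a--b;...
```

Only the isolated `ww` and the isolated `..` change; the run of seven `w` and the run of three dots are left alone.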

This is something a regular expression provides that str.replace() cannot achieve without complicated string manipulation.

EDIT:

I had forgotten the f.seek(0,0) instruction after f.read(). It is necessary to move the file pointer back to the beginning of the file, because reading moves the pointer to the end.

I have adjusted the code and now it works.

Here is a version of the code that prints each step of the processing:

    from os import listdir
    from os.path import join
    import re

    pat = re.compile(r'(?<!w)ww(?!w)|(?<!\.)\.\.(?!\.)')
    dicrepl = {'ww':'vv' , '..':'ZZ'}

    path = '...................................'   # path of the file to process

    with open(path,'rb+') as f:
        print "file has just been opened, file pointer is at position ",f.tell()
        print '- reading of the file : ch = f.read()'
        ch = f.read()
        print "file has just been read"+\
              "\nfile pointer is now at position ",f.tell(),' , the end of the file'
        print "- file pointer is moved back to the beginning of the file : f.seek(0,0)"
        f.seek(0,0)
        print "file pointer is now again at position ",f.tell()
        pos = 0
        print '\n- process of replacement is now launched :'
        for match in pat.finditer(ch):
            print
            print 'is at position ',f.tell()
            print 'group ',match.group(),' detected on span ',match.span()
            f.seek(match.start()-pos, 1)
            print 'pointer having been moved on position ',f.tell()
            f.write(dicrepl[match.group()])
            print 'detected group has been replaced with ',dicrepl[match.group()]
            print 'now at position ',f.tell()
            pos = match.end()


eyquem Feb 07 '11 at 9:59


See the other answers for the string replacement itself. I want to add some information about iterating over the files, the first part of the question.

If you want to recurse through a directory and all its subdirectories, use os.walk(). os.listdir() is not recursive, and it does not include the directory name in the file names it returns. Use os.path.join() to build a full path name.
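A minimal Python 3 sketch of the recursive walk described above, assuming the files of interest are the ones whose names end in `.csv` (case-insensitive); `iter_csv_files` is a hypothetical helper name:

```python
import os


def iter_csv_files(root):
    """Yield the full path of every .csv file under root, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith('.csv'):
                # os.walk gives bare file names; join them with their directory.
                yield os.path.join(dirpath, name)
```

Each yielded path can be handed directly to `open()` or to any of the replacement routines in the other answers, for example `for p in iter_csv_files('yourPath'): ...`.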


Randall cook Jun 01 '12 at 18:44


eumiro · Accepted Answer · 2011-02-07T09:33:13+0000

    import os
    import csv

    for filename in os.listdir(path):
        with open(os.path.join(path, filename), 'r') as f:
            for row in csv.reader(f):
                cells = [ cell.replace('www', 'vvv').replace('..', '--')
                          for cell in row ]
                # now you have a list of cells within one row
                # with all strings modified.
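The loop above builds the modified cells but does not save them. A minimal Python 3 sketch of writing them back, assuming each result goes to a new file named after the original plus a `.new` suffix (a hypothetical naming choice) so the original stays untouched; `rewrite_csv_files` is likewise a hypothetical name:

```python
import csv
import os


def rewrite_csv_files(path):
    """Replace strings in every .csv cell and save each result as NAME.new."""
    for filename in os.listdir(path):
        if not filename.endswith('.csv'):
            continue
        src = os.path.join(path, filename)
        dst = src + '.new'
        # newline='' is what the csv module expects for files in Python 3.
        with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
            writer = csv.writer(fout)
            for row in csv.reader(fin):
                writer.writerow([cell.replace('www', 'vvv').replace('..', '--')
                                 for cell in row])
```

Note that `csv.writer` may change quoting and writes `\r\n` line endings by default, so the output is equivalent CSV but not necessarily byte-identical apart from the replacements.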

Edit: Do you need to learn / practice Python, or do you just need to get the job done? In the latter case, use the sed program:

    sed -i 's/www/vvv/g' yourPath/*csv
    sed -i 's/\.\./--/g' yourPath/*csv
