How to replace a character INSIDE the text content of many files automatically?

I have a /myfolder folder containing many LaTeX tables.

I need to replace a symbol in each of them: the minus sign - should become an en dash –.

Just to be sure: the hyphens should be replaced INSIDE all of the tex files in this folder. I don't care about the tex file names.

Doing this manually would be a nightmare (too many files, too many minuses). Is there a way to automatically iterate over files and perform a replacement? A solution in Python / R would be great.

Thanks!

+5
5 answers

sed -i -e 's/-/–/g' /myfolder/* should work.

The expression (-e) performs a search (s) globally (g) and replaces every - with – inside the files that your shell expands from /myfolder/*. Sed makes the change in place (-i), that is, it overwrites the original file (on macOS you need to explicitly specify a backup file, but I can't remember the parameter).
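Since the question asks for Python, here is a minimal sketch of the same in-place replacement that also keeps a backup of each file it changes. The function name `replace_in_folder`, the `*.tex` pattern and the `.bak` suffix are assumptions made for illustration; they do not come from the answer above.

```python
import shutil
from pathlib import Path


def replace_in_folder(folder, old="-", new="\u2013", pattern="*.tex", backup_suffix=".bak"):
    """Replace `old` with `new` in every matching file, keeping a backup copy."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old not in text:
            continue  # nothing to replace, leave the file untouched
        # Keep the original next to the file, e.g. table.tex.bak
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
        path.write_text(text.replace(old, new), encoding="utf-8")
        changed.append(path.name)
    return changed
```

Calling `replace_in_folder("/myfolder")` would then process the folder from the question and return the names of the files it rewrote. The files are assumed to be UTF-8 encoded.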

Note that this does not care at all whether a - is a literal hyphen in the text or part of the LaTeX syntax. Keep that in mind.
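If the blanket replacement is too aggressive, one possible way to narrow it down is a regular expression that only touches a - that looks like a numeric minus sign. The heuristic below is an assumption about what the tables contain, not something taken from the answer: a - counts as a minus sign only when it directly precedes a digit and does not follow a letter, digit, underscore, `{` or another -.

```python
import re

# Heuristic (an assumption): a "-" is a minus sign only when it directly
# precedes a digit and does not follow a word character, "{" or another "-".
NUMERIC_MINUS = re.compile(r"(?<![\w{-])-(?=\d)")


def replace_numeric_minus(text, dash="\u2013"):
    """Replace only the hyphens that match the numeric-minus heuristic."""
    return NUMERIC_MINUS.sub(dash, text)
```

With this pattern a cell such as `-1.5` is converted, while `x-y`, `2-3` and `\hspace{-2pt}` are left alone. A number written as `-.5` is not matched, so the pattern would need adjusting for tables that use that form.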

+4

Try with sed

 find /home/milenko/pr -type f -exec sed -i 's/-/–/g' {} +

from the command line (if you are using Linux)

Here -type f restricts find to regular files, and the -exec clause uses {} to stand for the matched files.
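The recursive traversal that find performs can be sketched in Python with `os.walk`. The helper names `iter_regular_files` and `replace_in_tree` and the `.tex` suffix filter are assumptions for illustration; the find command above processes every regular file, without filtering by extension.

```python
import os


def iter_regular_files(root, suffix=".tex"):
    """Yield regular files under root (like `find root -type f`), filtered by suffix."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(suffix) and os.path.isfile(path):
                yield path


def replace_in_tree(root, old="-", new="\u2013"):
    """Rewrite every matching file under root in place; return how many were processed."""
    count = 0
    for path in iter_regular_files(root):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(text.replace(old, new))
        count += 1
    return count
```

Like `sed -i`, this sketch overwrites the files without keeping a backup, so it is best tried on a copy of the folder first.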

+3

To rename the files themselves, use

 rename 's/-/–/g' * 

This replaces every hyphen in the file names with an en dash.

To replace every hyphen with an en dash inside the file contents, use

  sed -i 's/-/–/g' *tex 
+2

Python solution

    import os

    directory = os.getcwd()
    # Rename every file in the current directory whose name contains a hyphen.
    for filename in os.listdir(directory):
        if "-" in filename:
            os.rename(
                os.path.join(directory, filename),
                os.path.join(directory, filename.replace("-", "–")),
            )

New solution for replacing characters inside a file

U+2212 is the Unicode character for the minus sign and U+2013 is the en dash (U+2014 is the em dash). If the files contain the plain ASCII hyphen-minus instead, replace "-" rather than "\u2212".

    import fnmatch
    import os

    directory = os.getcwd()


    def _changefiletext(file_name):
        # Read and write as UTF-8 so the non-ASCII characters survive.
        with open(file_name, "r", encoding="utf-8") as f:
            text = f.read()
        text = text.replace("\u2212", "\u2013")
        with open(file_name, "w", encoding="utf-8") as f:
            f.write(text)


    # Filter the files on which you want to run the replace code
    # (*.txt in this case; use '*.tex' for LaTeX files).
    matches = []
    for root, dirnames, filenames in os.walk(directory):
        for filename in fnmatch.filter(filenames, "*.txt"):
            matches.append(os.path.join(root, filename))

    for filename in matches:
        print("Converting file %s" % filename)
        _changefiletext(filename)
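Because the hyphen, minus sign and dashes look almost identical, it can help to check the code points before running a replacement. A small sketch using the standard `unicodedata` module; the `describe` helper and the `CANDIDATES` list are hypothetical names introduced here.

```python
import unicodedata

# The characters that are easy to confuse in this thread.
CANDIDATES = ["-", "\u2212", "\u2013", "\u2014"]


def describe(ch):
    """Return the code point and official Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


for ch in CANDIDATES:
    print(describe(ch))
```

Pasting a character copied from one of the tex files into `describe` shows which of these the files actually contain.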
+1

First, back up all your files before removing the ".bak" in the code. I don't want you to lose anything, and if my script misbehaves, I would like you to be able to recreate what you had.

Second, this is probably not very good Python code, because I'm not an expert. But it works if you are editing in UTF-8. Since the en dash is not an ASCII character, a direct replacement does not work. I admit I'm not quite sure what is happening here, so more experienced Python users may be able to point out where I could do better.

    # -*- coding: utf-8 -*-
    import codecs
    import glob
    import re
    import shutil


    def replace_file(file):
        print("Replacing " + file)
        # Read and write explicitly as UTF-8 so the en dash survives.
        with codecs.open(file, "r", "utf-8") as f, codecs.open("temp", "w", "utf-8") as temp:
            for line in f:
                temp.write(re.sub("-", "–", line))
        # The converted text is saved as "<file>.bak"; remove the ".bak"
        # here to overwrite the original file instead.
        shutil.copy("temp", file + ".bak")


    for name in glob.glob("*.tex"):
        replace_file(name)
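The remark that the en dash is not an ASCII character can be illustrated with a short sketch, which is also why the files have to be opened with an explicit encoding. The names `EN_DASH`, `utf8_bytes` and `fits_in_ascii` are introduced here for illustration only.

```python
EN_DASH = "\u2013"

# In UTF-8 the en dash takes three bytes, while the ASCII hyphen takes one.
utf8_bytes = EN_DASH.encode("utf-8")


def fits_in_ascii(text):
    """Return True if every character of text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

Here `fits_in_ascii("-")` is true and `fits_in_ascii(EN_DASH)` is false, so a file that was pure ASCII before the replacement no longer is afterwards.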
+1

Source: https://habr.com/ru/post/1269651/

