Reading, editing and writing a text file using Ruby

Is there a good way to read, edit, and write files in Ruby?

In my online searches, the material I found suggests reading the whole file into an array, modifying that array, and then writing everything back out. I feel there should be a better solution, especially when dealing with a very large file.

Something like:

    myfile = File.open("path/to/file.txt", "r+")

    myfile.each do |line|
      myfile.replace_puts('blah') if line =~ /myregex/
    end

    myfile.close

Where replace_puts would write over the current line, rather than overwriting the next line as puts does now, because the pointer is at the end of the line (after the separator).

So each line matching /myregex/ would be replaced with 'blah'. Obviously, what I have in mind is a bit more involved than that as far as processing goes, and it would be done in one line, but the idea is the same: I want to read a file line by line, edit certain lines, and write them out when I'm done.

Maybe there is a way to just say "rewind to right after the last separator"? Or some way to use each_with_index and write by line index number? I could not find anything like that.

The best solution I have so far is to read the file line by line, write the lines (possibly edited) to a new temporary file, and then overwrite the old file with the temp file and delete the temp file. Again, I feel there should be a better way: I shouldn't have to create a new 1 GB file just to edit a few lines in an existing 1 GB file.

+43
ruby file io
Dec 09 '10 at
4 answers

In general, there is no way to make arbitrary changes in the middle of a file. This is not a deficiency of Ruby; it is a limitation of the file system: most file systems make it easy and efficient to grow or shrink a file at the end, but not at the beginning or in the middle. So you cannot rewrite a line in place unless its size stays the same.

There are two general models for modifying a series of lines. If the file is not too large, just read it all into memory, modify it, and write it back out. For example, adding "Kilroy was here" to the beginning of each line of a file:

    path = '/tmp/foo'
    lines = IO.readlines(path).map do |line|
      'Kilroy was here ' + line
    end
    File.open(path, 'w') do |file|
      file.puts lines
    end

Although simple, this method has a danger: if the program is interrupted while writing the file, you will lose part or all of it. It also needs enough memory to hold the entire file. If either of these is a concern, you may prefer the next technique.

You can, as you noticed, write to a temporary file. After that, rename the temporary file to replace the input file:

    require 'tempfile'
    require 'fileutils'

    path = '/tmp/foo'
    temp_file = Tempfile.new('foo')
    begin
      File.open(path, 'r') do |file|
        file.each_line do |line|
          temp_file.puts 'Kilroy was here ' + line
        end
      end
      temp_file.close
      FileUtils.mv(temp_file.path, path)
    ensure
      temp_file.close
      temp_file.unlink
    end

Since the rename (FileUtils.mv) is atomic, the rewritten input file appears all at once. If the program is interrupted, either the file has been rewritten or it has not. There is no way for it to end up partially rewritten.
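
One caveat: a rename can only be atomic when the temporary file and the target are on the same file system, and Tempfile.new('foo') uses the system temp directory, which may be on a different one (FileUtils.mv then falls back to copying). Below is a minimal sketch, assuming it is acceptable to create the temporary file next to the target. The helper name rewrite_lines is hypothetical, not a library API.

```ruby
require 'tempfile'

# Hypothetical helper: streams `path` line by line through the given block
# into a temp file created in the SAME directory, then renames it over `path`.
def rewrite_lines(path)
  temp = Tempfile.new(File.basename(path), File.dirname(path))
  begin
    File.foreach(path) { |line| temp.write(yield(line)) }
    temp.close
    # Same directory, hence same file system, so this is a plain rename.
    File.rename(temp.path, path)
  ensure
    temp.close
    temp.unlink # no-op if the file was already renamed away
  end
end
```

Usage would look like rewrite_lines('/tmp/foo') { |line| 'Kilroy was here ' + line }. Only one line is held in memory at a time, at the cost of needing write access to the target's directory.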

The ensure clause is not strictly necessary: the file will be deleted when the Tempfile instance is garbage collected. However, that could take a while. The ensure block makes sure that the tempfile is cleaned up right away, without waiting for garbage collection.

+62
Dec 09 '10 at 2:30 p.m.

If you want to overwrite a file line by line, you need to make sure that the new line is the same length as the original line. If the new line is longer, part of it will be written over the next line. If the new line is shorter, the rest of the old line stays where it is. The tempfile solution is really much safer. But if you are willing to take the risk:

    File.open('test.txt', 'r+') do |f|
      old_pos = 0
      f.each do |line|
        f.pos = old_pos   # this is the 'rewind'
        f.print line.gsub('2010', '2011')
        old_pos = f.pos
      end
    end
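
To reduce that risk somewhat, the same idea can be wrapped in a guard that refuses any edit that would change a line's byte length. This is only a sketch: overwrite_same_length is a hypothetical helper name, and it assumes an ASCII-only pattern and replacement, because the file is opened in binary mode so that byte offsets are exact on every platform.

```ruby
# Hypothetical helper: replaces `pattern` with `replacement` in place,
# raising before writing any line whose byte length would change.
def overwrite_same_length(path, pattern, replacement)
  File.open(path, 'r+b') do |f|
    line_start = 0
    while (line = f.gets)
      new_line = line.gsub(pattern, replacement)
      if new_line != line
        unless new_line.bytesize == line.bytesize
          raise ArgumentError, "replacement changes line length at byte #{line_start}"
        end
        f.pos = line_start # go back to the start of the line just read
        f.write(new_line)  # leaves the position at the start of the next line
      end
      line_start = f.pos
    end
  end
end
```

Lines that do not match are never rewritten, so an interruption can damage at most the single line being written at that moment.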

If the line length changes, this is a possibility (note that it holds the whole output in memory):

    File.open('test.txt', 'r+') do |f|
      out = ""
      f.each do |line|
        out << line.gsub(/myregex/, 'blah')
      end
      f.pos = 0
      f.print out
      f.truncate(f.pos)
    end

+6
Dec 09 2018-10-09

In case you use Rails or Facets, or otherwise depend on Rails' ActiveSupport, you can use the atomic_write extension to File:

    File.atomic_write('path/file') do |file|
      file.write('your content')
    end

Behind the scenes, this creates a temporary file that is later moved to the desired path, taking care of closing the file for you.

It also clones the file permissions of the existing file or, if there isn't one, of the current directory.
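
Without ActiveSupport, a rough stand-in can be put together from the standard library. The sketch below is not ActiveSupport's implementation: atomic_write_sketch is a hypothetical name, and it only copies the permission bits when the target already exists (it does not look at the directory).

```ruby
require 'tempfile'

# Hypothetical stand-in for an atomic write, using only the standard library.
# Yields a temp file to write to, then renames it over `path`.
def atomic_write_sketch(path)
  # Remember the permission bits of an existing target, if any.
  old_mode = File.exist?(path) ? (File.stat(path).mode & 0o7777) : nil
  temp = Tempfile.new(File.basename(path), File.dirname(path))
  begin
    yield temp
    temp.close
    File.chmod(old_mode, temp.path) if old_mode
    File.rename(temp.path, path)
  ensure
    temp.close
    temp.unlink # no-op after a successful rename
  end
end
```

If the block raises, the rename is never reached, so the original file is left untouched and the temporary file is removed.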

0
Jan 21 '15 at 15:10

You can write in the middle of a file, but you must be careful to keep the length of the line you overwrite the same; otherwise you will overwrite some of the text that follows. Here is an example using File.seek: IO::SEEK_CUR makes the offset relative to the current position of the file pointer, which is at the end of the line that was just read, and the +1 is for the CR character at the end of the line.

    look_for     = "bbb"
    replace_with = "xxxxx"

    File.open(DATA, 'r+') do |file|
      file.each_line do |line|
        if (line[look_for])
          file.seek(-(line.length + 1), IO::SEEK_CUR)
          file.write line.gsub(look_for, replace_with)
        end
      end
    end
    __END__
    aaabbb
    bbbcccddd
    dddeee
    eee

After running this, the end of the script contains the following, which I assume is not what you had in mind:

    aaaxxxxx
    bcccddd
    dddeee
    eee
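
The clobbering can be reproduced without touching a real file. The sketch below replays the same data in an in-memory StringIO, seeking back by exactly the byte size of the line read (there is no CR here, so no +1). The function name clobber_demo is hypothetical and exists only for illustration.

```ruby
require 'stringio'

# Hypothetical demo: overwrite matching lines in place with a LONGER
# replacement and return the resulting buffer contents.
def clobber_demo
  io = StringIO.new(+"aaabbb\nbbbcccddd\ndddeee\neee\n")
  while (line = io.gets)
    next unless line["bbb"]
    io.seek(-line.bytesize, IO::SEEK_CUR) # back to the start of this line
    io.write(line.gsub("bbb", "xxxxx"))   # 9 bytes written over a 7-byte line
  end
  io.string
end

puts clobber_demo
# prints:
# aaaxxxxx
# bcccddd
# dddeee
# eee
```

The first line grows from 7 to 9 bytes, so the write swallows the first two bytes of "bbbcccddd". What remains of that line no longer contains "bbb", which is why it is never replaced.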

That said, this method is much faster than the classic "read and write to a new file" method. See these benchmarks on a 1.7 GB file of music data. For the classic approach I used Wayne's technique. The benchmark uses the .bmbm method so that file caching does not play a big role. The tests were run with MRI Ruby 2.3.0 on Windows 7. The strings were actually replaced; I checked this for both methods.

    require 'benchmark'
    require 'tempfile'
    require 'fileutils'

    look_for      = "Melissa Etheridge"
    replace_with  = "Malissa Etheridge"
    very_big_file = 'D:\Documents\muziekinfo\all.txt'.gsub('\\','/')

    def replace_with file_path, look_for, replace_with
      File.open(file_path, 'r+') do |file|
        file.each_line do |line|
          if (line[look_for])
            file.seek(-(line.length + 1), IO::SEEK_CUR)
            file.write line.gsub(look_for, replace_with)
          end
        end
      end
    end

    def replace_with_classic path, look_for, replace_with
      temp_file = Tempfile.new('foo')
      File.foreach(path) do |line|
        if (line[look_for])
          temp_file.write line.gsub(look_for, replace_with)
        else
          temp_file.write line
        end
      end
      temp_file.close
      FileUtils.mv(temp_file.path, path)
    ensure
      temp_file.close
      temp_file.unlink
    end

    Benchmark.bmbm do |x|
      x.report("adapt          ") { 1.times {replace_with very_big_file, look_for, replace_with}}
      x.report("restore        ") { 1.times {replace_with very_big_file, replace_with, look_for}}
      x.report("classic adapt  ") { 1.times {replace_with_classic very_big_file, look_for, replace_with}}
      x.report("classic restore") { 1.times {replace_with_classic very_big_file, replace_with, look_for}}
    end

Which gave:

    Rehearsal ---------------------------------------------------
    adapt             6.989000   0.811000   7.800000 (  7.800598)
    restore           7.192000   0.562000   7.754000 (  7.774481)
    classic adapt    14.320000   9.438000  23.758000 ( 32.507433)
    classic restore  14.259000   9.469000  23.728000 ( 34.128093)
    ----------------------------------------- total: 63.040000sec

                          user     system      total        real
    adapt             7.114000   0.718000   7.832000 (  8.639864)
    restore           6.942000   0.858000   7.800000 (  8.117839)
    classic adapt    14.430000   9.485000  23.915000 ( 32.195298)
    classic restore  14.695000   9.360000  24.055000 ( 33.709054)

So the in-file replacement was about 4 times faster in real time.

0
Jan 07 '17 at 20:25