Delete the first two lines of a Ruby file

My script reads in large text files and captures the first page with a regex. I need to either delete the first two lines of each first page or change the regular expression so that it starts matching one line after the "== Page 1 ==" line. I am including the whole script because I was asked for it in past questions, and because I am new to Ruby and don't always know how to integrate answer fragments into my code:

#!/usr/bin/env ruby -wKU

require 'fileutils'

source = File.open('list.txt')
source.readlines.each do |line|
  line.strip!
  if File.exists? line
    file = File.open(line)
  end
  text = (File.read(line))
  match = text.match(/==Page 1(.*)==Page 2==/m)
  puts match
end
1 answer

Now that you have updated your question, I had to remove most of what was such a good answer :-)

I assume the main cause of your problem is that you wanted to use match[1] instead of match. The object returned by String#match (like Regexp#match) is a MatchData, which can be indexed like an array: the first element is the entire matched string, and the following elements are the capture groups. So in your case match (like match[0]) represents the entire matched text, including the "==Page ...==" markers, whereas you only need the first capture group, which is in match[1].
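To make the difference concrete, here is a minimal sketch on a made-up sample string (the text is an assumption for illustration; only the marker format comes from the question):

```ruby
# Hypothetical sample text; only the "==Page N==" markers follow the question.
text = "==Page 1==\nfirst line\nsecond line\n==Page 2==\nrest"

match = text.match(/==Page 1==(.*)==Page 2==/m)

whole   = match[0]  # entire matched text, markers included
content = match[1]  # only what the first capture group matched

puts whole.inspect
puts content.inspect
```

Calling puts match prints the same text as match[0], which is why the original script showed the markers as well.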


Now on to some other, minor issues I noticed in your code. Please don't be offended if you already know these things; others may benefit from the warnings.

First, part of your code ( if File.exists? line ) checks whether the file exists, but all it does in that case is open the file (without ever closing it!), and the code then reads the file a few lines later regardless of whether it exists.

Instead, you can use this line:

 next unless File.exist?(line)  # File.exists? is deprecated and was removed in Ruby 3.2

Second, the program should be prepared for a file that has no page markers and therefore does not match the pattern. (In that case the match variable will be nil.)
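One possible way to handle the nil case is to wrap the lookup in a small helper. This is only a sketch; the name first_page is hypothetical and not part of the original script:

```ruby
# Hypothetical helper: returns the captured page text,
# or nil when the page markers are missing.
def first_page(text)
  match = text.match(/==Page 1==(.*)==Page 2==/m)
  match && match[1]
end

page = first_page("a file without any markers")
if page.nil?
  puts "no page markers found"
else
  puts page
end
```

Returning nil leaves it to the caller to decide whether a missing marker should be skipped, logged, or treated as an error.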

Third, a more precise pattern can be used. The current one ( /==Page 1==(.*)==Page 2==/m ) returns the page contents with the line break that follows the marker as the first character. If you use this pattern instead:

 /==Page 1==\s*\n(.*)==Page 2==/m 

then the capture group will not include any whitespace that follows "==Page 1==" on the same line, nor the line break itself. And if you use this pattern:

 /==Page 1==\s*\n(.*\n)==Page 2==/m 

then you can be sure that the "==Page 2==" marker starts at the beginning of a line.
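The three patterns can be compared side by side. The sample strings and the capture helper below are assumptions made up for this sketch:

```ruby
loose    = /==Page 1==(.*)==Page 2==/m
trimmed  = /==Page 1==\s*\n(.*)==Page 2==/m
anchored = /==Page 1==\s*\n(.*\n)==Page 2==/m

# Hypothetical inputs: trailing spaces after the first marker,
# and a second marker that appears in the middle of a line.
sample = "==Page 1==  \nline one\nline two\n==Page 2==\n"
inline = "==Page 1==\nabc ==Page 2==\n"

# Hypothetical helper: first capture group, or nil if there is no match.
def capture(text, pattern)
  match = text.match(pattern)
  match && match[1]
end

puts capture(sample, loose).inspect     # starts with the spaces and the line break
puts capture(sample, trimmed).inspect   # starts directly at "line one"
puts capture(inline, trimmed).inspect   # matches even though the marker is mid-line
puts capture(inline, anchored).inspect  # nil: the marker does not start a line
```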

Fourth, programmers (sometimes including me, of course) very often forget to close files after opening them. In your case, you opened the "source" file, but there is no source.close call after the loop. The safest way to process a file is to pass a block to File.open, which closes the file when the block finishes, so the first lines of your program could take this form:

 File.open('list.txt') do |source|
   source.readlines.each do |line|

... but in this case it is simpler to write:

 File.readlines('list.txt').each do |line| 
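A small sketch can confirm that the block form closes the file and that both forms read the same lines. It uses a temporary file in place of list.txt, and the file names written into it are made up:

```ruby
require 'tempfile'

# Hypothetical stand-in for list.txt, created only for this demonstration.
list = Tempfile.new('list')
list.write("a.txt\nb.txt\n")
list.flush

handle = nil
names_from_block = File.open(list.path) do |source|
  handle = source                   # keep a reference to inspect afterwards
  source.readlines.map(&:strip)     # the block's value is returned by File.open
end

names_from_readlines = File.readlines(list.path).map(&:strip)

puts handle.closed?                 # the block form closed the file for us
puts names_from_block == names_from_readlines

list.close!                         # close and remove the temporary file
```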

Putting all of this together, the code might look like this (I renamed the line variable to fname for readability):

 #!/usr/bin/env ruby -wKU

 require 'fileutils'

 File.readlines('list.txt').each do |fname|
   fname.strip!
   # File.exist? replaces File.exists?, which was removed in Ruby 3.2
   next unless File.exist?(fname)
   text = File.read(fname)
   if match = text.match(/==Page 1==\s*\n(.*\n)==Page 2==/m)
     # The whole 'page' (String):
     puts match[1].inspect
     # The 'page' without the first two lines
     # (in case you really wanted to delete lines):
     puts match[1].split("\n")[2..-1].inspect
   else
     # What to do if the file does not match the pattern?
     raise "The file #{fname} does NOT include the page separators."
   end
 end
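If the goal is to get the page back as a single string rather than an array of lines, one alternative is a small helper built on String#lines. This is a sketch; the name drop_first_lines and the sample page are hypothetical:

```ruby
# Hypothetical helper: removes the first `count` lines of a page
# and returns the rest as one string, line breaks preserved.
def drop_first_lines(page, count = 2)
  page.lines.drop(count).join
end

page = "title\nsubtitle\nbody line 1\nbody line 2\n"
puts drop_first_lines(page).inspect
```

Unlike [2..-1], which returns nil when the array has fewer than two elements, drop returns an empty array, so a very short page yields an empty string.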

Source: https://habr.com/ru/post/1380598/
