Remove hard line breaks from text with Ruby

Question


I have text with hard line breaks in it like this:

This should all be on one line
since it one sentence.

This is a new paragraph that
should be separate.

I want to remove the single newlines but keep the double newlines, so that it looks like this:

 This should all be on one line since it one sentence.

 This is a new paragraph that should be separate.

Is there a single regular expression for this (or some other easy way)?

So far this is the only solution I have that works, but it feels hacky.

 txt = txt.gsub(/(\r\n|\n|\r)/,'[[[NEWLINE]]]')
 txt = txt.gsub('[[[NEWLINE]]][[[NEWLINE]]]', "\n\n")
 txt = txt.gsub('[[[NEWLINE]]]', " ")

+4

ruby regex ruby-on-rails

Brian armstrong Jan 28 '11 at 21:20


5 answers

 text.gsub!(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')

The two (\S) groups serve the same purposes as the lookarounds ((?<!\s)(?<!^) and (?!\s)(?!$)) in @sln's regexes:

they confirm that the line break really is in the middle of a sentence, and
they ensure that [^\S\n]*\n[^\S\n]* consumes any other whitespace surrounding the line break, which lets us normalize it to a single space.

They also make the regex easier to read and (perhaps most importantly) they work in versions of Ruby prior to 1.9, which do not support lookbehinds.
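As a rough sketch of how this pattern behaves, the snippet below runs it on a sample string. The sample text, including the stray spaces around the first line break, is made up for illustration.

```ruby
# Hypothetical sample: note the spaces on both sides of the first line break.
text = "This should all be on one line  \n  since it one sentence.\n\n" \
       "This is a new paragraph that\nshould be separate.\n"

# Each (\S) requires visible text next to the break, so the blank line
# between paragraphs and the trailing newline are left untouched.
unwrapped = text.gsub(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')

puts unwrapped
```

The horizontal whitespace around the break is swallowed by the two [^\S\n]* parts, so the joined sentence ends up with exactly one space at the seam.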

+2

Alan moore Jan 29 '11 at 3:28


There is more to undoing formatting (turning off word wrap) than you might think.
If the text is the output of a formatting operation, you need to work from the rules of that operation to reverse-engineer the original.

For example, the test text you have is

This should all be on one line
since it one sentence.

This is a new paragraph that
should be separate.

If you only delete the single newlines, it will look like this:

This should all be on one line since it one sentence.
This is a new paragraph thatshould be separate.

In addition, other formatting will be lost, such as intentional newlines. Something like:

 This is Chapter 1
    Section a
    Section b

turns into

 This is Chapter 1   Section a   Section b

Finding the newline in question is easy: /(?<!\n)\n(?!\n)/
but what do you replace it with?

Edit: Actually, it is not that easy to find stand-alone newlines, because visually they sit among hidden (horizontal) whitespace.

There are 4 ways to go (written here in Perl substitution syntax).

Remove the newline, preserve the surrounding formatting:
$text =~ s/(?<!\s)([^\S\n]*)\n([^\S\n]*)(?!\s)/$1$2/g;
Remove the newline and the formatting, substitute a space:
$text =~ s/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/ /g;

Same as above, but ignore newlines at the beginning or end of a line:

$text =~ s/(?<!\s)(?<!^)[^\S\n]*\n[^\S\n]*(?!$|\s)/ /g;
$text =~ s/(?<!\s)(?<!^)([^\S\n]*)\n([^\S\n]*)(?!$|\s)/$1$2/g;

A breakdown of the regex (this is the minimum needed just to isolate a single newline):

 (?<!\s)     # Not a whitespace behind us (text, number, punct, etc.)
 [^\S\n]*    # 0 or more whitespaces, but no newlines
 \n          # a newline we want to remove
 [^\S\n]*    # 0 or more whitespaces, but no newlines
 (?!\s)      # Not a whitespace in front of us (text, number, punct, etc.)
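For readers working in Ruby, here is a sketch of how the first two substitutions might be written with gsub, using Ruby's extended (/x) regex syntax to carry the breakdown as comments. It assumes Ruby 1.9 or later, since it relies on lookbehind. The variable names and the sample text are made up for illustration.

```ruby
# The breakdown above as a commented Ruby regex; needs Ruby 1.9+ for lookbehind.
single_newline = /
  (?<!\s)      # no whitespace behind us
  ([^\S\n]*)   # horizontal whitespace before the newline, captured
  \n           # the newline we want to remove
  ([^\S\n]*)   # horizontal whitespace after the newline, captured
  (?!\s)       # no whitespace in front of us
/x

# Hypothetical sample text.
text = "one line  \n  since\n\nnew para that\nwraps"

kept   = text.gsub(single_newline, '\1\2') # drop the newline, keep the spaces
spaced = text.gsub(single_newline, ' ')    # collapse the whole run to one space

p kept
p spaced
```

The first variant keeps whatever spacing surrounded the newline. Where there was none (as in "that\nwraps"), the words are glued together, which is why substituting a space is usually the safer choice for unwrapping prose.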

+1

sln Jan 28 '11 at 10:03


Well, there is the following:

 s.gsub /([^\n])\n([^\n])/, '\1 \2'

It won't do anything to a leading or trailing newline. If you don't need leading or trailing whitespace at all, you're better off with this variation:

 s.gsub(/([^\n])\n([^\n])/, '\1 \2').strip
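A small sketch of the difference between the two calls. The sample string is made up for illustration and has a newline at each end.

```ruby
# Hypothetical sample with a leading and a trailing newline.
s = "\nfirst line\nsecond line\n\nnew paragraph\n"

# Only newlines with a non-newline character on both sides are replaced.
joined  = s.gsub(/([^\n])\n([^\n])/, '\1 \2')
# strip then removes the leading and trailing whitespace that gsub left alone.
trimmed = joined.strip

p joined
p trimmed
```

Unlike the [^\S\n]* patterns in the other answers, this one does not swallow spaces next to the line break, so a line that ends in a space is joined to the next one with two spaces.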

0

Digitaloss Jan 28 '11 at 21:37


 $ ruby -00 -pne 'BEGIN{$\="\n\n"};$_.gsub!(/\n+/,"\0")' file
 This should all be on one line since it one sentence.

 This is a new paragraph thatshould be separate.

0

kurumi Jan 29 '11 at 0:14


Phrogz · Accepted Answer · 2011-01-28T21:30:14+0000

Replace all newlines that are neither preceded nor followed by a newline:

 text = <<END
 This should all be on one line
 since it one sentence.

 This is a new paragraph that
 should be separate.
 END

 p text.gsub /(?<!\n)\n(?!\n)/, ' '
 #=> "This should all be on one line since it one sentence.\n\nThis is a new paragraph that should be separate. "

Or, for Ruby 1.8 without lookarounds:

 txt.gsub! /([^\n])\n([^\n])/, '\1 \2'
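The two patterns are close but not quite interchangeable. The sketch below compares them on a made-up edge case: one-character lines followed by a trailing newline.

```ruby
# Hypothetical input: three one-character lines and a trailing newline.
text = "a\nb\nc\n"

# Lookarounds only inspect the neighbours, so every isolated newline matches,
# including the one at the very end of the string.
with_lookarounds = text.gsub(/(?<!\n)\n(?!\n)/, ' ')

# Capture groups consume the neighbours: after "a\nb" is matched, "b" is used
# up and cannot serve as the left-hand neighbour of the next newline.
with_captures = text.gsub(/([^\n])\n([^\n])/, '\1 \2')

p with_lookarounds
p with_captures
```

For ordinary prose, where lines are longer than one character, the results differ only in how the trailing newline is treated: the lookaround version turns it into a space, while the capture-group version leaves it in place.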
