Using sed to insert a newline after each > in a 1.5 GB text file that is a single line

Question


I have a giant text file (about 1.5 gigabytes) containing XML data. All the text in the file is on a single line, and trying to open it in any text editor (even those mentioned in this thread: A text editor to open large (giant, huge, large) text files) either fails or is completely unusable because the editor hangs when I try to scroll.

I was hoping to insert newlines into the file using the following sed command:

sed 's/>/>\n/g' data.xml > data_with_newlines.xml

Unfortunately, this caused sed to crash with a segmentation fault. As far as I understand, sed reads its input line by line, which here means it tries to read the entire 1.5 gigabyte file as one line; that undoubtedly explains the segfault. However, the problem remains.
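One way to confirm that the file really is a single line, without opening it in an editor, is to count its newline characters with wc -l, which streams the file instead of holding a whole line in memory. A minimal sketch; inspect_file is a hypothetical helper name and the 300-byte preview size is an arbitrary choice:

```shell
# Hypothetical helper: report how many newline characters and bytes a file
# contains, then show its first 300 bytes. wc and head both stream the
# file, so neither needs to hold the giant single "line" in memory.
inspect_file() {
    lines=$(wc -l < "$1")   # counts '\n' bytes: 0 or 1 for a one-line file
    bytes=$(wc -c < "$1")
    printf 'lines: %s, bytes: %s\n' "$lines" "$bytes"
    head -c 300 "$1"        # peek at the start instead of opening an editor
    printf '\n'
}
# Usage: inspect_file data.xml
```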

How can I insert a newline after each > in an XML file? Do I have to resort to writing a small program that reads the file character by character?


xml newline sed

wasatz Mar 18 '10 at 8:56


2 answers

This may work for you.


potong Dec 11 '11 at 2:56 p.m.


ghostdog74 · Accepted Answer · 2010-03-18T09:10:57+0000

Some versions of sed have a limit on line length. GNU sed has none: as long as it can malloc() more (virtual) memory, you can feed or construct lines as long as you like (from the GNU sed documentation).

If possible, I would suggest changing the way this XML file is generated (why is it all on one line?). Otherwise, you can read it one character at a time, for example in the shell:

 while IFS= read -r -n 1 ch; do
   case "$ch" in
     ">") printf '%s\n' "$ch" ;;
     "")  printf '\n' ;;          # read -n 1 yields an empty string for a newline
     *)   printf '%s' "$ch" ;;
   esac
 done < "file"

or

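 nl=$'\n'
 # -N reads exactly 1000 characters, newlines included; the || test keeps the final short chunk
 while IFS= read -r -N 1000 str || [ -n "$str" ]; do
   printf '%s' "${str//>/>$nl}"
 done < file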

