Using sed to insert a newline after each > in a 1.5 gigabyte one-line text file

I have a huge text file (about 1.5 gigabytes) containing XML data. All the text in the file is on a single line, and trying to open it in any text editor (even those mentioned in this thread: A text editor to open large (giant, huge, large) text files) either fails or is completely unusable because the editor hangs when I try to scroll.

I was hoping to insert newlines into the file using the following sed command:

sed 's/>/>\n/g' data.xml > data_with_newlines.xml 

Unfortunately, this caused sed to crash with a segmentation fault. From what I understand, sed reads the file line by line, which in this case means it tries to read the entire 1.5 gigabyte file as a single line, and that would certainly explain the segfault. However, the problem remains.

How do I insert a newline after each > in the XML file? Do I have to resort to writing a small program that does this by reading the file character by character?

2 answers

Some sed implementations have a limit on line length. GNU sed has no built-in limit: as long as it can malloc() more (virtual) memory, you can feed or construct lines as long as you like (from the documentation).
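To see which case applies on a given machine, one rough check is to ask sed for its version banner. This is a minimal sketch, assuming that GNU sed identifies itself with the string "(GNU sed)" in its `--version` output and that other implementations either reject the option or print something different; `sed_flavor` is a hypothetical helper name used only for illustration.

```shell
# sed_flavor is a hypothetical helper name, not part of sed itself.
sed_flavor() {
    # GNU sed prints a banner containing "(GNU sed)" for --version.
    # Other implementations typically reject the option, so errors are discarded.
    if sed --version 2>/dev/null | grep -qF '(GNU sed)'; then
        echo "GNU sed"
    else
        echo "not GNU sed"
    fi
}

sed_flavor
```

Even with GNU sed, a 1.5 gigabyte line still has to fit in memory, so this check only tells you whether a fixed line-length limit is likely to be the cause.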

I would suggest, if possible, changing the way this XML file is generated. (Why is it all on one line?) Otherwise, you can read it one character at a time, for example in the shell (bash):

    # IFS= and -r keep spaces and backslashes intact
    while IFS= read -r -n 1 ch; do
        case "$ch" in
            ">") printf '%s\n' "$ch" ;;
            *)   printf '%s' "$ch" ;;
        esac
    done < "file"

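or, 1000 characters at a time (using printf rather than echo, so that no extra newline is added at each chunk boundary):

    nl=$'\n'
    while IFS= read -r -n 1000 str || [ -n "$str" ]; do printf '%s' "${str//>/>$nl}"; done < file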
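Shell read loops can be slow on a file of this size, so a streaming awk pipeline is one possible alternative. The following is a minimal sketch, assuming a POSIX awk is installed; `insert_newlines` is a hypothetical helper name, not an existing tool. It makes > the record separator, so awk only holds the text between two > characters in memory at a time instead of the whole line.

```shell
# insert_newlines is a hypothetical helper: reads stdin, writes stdout.
insert_newlines() {
    # Append a sentinel ">" so every real record is terminated by the separator.
    # The final record is then exactly the text after the last real ">".
    { cat; printf '>'; } |
        awk 'BEGIN { RS = ">" }
             # The previous record was followed by a real ">": emit it with a newline.
             NR > 1 { printf "%s>\n", prev }
             { prev = $0 }
             # Text after the last real ">" (possibly empty) is printed unchanged.
             END { printf "%s", prev }'
}

# Small demonstration on an inline sample instead of the real file.
printf '<a><b>text</b></a>' | insert_newlines
```

On the real data this would be run as `insert_newlines < data.xml > data_with_newlines.xml`. Some awk implementations impose their own limits on record size, so a very long run of text without any > could still be a problem.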

Source: https://habr.com/ru/post/1304440/

