Replacing multiple newlines in a file with one

Question


This function should search a text file for newline characters. Each time it finds one, it increments the newLine counter, and when there are more than two consecutive blank lines, it is supposed to squeeze all of them into a single blank line.

In my code, if there are two newlines it is supposed to get rid of them and squeeze them into one. For testing purposes, I also print "new line" when it reaches the newLine < 2 condition. Right now it prints "new line" for every newline, whether the line is blank or not, and it does not get rid of the extra newlines. What am I doing wrong?

EDIT: HERE IS MY FULL CODE http://pastebin.com/bsD3b38a

Basically, the program is supposed to combine two files and perform various operations on the result, such as the one I'm attempting here: getting rid of multiple consecutive blank lines. To do this in Cygwin, I run ./a -s file1 file2. This is supposed to combine file1 and file2 into a file called contents.txt, get rid of the consecutive newlines, and display the result on my Cygwin terminal (stdout). The -s option calls the function that gets rid of consecutive lines. The third and fourth arguments (file1 and file2) are the two files that are supposed to be combined into contents.txt. The squeeze_lines function reads contents.txt and is supposed to squeeze the newlines. Below you can see an example of the contents I put in file1.txt. file2.txt just has a bunch of words followed by blank lines.

    int newLine = 1;
    int c;

    if ((fileContents = fopen("fileContents.txt", "r")) == 0)
    {
        perror("fopen");
        return 1;
    }

    while ((c = fgetc(fileContents)) != EOF)
    {
        if (c == '\n')
        {
            newLine++;
            if (newLine < 2)
            {
                printf("new line");
                putchar(c);
            }
        }
        else
        {
            putchar(c);
            newLine = 0;
        }
    }

The program reads a .txt file with the contents shown below. It is supposed to read the file, get rid of leading and consecutive newlines, and write the newly formatted contents to stdout on my Cygwin terminal.

 /* hello world program */ #include <stdio.h> tab 2tabs


c file

nb023 Jun 26 '15 at 5:38


5 answers

Jonathan Leffler · Answer 1 · 2015-06-26T06:44:05+0000

Diagnosis

The logic looks correct if the file has Unix line endings. If the file has Windows CRLF line endings but is processed on Unix, there is a CR before each LF, and the CR resets newLine to zero, so you get the message for every newline.

This explains what you see.

It also explains why everyone else says your logic is correct (it is, provided the lines end with LF only, not CRLF) while you still see an unexpected result.
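The effect can be reproduced in isolation. The following is only a minimal sketch: `count_newline_messages` is a hypothetical helper (not part of the question's program) that replays the question's loop over an in-memory string and counts how often the "new line" branch would be taken.

```c
#include <stddef.h>

/* Hypothetical helper: replays the question's loop over a string and
 * returns how many times the "new line" branch would fire. */
size_t count_newline_messages(const char *text)
{
    int newLine = 1;            /* same starting value as in the question */
    size_t messages = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '\n') {
            newLine++;
            if (newLine < 2)
                messages++;     /* where the question prints "new line" */
        } else {
            newLine = 0;        /* any other byte, including '\r', resets */
        }
    }
    return messages;
}
```

With LF-only input such as "a\n\n\nb\n", only the newline that directly follows text is counted. With the same text in CRLF form, every LF is preceded by a CR that resets the counter, so every newline is counted.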

How to solve it?

Fair question. One of the main options is to use dos2unix or an equivalent mechanism to convert the DOS file into a Unix file. There are many questions on SO about this.

If you don't need the CR characters ( '\r' in C) at all, you can simply drop them (don't print them, and don't reset newLine to zero).

If you need to keep the CRLF line endings, you need to be a little more careful. You will need to record that you have seen a CR, check that the next character is an LF, print the pair, and then check whether more CRLF sequences follow and suppress them, and so on.

Working code - `dupnl.c`

This program reads only from standard input, which is more flexible than reading from a fixed file name. Learn to avoid writing code that works with only one file name; it will save you a lot of recompilation over time. The code handles Unix-style files with newlines ( "\n" ) at the end of each line; it also handles DOS files with CRLF line endings ( "\r\n" ); and it also handles (old-style) Mac files (Mac OS 9 and earlier) with CR ( "\r" ) line endings. In fact, it allows the different line-ending styles to be mixed arbitrarily. If you want to enforce a single mode, you have to do some work to decide which mode applies, and then use the appropriate subset of this code.

    #include <stdio.h>

    int main(void)
    {
        FILE *fp = stdin;   // Instead of fopen()
        int newLine = 1;
        int c;

        while ((c = fgetc(fp)) != EOF)
        {
            if (c == '\n')
            {
                /* Unix NL line ending */
                if (newLine++ == 0)
                    putchar(c);
            }
            else if (c == '\r')
            {
                int c1 = fgetc(fp);
                if (c1 == '\n')
                {
                    /* DOS CRLF line ending */
                    if (newLine++ == 0)
                    {
                        putchar(c);
                        putchar(c1);
                    }
                }
                else
                {
                    /* MAC CR line ending */
                    if (newLine++ == 0)
                        putchar(c);
                    /* Push back the lookahead unless it is EOF or another CR */
                    if (c1 != EOF && c1 != '\r')
                        ungetc(c1, fp);
                }
            }
            else
            {
                putchar(c);
                newLine = 0;
            }
        }

        return 0;
    }

Run example - inputs and outputs

    $ cat test.unx


    data long enough to be seen 1 - Unix

    data long enough to be seen 2 - Unix
    data long enough to be seen 3 - Unix
    data long enough to be seen 4 - Unix



    data long enough to be seen 5 - Unix


    $ sed 's/Unix/DOS/g' test.unx | ule -d > test.dos
    $ cat test.dos


    data long enough to be seen 1 - DOS

    data long enough to be seen 2 - DOS
    data long enough to be seen 3 - DOS
    data long enough to be seen 4 - DOS



    data long enough to be seen 5 - DOS


    $ sed 's/Unix/Mac/g' test.unx | ule -m > test.mac
    $ cat test.mac
    $ ta long enough to be seen 5 - Mac
    $ odx test.mac
    0x0000: 0D 0D 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75   ..data long enou
    0x0010: 67 68 20 74 6F 20 62 65 20 73 65 65 6E 20 31 20   gh to be seen 1 
    0x0020: 2D 20 4D 61 63 0D 0D 64 61 74 61 20 6C 6F 6E 67   - Mac..data long
    0x0030: 20 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 65    enough to be se
    0x0040: 65 6E 20 32 20 2D 20 4D 61 63 0D 64 61 74 61 20   en 2 - Mac.data 
    0x0050: 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 74 6F 20 62   long enough to b
    0x0060: 65 20 73 65 65 6E 20 33 20 2D 20 4D 61 63 0D 64   e seen 3 - Mac.d
    0x0070: 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 20   ata long enough 
    0x0080: 74 6F 20 62 65 20 73 65 65 6E 20 34 20 2D 20 4D   to be seen 4 - M
    0x0090: 61 63 0D 0D 0D 0D 64 61 74 61 20 6C 6F 6E 67 20   ac....data long 
    0x00A0: 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73 65 65   enough to be see
    0x00B0: 6E 20 35 20 2D 20 4D 61 63 0D 0D 0D               n 5 - Mac...
    0x00BC:
    $ dupnl < test.unx
    data long enough to be seen 1 - Unix
    data long enough to be seen 2 - Unix
    data long enough to be seen 3 - Unix
    data long enough to be seen 4 - Unix
    data long enough to be seen 5 - Unix
    $ dupnl < test.dos
    data long enough to be seen 1 - DOS
    data long enough to be seen 2 - DOS
    data long enough to be seen 3 - DOS
    data long enough to be seen 4 - DOS
    data long enough to be seen 5 - DOS
    $ dupnl < test.mac
    $ ta long enough to be seen 5 - Mac
    $ dupnl < test.mac | odx
    0x0000: 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68   data long enough
    0x0010: 20 74 6F 20 62 65 20 73 65 65 6E 20 31 20 2D 20    to be seen 1 - 
    0x0020: 4D 61 63 0D 64 61 74 61 20 6C 6F 6E 67 20 65 6E   Mac.data long en
    0x0030: 6F 75 67 68 20 74 6F 20 62 65 20 73 65 65 6E 20   ough to be seen 
    0x0040: 32 20 2D 20 4D 61 63 0D 64 61 74 61 20 6C 6F 6E   2 - Mac.data lon
    0x0050: 67 20 65 6E 6F 75 67 68 20 74 6F 20 62 65 20 73   g enough to be s
    0x0060: 65 65 6E 20 33 20 2D 20 4D 61 63 0D 64 61 74 61   een 3 - Mac.data
    0x0070: 20 6C 6F 6E 67 20 65 6E 6F 75 67 68 20 74 6F 20    long enough to 
    0x0080: 62 65 20 73 65 65 6E 20 34 20 2D 20 4D 61 63 0D   be seen 4 - Mac.
    0x0090: 64 61 74 61 20 6C 6F 6E 67 20 65 6E 6F 75 67 68   data long enough
    0x00A0: 20 74 6F 20 62 65 20 73 65 65 6E 20 35 20 2D 20    to be seen 5 - 
    0x00B0: 4D 61 63 0D                                       Mac.
    0x00B4:
    $

The lines starting with $ ta are where the prompt overwrites the previous output (and the "long enough to be seen" part of the data is there because my prompt is normally longer than just $ ).

odx is a hex dump program. ule is for “uniform line endings” and parses or converts data to have uniform line endings.

    Usage: ule [-cdhmnsuzV] [file ...]
      -c  Check line endings (default)
      -d  Convert to DOS (CRLF) line endings
      -h  Print this help and exit
      -m  Convert to MAC (CR) line endings
      -n  Ensure line ending at end of file
      -s  Write output to standard output (default)
      -u  Convert to Unix (LF) line endings
      -z  Check for zero (null) bytes
      -V  Print version information and exit

Eric Tsui · Answer 2 · 2015-06-26T06:23:08+0000

What the sample code does:

1) It squeezes several consecutive '\n' characters into one '\n'.

2) It gets rid of the leading '\n' characters at the beginning, if any.

    input:  '\n\n\naa\nbb\n\ncc'
    output: aa'\n'
            bb'\n' //notice, there is no blank line here
            cc

If that was the goal, then your code logic is right for it.

Because newLine is initialized to 1, the code gets rid of any leading '\n' in the input text.
And whenever a '\n' survives the processing, it prints "new line" as a hint.

Back to the question itself: suppose the actual goal is to squeeze consecutive blank lines into just one blank line (which needs two consecutive '\n', one to end the previous line and one for the blank line).

1) First, confirm the input and the expected output.

Input text:

    aaa'\n' //1st line, there is a '\n' appended to 'aaa'
    '\n'    //2nd line, blank line
    bbb'\n' //3rd line, there is a '\n' appended to 'bbb'
    '\n'    //4th line, blank line
    '\n'    //5th line, blank line
    '\n'    //6th line, blank line
    ccc     //7th line,

Expected output text:

    aaa'\n' //1st line, there is a '\n' appended to 'aaa'
    '\n'    //2nd line, blank line
    bbb'\n' //3rd line, there is a '\n' appended to 'bbb'
    '\n'    //4th line, blank line
    ccc     //5th line,

2) If that is the exact goal of the program, as shown above, then:

    if (c == '\n')
    {
        newLine++;
        if (newLine < 3) // here should be 3 to print '\n' twice,
                         // one for 'aaa\n', one for blank line
        {
            //printf("new line");
            putchar(c);
        }
    }

3) If you need to process a Windows-format file (with \r\n line endings) in Cygwin, you can do the following:

    while ((c = fgetc(fileContents)) != EOF)
    {
        if (c == '\r') continue; // add this line to discard possible '\r'
        if (c == '\n')
        {
            newLine++;
            if (newLine < 3) // here should be 3 to print '\n' twice
            {
                printf("new line");
                putchar(c);
            }
        }
        else
        {
            putchar(c);
            newLine = 0;
        }
    }
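To check the threshold of 3 against the 7-line input from step 1, the same logic can be wrapped in a function that works on strings. This is only a sketch under stated assumptions: `squeeze_blank_lines` is a hypothetical helper name, it writes into a caller-supplied buffer instead of calling putchar, and it omits the debugging printf.

```c
#include <stddef.h>

/* Hypothetical helper: copies `in` to `out`, dropping every '\r' and
 * keeping at most two consecutive '\n' (i.e. at most one blank line).
 * `out` must have room for strlen(in) + 1 bytes. Returns the length. */
size_t squeeze_blank_lines(const char *in, char *out)
{
    int newLine = 1;                /* as in the question's code */
    size_t n = 0;

    for (; *in != '\0'; in++) {
        if (*in == '\r')
            continue;               /* discard CR from CRLF input */
        if (*in == '\n') {
            newLine++;
            if (newLine < 3)        /* 3, so that '\n' is kept twice */
                out[n++] = '\n';
        } else {
            out[n++] = *in;
            newLine = 0;
        }
    }
    out[n] = '\0';
    return n;
}
```

For the input "aaa\n\nbbb\n\n\n\nccc" this produces "aaa\n\nbbb\n\nccc", which matches the expected output above. Two side effects are worth knowing: because newLine starts at 1, one leading '\n' is still kept, and dropping every '\r' also removes any CR that is not part of a CRLF pair.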

olivecoder · Answer 3 · 2015-06-26T06:29:13+0000

[Edited] The minimal change:

 if ( newLine <= 2)

Sorry, please disregard my previous code.

A slightly simpler alternative:

    int c;
    int duplicates = 0;

    while ((c = fgetc(fileContents)) != EOF)
    {
        if (c == '\n')
        {
            if (duplicates > 1) continue;
            duplicates++;
        }
        else
        {
            duplicates = 0;
        }
        putchar(c);
    }

WedaPashi · Answer 4 · 2015-06-26T05:51:08+0000

Dry run of the code, assuming the file starts with a newline and newLine is 1:

For the first iteration:

    if (c == '\n') //Will be evaluated as true for a new-line character.
    {
        newLine++; //newLine becomes 2 before next if condition is evaluated.
        if (newLine < 2) //False, since newLine is not less than 2, but equal.
        {
            printf("new line");
            putchar(c);
        }
    }
    else //Not entered
    {
        putchar(c);
        newLine = 0;
    }

At the second iteration (assuming the next character is another consecutive newline):

    if (c == '\n') //Will be evaluated as true for a new-line character.
    {
        newLine++; //newLine becomes 3 before next if condition is evaluated.
        if (newLine < 2) //False, since newLine is greater than 2.
        {
            printf("new line");
            putchar(c);
        }
    }
    else //Not entered
    {
        putchar(c);
        newLine = 0;
    }

So,

Initialize newLine to 0 .

kingfrito_5005 · Answer 5 · 2015-06-26T18:14:38+0000

 if newline > 2

This should be greater than or equal if you want to get rid of the second line. In addition, your newLine counter starts at one, is then incremented to two, and is then reset to zero. Instead, I recommend replacing the counter with a boolean such as

 boolean firstNewlineFound = false

Then, when you find a newline, set it to true; whenever it is already true, drop the newline, and set it back to false when you see any other character.
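One possible reading of the flag idea in C is sketched below. It is an interpretation, not necessarily what the answer's author had in mind: `squeeze_with_flag` is a hypothetical helper, and C's `bool` from `<stdbool.h>` stands in for the `boolean` type mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies `in` to `out`, keeping only the first
 * '\n' of every run of consecutive newlines.
 * `out` must have room for strlen(in) + 1 bytes. Returns the length. */
size_t squeeze_with_flag(const char *in, char *out)
{
    bool firstNewlineFound = false;
    size_t n = 0;

    for (; *in != '\0'; in++) {
        if (*in == '\n') {
            if (firstNewlineFound)
                continue;               /* drop repeated newline */
            firstNewlineFound = true;   /* first newline of the run */
        } else {
            firstNewlineFound = false;  /* run ended, reset the flag */
        }
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

Because the flag starts out false, a single leading newline is kept; initializing it to true would drop leading newlines as well. This variant removes blank lines entirely rather than leaving one blank line between paragraphs.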
