How to convert Windows line endings to Unix line endings (CR/LF to LF)

I am a Java developer and I use Ubuntu for development. The project was created on Windows with Eclipse and uses CP1252 encoding.

To convert the files to UTF-8, I used the recode program:

find Web -iname \*.java | xargs recode CP1252...UTF-8 

This command gives the following error:

 recode: Web/src/br/cits/projeto/geral/presentation/GravacaoMessageHelper.java failed: Ambiguous output in step `CR-LF..data' 

I looked this up and found a solution at http://fvue.nl/wiki/Bash_and_Windows#Recode:_Ambiguous_output_in_step_.60data..CR-LF.27 which says:

Convert line endings from CR/LF to a single LF: edit the file with vim, enter the command :set ff=unix and save the file. Recode should now run without errors.

That works, but I have a lot of files whose CR/LF line endings need converting, and I cannot open each of them to do this by hand. As far as I can tell, vi does not provide a command-line option for doing this from bash.

Can sed be used for this? How?

Thanks =)

+57
linux windows end-of-line
Oct 08 2018-10-10
8 answers

There should be a program called dos2unix that will fix the line endings for you. If it is not already on your Linux box, it should be available through your package manager.
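If installing dos2unix is not an option, its basic job for this case (turning every CR/LF pair into a single LF) can be approximated in Python 3. This is only a sketch, not dos2unix itself; the function names and the script name crlf_to_lf.py are made up for illustration, and the files are handled as raw bytes so their CP1252 content is left untouched.

```python
import sys
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Turn every CR/LF pair into a bare LF; a lone CR is left as it is."""
    return data.replace(b"\r\n", b"\n")


def convert_file_in_place(path: Path) -> bool:
    """Rewrite path with LF line endings; return True if the file changed."""
    original = path.read_bytes()  # raw bytes, so the CP1252 text is not decoded
    converted = crlf_to_lf(original)
    if converted == original:
        return False
    path.write_bytes(converted)
    return True


if __name__ == "__main__":
    # e.g. python3 crlf_to_lf.py Foo.java Bar.java  (hypothetical file names)
    for name in sys.argv[1:]:
        status = "converted" if convert_file_in_place(Path(name)) else "unchanged"
        print(f"{name}: {status}")
```

Because only unchanged files are skipped, running it a second time over the same files is harmless.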

+98
08 Oct 2018-10-10

sed cannot match \n because the trailing newline is removed before the line is placed in the pattern space, but it can match \r, so you can convert \r\n (dos) to \n (unix) by removing \r:

 sed -i 's/\r//g' file 

Warning: this modifies the original file in place.

However, this approach cannot convert unix EOL to dos or old mac (\r). Further reading:

How to replace newline (\n) with sed?
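The point about the pattern space can be illustrated with a small Python 3 model of what sed does here: the input is split on \n, the substitution only ever sees the text between newlines, and the newlines are put back when each line is printed. This is a simplified model for illustration (sed_like_delete_cr is a made-up name), not how sed is actually implemented.

```python
def sed_like_delete_cr(data: bytes) -> bytes:
    """Model `sed 's/\\r//g'`: apply the substitution to each line minus its newline."""
    # Splitting on b"\n" mirrors sed removing the newline before filling the
    # pattern space; a final newline just leaves an empty last element.
    lines = data.split(b"\n")
    # Inside each line a CR is ordinary text, so it can be matched and deleted.
    edited = [line.replace(b"\r", b"") for line in lines]
    # sed adds the newline back when it prints the pattern space.
    return b"\n".join(edited)


if __name__ == "__main__":
    print(sed_like_delete_cr(b"one\r\ntwo\r\n"))  # b'one\ntwo\n'
```

Since 's/\r//g' deletes every CR, a stray CR in the middle of a line is removed too; anchoring the pattern as 's/\r$//' only touches line ends.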

+67
Oct 09 '13 at 21:51

Actually, vim does allow what you are looking for. Start vim and enter the following commands:

 :args **/*.java
 :argdo set ff=unix | update | next

The first of these commands sets the argument list to every file matching **/*.java, i.e. all Java files, recursively. The second command does the following for each file in the argument list, in turn:

  • Sets Unix-style line endings (you already know this)
  • Writes the file if it has been modified
  • Goes to the next file
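For comparison, the same recursive pass (every file matching **/*.java, written only if it changed) could be sketched in Python 3 with pathlib. This is an illustrative stand-in for the vim commands, not something vim runs; convert_tree is a hypothetical name, and on Linux Path.rglob matches file names case-sensitively.

```python
from __future__ import annotations

from pathlib import Path


def convert_tree(root: Path, pattern: str = "*.java") -> list[Path]:
    """Convert CR/LF to LF in every file under root matching pattern.

    Like `:argdo ... | update`, a file is only written when something changed.
    Returns the files that were rewritten.
    """
    changed = []
    for path in sorted(root.rglob(pattern)):  # recursive, like **/*.java
        if not path.is_file():
            continue
        data = path.read_bytes()
        fixed = data.replace(b"\r\n", b"\n")
        if fixed != data:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Calling convert_tree(Path("Web")) would then cover the whole project in one go.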
+14
Aug 19 '14 at

The tr command can also do this:

tr -d '\15\32' < winfile.txt > unixfile.txt

and should be available to you.
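In tr, \15 and \32 are octal escapes: \15 is the carriage return (CR, decimal 13) and \32 is Ctrl-Z (decimal 26), which old DOS tools sometimes left at the end of text files. A Python 3 equivalent of that tr -d call, sketched here with a made-up function name, is bytes.translate with a delete set:

```python
import sys
from pathlib import Path


def delete_cr_and_ctrl_z(data: bytes) -> bytes:
    """Same effect as `tr -d '\\15\\32'`: drop every CR (0x0D) and Ctrl-Z (0x1A) byte."""
    return data.translate(None, b"\r\x1a")


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python3 strip_cr.py winfile.txt unixfile.txt  (hypothetical script name)
    src, dst = Path(sys.argv[1]), Path(sys.argv[2])
    dst.write_bytes(delete_cr_and_ctrl_z(src.read_bytes()))
```

Unlike tr, this version takes the file names as arguments instead of using redirection.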

You need to run tr from a script, since it only reads standard input and cannot take file names. For example, create a file myscript.sh:

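 #!/bin/bash
 # Usage: myscript.sh DIR  (converts every *.java file under DIR)
 cd "${1}" || exit 1
 find . -iname '*.java' | while IFS= read -r f; do
     echo "$f"
     tr -d '\15\32' < "$f" > "$f.tr"   # strip CR and Ctrl-Z
     mv "$f.tr" "$f"
     recode CP1252...UTF-8 "$f"
 done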

Running myscript.sh Web will process all Java files in the Web folder and its subfolders.
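The same per-file steps (delete CR and Ctrl-Z, then re-encode from CP1252 to UTF-8) could also be done in one Python 3 pass per file, which avoids the temporary .tr file. This is a sketch under a few assumptions: every file really is CP1252, the function names are hypothetical, and unlike find -iname, rglob("*.java") is case-sensitive on Linux. Like the shell script, it should only be run once, since a second run would treat the new UTF-8 bytes as CP1252.

```python
from __future__ import annotations

import sys
from pathlib import Path


def fix_java_source(data: bytes) -> bytes:
    """Delete CR and Ctrl-Z bytes, then re-encode the text from CP1252 to UTF-8."""
    unix_bytes = data.translate(None, b"\r\x1a")  # same as tr -d '\15\32'
    # Raises UnicodeDecodeError on the few bytes CP1252 leaves undefined (e.g. 0x81).
    return unix_bytes.decode("cp1252").encode("utf-8")


def process_tree(root: Path) -> list[Path]:
    """Fix every *.java file under root (recursively); return the files processed.

    Not idempotent: a second run would re-read the UTF-8 output as CP1252.
    """
    processed = []
    for path in sorted(root.rglob("*.java")):
        if path.is_file():
            print(path)  # like `echo $f` in the shell script
            path.write_bytes(fix_java_source(path.read_bytes()))
            processed.append(path)
    return processed


if __name__ == "__main__":
    # e.g. python3 fix_java_tree.py Web  (hypothetical script name)
    for root in sys.argv[1:]:
        process_tree(Path(root))
```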

+8
Oct 08 2018-10-10

To overcome

 Ambiguous output in step `CR-LF..data' 

A simple solution might be to add the -f flag to force the conversion.

+5
May 16 '12 at

I'll take a small exception to jichao's answer: you can actually do everything he talked about quite easily. Instead of looking for \n, just look for the carriage return at the end of the line.

 sed -i 's/\r$//' ${FILE_NAME} 

To go from unix back to dos, simply match the end of each line and append a carriage return there. (Matching the last character with sed -r 's/(.)$/\1\r/' would skip empty lines, so anchor on $ alone.)

 sed -i 's/$/\r/' ${FILE_NAME} 

In theory, the file could be converted to (old) mac style by extending the last example so that it also appends each following line of input to the pattern space until all lines have been processed. However, I won't attempt that example here.
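For what it's worth, every direction (including the old mac style) is straightforward once the whole file is treated as one byte string rather than line by line. A minimal Python 3 sketch, with a made-up function name, that first normalises any mix of CR/LF, CR and LF to LF and then writes the requested ending, so empty lines are converted too and existing CR/LF pairs are not doubled:

```python
DOS = b"\r\n"
UNIX = b"\n"
OLD_MAC = b"\r"


def convert_newlines(data: bytes, newline: bytes) -> bytes:
    """Rewrite all line endings in data as newline (DOS, UNIX or OLD_MAC)."""
    # Normalise first: CR/LF pairs, then any remaining lone CR, become LF.
    # Note this also treats a stray CR inside a line as a line break.
    normalised = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalised.replace(b"\n", newline)
```

For example, convert_newlines(data, OLD_MAC) gives the CR-only format the sed approach above does not attempt.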

Warning: -i modifies the actual file. If you want a backup, put a suffix directly after -i (for example -i.bak). The original file will then be kept under the same name with that suffix added.
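The backup idea can be mirrored in Python 3 as well: copy the file to the same name plus a suffix, then rewrite the original. This is a rough sketch (the function name and the .bak default are illustrative choices) applying the same trailing-CR removal as sed 's/\r$//'; it is not a reimplementation of sed -i.

```python
import shutil
from pathlib import Path


def strip_trailing_cr_with_backup(path: Path, suffix: str = ".bak") -> Path:
    """Roughly what `sed -i.bak 's/\\r$//' FILE` achieves; returns the backup path."""
    backup = path.with_name(path.name + suffix)
    shutil.copy2(path, backup)  # keep the untouched original, e.g. Foo.java.bak
    lines = path.read_bytes().split(b"\n")
    # Remove one CR only where it ends a line, like the $ anchor in the sed pattern.
    fixed = [line[:-1] if line.endswith(b"\r") else line for line in lines]
    path.write_bytes(b"\n".join(fixed))
    return backup
```

A CR in the middle of a line survives, which is the difference between 's/\r$//' and 's/\r//g'.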

+3
May 26 '17 at 20:51

Have you tried the Python script by Brian Maupin found here? (I modified it a bit to make it more general.)

 #!/usr/bin/env python3
 import sys
 input_file_name = sys.argv[1]
 output_file_name = sys.argv[2]
 # Read and write raw bytes so each line can be decoded explicitly.
 with open(input_file_name, 'rb') as input_file, open(output_file_name, 'wb') as output_file:
     for line_number, input_line in enumerate(input_file, start=1):
         try:
             # first try to decode it using cp1252 (Windows, Western Europe)
             output_line = input_line.decode('cp1252').encode('utf8')
         except UnicodeDecodeError as error:
             # if there is an error, report it on stderr...
             sys.stderr.write('ERROR (line %s):\t%s\n' % (line_number, error))
             # ...then fall back to latin1 (ISO 8859-1), which accepts every byte, and keep going
             output_line = input_line.decode('latin1').encode('utf8')
         output_file.write(output_line)

You can use this script with

 $ ./cp1252_utf8.py file_cp1252.sql file_utf8.sql 
0
Dec 08 2018-10-10

Go back to Windows, tell Eclipse to change the encoding to UTF-8, then go back to Unix and run d2u on the files.

-1
Oct 08 2018-10-10