Merge two columns of a text file on Linux

Question

I have a text file with several columns of text and values. This structure:

CAR 38 DOG 42 CAT 89 CAR 23 APE 18

If column 1 has a string, column 2 does not (or rather, it holds an empty string). And vice versa: if column 1 is empty, column 2 has a string. In other words, an "object" (CAR, CAT, DOG, etc.) occurs either in column 1 or in column 2, but never in both.

I'm looking for an efficient way to consolidate columns 1 and 2 so that the file looks like this instead:

 CAR 38
 DOG 42
 CAT 89
 CAR 23
 APE 18

I can do this in a Bash script using while and if, but I'm sure there is an easier way to do this. Can anyone help?

Cheers! Z

+6

linux bash

Zooma Apr 9 '15 at 19:51

2 answers

Note: If you

are looking for output with auto-sized, fixed-width, left-aligned columns (the longest field value determines the width, with shorter values padded with spaces on the right),
are happy with two spaces as the column separator,
and are working with files small enough to be read into memory as a whole,

use Cyrus's simpler, column-based answer.

See further below for how the column-based approach compares to the awk-based approach in terms of performance and resource consumption.

awk is your friend here:

 awk -v OFS='  ' '{ print $1, $2 }' file

awk splits each line into fields on runs of whitespace by default and ignores leading whitespace, so with your input, lines such as CAR 38 and an indented DOG 42 are parsed the same way (CAR and DOG become field 1, $1, and 38 and 42 become field 2, $2).
-v OFS='  ' sets the output field separator to two spaces (the default is a single space); note that the output values are not padded, so this alone does not produce aligned output.
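
To make the field-splitting behavior concrete, here is a small sketch. The sample lines are made up for illustration (the exact indentation of the real file is not known), and merge_columns is a hypothetical wrapper name, not something from the question.

```shell
# merge_columns: hypothetical wrapper around the awk command shown above.
# With no file arguments, awk reads standard input.
merge_columns() {
  awk -v OFS='  ' '{ print $1, $2 }' "$@"
}

# Assumed sample input: some lines start at the left edge, others are
# indented with spaces or a tab. awk ignores the leading whitespace.
printf 'CAR 38\n    DOG 42\n\tCAT 89\nAPE 18\n' | merge_columns
# prints:
# CAR  38
# DOG  42
# CAT  89
# APE  18
```

Whichever column the object started in, it ends up as $1, so no while/if logic is needed.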

To create aligned output with fields of varying widths, use Awk's printf function, which gives you more control over the output; for example, the following prints a 10-character-wide, left-aligned 1st column and a 2-character-wide, right-aligned 2nd column:

 awk '{ printf "%-10s %2s\n", $1, $2 }' file

Note that column widths must be known in advance.
In contrast, column -t conveniently determines the column widths automatically by analyzing all the data first, but this has performance and resource-consumption implications; see below.
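
If the widths are not known in advance and the input is a regular file, one possible middle ground (not part of the original answer, just a sketch) is to let awk read the file twice: once to find the widest value per column, once to print. align_columns is a hypothetical helper name.

```shell
# align_columns: hypothetical two-pass helper. The file name is passed to
# awk twice, so awk reads the file twice; NR == FNR holds only during the
# first read.
align_columns() {
  awk '
    NR == FNR {                        # pass 1: record the maximum widths
      if (length($1) > w1) w1 = length($1)
      if (length($2) > w2) w2 = length($2)
      next
    }
    FNR == 1 {                         # start of pass 2: build the format once
      fmt = "%-" w1 "s  %" w2 "s\n"
    }
    { printf fmt, $1, $2 }             # pass 2: left-align $1, right-align $2
  ' "$1" "$1"
}
```

Usage would be align_columns file. This keeps memory use constant at the cost of reading the file twice, and it cannot work on a pipe, since the input has to be readable a second time.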

Performance / resource comparison between column -t and Awk:

column -t has to analyze all the input data up front, in a first pass, to determine the maximum width of each input column; from what I can tell, it does so by first reading the entire input into memory, which can be problematic with large input files.
In contrast, the Awk solution reads lines one by one, but relies on knowing the column widths ahead of time.

Thus:

column -t will consume memory in proportion to the size of the input, while awk will use a constant amount of memory.
column -t is generally slower, depending on the Awk implementation used: mawk is much faster, gawk is slightly faster, and BSD awk is slower (!). These results are based on a 10-million-line input file, with the commands run on OSX 10.10.2 and Ubuntu 14.04.
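
To try a comparison like this yourself, a sketch along the following lines could generate a test file. make_sample, the OBJ naming scheme, and the file name big.txt are all made up for illustration; the original test data and exact commands are not given.

```shell
# make_sample: hypothetical helper that writes N lines of the form
# "OBJ<i> <value>", indenting every other line to mimic an object that
# sits in the second input column.
make_sample() {
  awk -v n="$1" 'BEGIN {
    for (i = 1; i <= n; i++)
      printf "%s%s %d\n", (i % 2 ? "" : "    "), "OBJ" i, i % 100
  }'
}

# Example run (commented out; timings depend on the machine and on the
# awk implementation in use):
#   make_sample 10000000 > big.txt
#   time awk '{ printf "%-12s %2s\n", $1, $2 }' big.txt > /dev/null
#   time column -t big.txt > /dev/null
```

Sending the output to /dev/null keeps terminal rendering out of the measurement.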

+8

mklement0 Apr 9 '15 at 19:54

Cyrus · Accepted Answer · 2015-04-09T19:54:21+0000

Try the following:

 column -t file

Output:

 CAR 38
 DOG 42
 CAT 89
 CAR 23
 APE 18
