Sort columns by number of rows in bash

Question


Suppose a text file contains x columns of text strings.

    $ cat file  # where x=3
    foo  foo  foo
    bar  bar  bar
         baz  baz
         qux

Is there a way in bash to sort these columns by the number of text strings (i.e., filled rows) they contain, while keeping the internal order of the rows in each column?

    $ sought_command file
    foo  foo  foo
    bar  bar  bar
    baz  baz
    qux

Essentially, the column with the largest number of rows should come first, the column with the second-largest number of rows should come second, and so on.

(This task would be easy to implement in R, but I'm interested in learning a solution in bash.)

EDIT 1:

Here are some additional details. Each column contains at least one text string (that is, one filled row). Text strings can be any alphanumeric combination and have any length (but obviously do not contain spaces). Output columns must not contain empty rows. There is no a priori restriction on the column delimiter, as long as it remains consistent across the table.

All that is needed for this task is to reorder the columns as they are, sorted by column length. (I know that implementing this in bash sounds easier than it really is.)

+6

string sorting bash awk multiple-columns

Michael Gruenstaeudl Nov 28 '16 at 18:36


4 answers

First, create a function called transpose, like this:

    transpose() {
       # FPAT: a field is either a run of non-blanks or a gap of 3+ blanks (an empty cell)
       awk -v FPAT='[^[:blank:]]+|[ \t]{3,}' '{
          for (i=1; i<=NF; i++)
             a[i,NR]=$i
          max=(max<NF?NF:max)
       }
       END {
          for (i=1; i<=max; i++)
             for (j=1; j<=NR; j++)
                printf "%s%s", a[i,j], (j==NR?ORS:OFS)
       }'
    }

Then use it like:

    transpose < file | awk '{print NF "\t" $0}' | sort -k1nr | cut -f2- | transpose

    foo foo foo
    bar bar bar
    baz baz
    qux

Steps:

Call the transpose function to turn the columns into rows.
Use awk to prepend the number of fields to each line.
Use sort to order the lines by that first column, in reverse numeric order.
Use cut to remove the first column again.
Call transpose again to turn the rows back into columns.
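To see the three middle steps (count, sort, cut) in isolation, here is a minimal sketch that runs them on data that is already transposed, so each input line stands for one column of the original table. The function name sort_by_field_count and the sample lines are illustrative assumptions, not part of the answer. The sketch also adds sort -s, which the original pipeline does not use, so that lines with equal counts keep their input order.

```shell
# Hypothetical helper: order lines by how many whitespace-separated
# fields they contain, longest first.
sort_by_field_count() {
    awk '{ print NF "\t" $0 }' |   # decorate: prepend the field count
        sort -s -k1,1nr |          # stable numeric sort on the count, descending
        cut -f2-                   # undecorate: drop the count again
}

# Sample input: each line is one (transposed) column of the table.
printf '%s\n' 'foo bar' 'foo bar baz qux' 'foo bar baz' | sort_by_field_count
```

With the sample input, the four-field line comes out first, then the three-field line, then the two-field line.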

PS: Because this uses FPAT, it requires GNU awk.

+1

anubhava Nov 28 '16 at 19:15


With the Unix toolkit:

    $ tr '\t' '\n' <file | pr -4ts | awk '{print gsub(/-/,"-") "\t" $0}' | sort -k1n | cut -f2- | tr '\t' '\n' | pr -3ts

    foo     foo     foo
    bar     bar     bar
    baz     baz     -
    qux     -       -

This assumes that the columns are tab-separated and that missing values are represented by "-". The magic numbers 4 and 3 are the number of rows and columns, respectively.
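The two magic numbers do not have to be hard-coded. As a rough sketch, assuming the same tab-separated, "-"-padded input, they could be derived from the file itself. The helper names count_rows, count_cols and sort_columns are hypothetical; the pipeline inside sort_columns is the one shown above with the literals 4 and 3 replaced by variables.

```shell
# Hypothetical helpers: derive the table dimensions from the file.
count_rows() { awk 'END { print NR }' "$1"; }                # number of lines
count_cols() { awk -F'\t' 'NR == 1 { print NF }' "$1"; }     # fields in line 1

# Same pipeline as above, with the two literals replaced by the derived values.
sort_columns() {
    rows=$(count_rows "$1")
    cols=$(count_cols "$1")
    tr '\t' '\n' < "$1" | pr -"$rows"ts |
        awk '{print gsub(/-/,"-") "\t" $0}' | sort -k1n | cut -f2- |
        tr '\t' '\n' | pr -"$cols"ts
}
```

It would be called as sort_columns file. Like the original, this counts every "-" character, so it still assumes that the text strings themselves contain no hyphens.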

The input file used:

    $ cat file
    foo     foo     foo
    bar     bar     bar
    -       baz     baz
    -       qux     -

+1

karakfa Nov 28 '16 at 19:19


    sed -e 's/^ *//' columns.txt
    # =>
    # foo  foo  foo
    # bar  bar  bar
    # baz  baz
    # qux

I'll be here all week! :D

On a more serious note, you can transpose your columns in bash using awk or rs. That makes it easier to sort your columns (now rows) and then transpose them back. However, runs of multiple spaces can be a problem for awk.

0

Eric Duminil Nov 28 '16 at 19:17


Ed Morton · Accepted Answer · 2016-11-28T19:30:54+0000

With GNU awk for sorted_in and assuming your columns are separated by tabs:

    $ cat tst.awk
    BEGIN{ FS=OFS="\t" }
    {
        for (i=1; i<=NF; i++) {
            if ($i ~ /[^[:space:]]/) {
                cell[NR,i] = $i
                cnt[i]++
            }
        }
        next
    }
    END {
        PROCINFO["sorted_in"] = "@val_num_desc"
        for (row=1; row<=NR; row++) {
            c=0
            for (col in cnt) {
                printf "%s%s", (c++?OFS:""), cell[row,col]
            }
            print ""
        }
    }

    $ awk -f tst.awk file
    foo     foo     foo
    bar     bar     bar
    baz     baz
    qux
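For an awk without sorted_in, the same idea can be sketched with an explicit sort of the column indices. This is an illustrative variant, not the answer's code: the function name sort_columns_portable and the order array are assumptions, an insertion sort replaces "@val_num_desc", and unlike the script above it strips trailing empty cells from each output line. It makes the same assumption of tab-separated input.

```shell
# Hypothetical portable variant: order the columns with a small insertion
# sort instead of relying on PROCINFO["sorted_in"] from GNU awk.
sort_columns_portable() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        if (NF > ncols) ncols = NF
        for (i = 1; i <= NF; i++) {
            if ($i ~ /[^ \t]/) {        # count only non-blank cells
                cell[NR, i] = $i
                cnt[i]++
            }
        }
    }
    END {
        # order[] starts as 1..ncols and is sorted by cnt[], descending;
        # columns with equal counts keep their original relative order.
        for (i = 1; i <= ncols; i++) order[i] = i
        for (i = 2; i <= ncols; i++) {
            col = order[i]
            for (j = i - 1; j >= 1 && cnt[order[j]] + 0 < cnt[col] + 0; j--)
                order[j + 1] = order[j]
            order[j + 1] = col
        }
        for (row = 1; row <= NR; row++) {
            line = ""
            for (i = 1; i <= ncols; i++)
                line = line (i > 1 ? OFS : "") cell[row, order[i]]
            sub(/\t+$/, "", line)       # drop trailing empty cells
            print line
        }
    }' "$@"
}
```

It would be called as sort_columns_portable file, or with the table on standard input.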
