Sort columns by number of rows in bash

Suppose a text file contains x columns.

    $ cat file  # where x=3
    foo  foo  foo
    bar  bar  bar
         baz  baz
         qux

Is there a way in bash to sort these columns by the number of text strings (i.e. filled lines) that they contain, keeping the internal order of the rows in each column?

    $ sought_command file
    foo  foo  foo
    bar  bar  bar
    baz  baz
    qux

Essentially, the column with the largest number of rows should come first, the column with the second-largest number of rows should come second, and so on.

(This task would be easy to implement in R, but I'm interested in learning how to solve it in bash.)

EDIT 1:

Here is some additional information. Each column contains at least one text string (that is, one filled row). Text strings can be any alphanumeric combination and have any length (but obviously do not contain spaces). Output columns must not contain empty rows. There is no a priori restriction on the column delimiter, as long as it remains consistent across the table.

All this task requires is moving the columns as they are, sorted by column length. (I know that implementing this in bash sounds easier than it really is.)

4 answers

With GNU awk for sorted_in and assuming your columns are separated by tabs:

    $ cat tst.awk
    BEGIN { FS=OFS="\t" }
    {
        for (i=1; i<=NF; i++) {
            if ($i ~ /[^[:space:]]/) {
                cell[NR,i] = $i
                cnt[i]++
            }
        }
        next
    }
    END {
        PROCINFO["sorted_in"] = "@val_num_desc"
        for (row=1; row<=NR; row++) {
            c=0
            for (col in cnt) {
                printf "%s%s", (c++?OFS:""), cell[row,col]
            }
            print ""
        }
    }

    $ awk -f tst.awk file
    foo     foo     foo
    bar     bar     bar
    baz     baz
    qux
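If GNU awk is not available, the same idea can probably be approximated without sorted_in by sorting the column indices by hand. The following is only a sketch, not part of the answer above: it assumes tab-separated input with empty strings for missing cells, the temporary sample file and the order[] array are made up for illustration, and trailing empty cells are stripped from each output row.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: tab-separated, missing cells are empty strings.
sample=$(mktemp)
printf 'foo\tfoo\tfoo\nbar\tbar\tbar\n\tbaz\tbaz\n\tqux\t\n' > "$sample"

result=$(awk '
BEGIN { FS = OFS = "\t" }
{
    if (NF > ncols) ncols = NF
    for (i = 1; i <= NF; i++) {
        if ($i ~ /[^ \t]/) {        # count only non-blank cells
            cell[NR, i] = $i
            cnt[i]++
        }
    }
}
END {
    # order[] lists the column indices; a stable insertion sort
    # arranges them by filled-cell count, largest first.
    for (i = 1; i <= ncols; i++) order[i] = i
    for (i = 2; i <= ncols; i++) {
        k = order[i]
        for (j = i - 1; j >= 1 && (cnt[order[j]] + 0) < (cnt[k] + 0); j--)
            order[j + 1] = order[j]
        order[j + 1] = k
    }
    # Print every row with its cells in the new column order.
    for (row = 1; row <= NR; row++) {
        line = ""
        for (c = 1; c <= ncols; c++)
            line = line (c > 1 ? OFS : "") cell[row, order[c]]
        sub(/\t+$/, "", line)       # drop trailing empty cells
        print line
    }
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Like the GNU awk version, this keeps every cell in its original row, so it relies on the filled cells of each column sitting at the top.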

First, create a function called transpose like this:

    transpose() {
        awk -v FPAT='[^[:blank:]]+|[ \t]{3,}' '{
            for (i=1; i<=NF; i++)
                a[i,NR]=$i
            max=(max<NF?NF:max)
        }
        END {
            for (i=1; i<=max; i++)
                for (j=1; j<=NR; j++)
                    printf "%s%s", a[i,j], (j==NR?ORS:OFS)
        }'
    }

Then use it like:

    transpose < file | awk '{print NF "\t" $0}' | sort -k1nr | cut -f2- | transpose
    foo foo foo
    bar bar bar
    baz baz
    qux

Steps:

  • Call transpose to turn the columns into rows
  • Use awk to prepend the number of fields to each line
  • Use sort to order the lines in reverse numerical order on the first column
  • Use cut to remove the first column
  • Call transpose again to turn the rows back into columns in their original orientation

PS: Because FPAT is used, GNU awk is required here.
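For tab-separated input, the same transpose, count, sort, cut, transpose steps can be sketched without FPAT, so any POSIX awk should do. This is an assumption-laden variant rather than the function above: transpose_tsv and the sample file are hypothetical names, missing cells are assumed to be empty strings, and the filled cells are counted explicitly because NF is the same for every transposed line when the delimiter is a tab.

```shell
#!/usr/bin/env bash
# transpose_tsv: hypothetical helper that swaps rows and columns
# of tab-separated input read from stdin.
transpose_tsv() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        for (i = 1; i <= NF; i++) a[i, NR] = $i
        if (NF > max) max = NF
    }
    END {
        for (i = 1; i <= max; i++) {
            line = a[i, 1]
            for (j = 2; j <= NR; j++) line = line OFS a[i, j]
            print line
        }
    }'
}

# Hypothetical sample input: tab-separated, missing cells are empty strings.
sample=$(mktemp)
printf 'foo\tfoo\tfoo\nbar\tbar\tbar\n\tbaz\tbaz\n\tqux\t\n' > "$sample"

result=$(
    transpose_tsv < "$sample" |
    # Prepend the number of non-empty cells to each transposed line.
    awk -F'\t' '{ n = 0; for (i = 1; i <= NF; i++) if ($i != "") n++; print n "\t" $0 }' |
    # Largest count first; -s keeps the input order for equal counts.
    sort -s -t $'\t' -k1,1nr |
    cut -f2- |
    transpose_tsv |
    # Strip the trailing tabs left by empty cells.
    sed 's/[[:space:]]*$//'
)

printf '%s\n' "$result"
rm -f "$sample"
```

The -s flag is not in POSIX but is accepted by the common GNU and BSD sort implementations; drop it if tie order does not matter.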


With the Unix toolkit:

    $ tr '\t' '\n' <file |
      pr -4ts |
      awk '{print gsub(/-/,"-") "\t" $0}' |
      sort -k1n |
      cut -f2- |
      tr '\t' '\n' |
      pr -3ts
    foo     foo     foo
    bar     bar     bar
    baz     baz     -
    qux     -       -

This assumes that the columns are tab-separated and that missing values are represented by "-". The magic numbers 4 and 3 are the number of rows and columns, respectively.
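The two magic numbers could be derived from the file instead of being typed in. A minimal sketch, assuming the same tab-separated layout with "-" placeholders; the variable names rows and cols and the temporary sample file are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical sample input: tab-separated, "-" marks a missing value.
sample=$(mktemp)
printf 'foo\tfoo\tfoo\nbar\tbar\tbar\n-\tbaz\tbaz\n-\tqux\t-\n' > "$sample"

# Number of rows = line count; number of columns = fields on the first line.
rows=$(awk 'END { print NR }' "$sample")
cols=$(awk -F'\t' 'NR == 1 { print NF; exit }' "$sample")

result=$(
    tr '\t' '\n' < "$sample" |
    pr -"$rows"ts |                             # one line per original column
    awk '{ print gsub(/-/, "-") "\t" $0 }' |    # prepend the count of "-" cells
    sort -k1n |                                 # fewest missing values first
    cut -f2- |
    tr '\t' '\n' |
    pr -"$cols"ts                               # back to the original orientation
)

printf '%s\n' "$result"
rm -f "$sample"
```

This only removes the hard-coded numbers; it still depends on every row having the same number of tab-separated fields.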

The input file used:

    $ cat file
    foo     foo     foo
    bar     bar     bar
    -       baz     baz
    -       qux     -

    sed -e 's/^ *//' columns.txt
    # =>
    # foo foo foo
    # bar bar bar
    # baz baz
    # qux

I'll be here all week! :D

On a more serious note, you can transpose your columns in bash using awk or rs. This makes it easier to sort your columns (now rows) and transpose them back. However, runs of multiple spaces can be a problem for awk.


Source: https://habr.com/ru/post/1012705/

