Merge multiple rows into one row by column value

Question


I have a very large tab-delimited text file. Many lines in the file share the same value in the first column, and I want to merge each such group onto one line. For instance:

    a foo
    a bar
    a foo2
    b bar
    c bar2

After running the script, it should look like this:

    a foo;bar;foo2
    b bar
    c bar2

How can I do this in a shell script or in Python?

Thanks.

+6

python split perl

Jianguo Jun 15 '11 at 14:54


6 answers

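    from collections import defaultdict

    # Group the values by key (first column).
    items = defaultdict(list)
    with open('sourcefile') as source:
        for line in source:
            # Strip the newline so it does not end up inside the joined values.
            key, val = line.rstrip('\n').split('\t')
            items[key].append(val)

    # Write one line per key, in sorted key order.
    with open('result', 'w') as result:
        for k in sorted(items):
            result.write('%s\t%s\n' % (k, ';'.join(items[k])))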

Not tested.

+2

dugres Jun 15 '11 at 15:06


Tested with Python 2.7:

    import csv

    data = {}
    reader = csv.DictReader(open('infile', 'r'),
                            fieldnames=['key', 'value'],
                            delimiter='\t')
    for row in reader:
        if row['key'] in data:
            data[row['key']].append(row['value'])
        else:
            data[row['key']] = [row['value']]

    writer = open('outfile', 'w')
    for key in data:
        writer.write(key + '\t' + ';'.join(data[key]) + '\n')
    writer.close()

+1

Scott Jun 15 '11 at 15:53


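    def compress(infilepath, outfilepath):
        # Assumes rows with the same key are adjacent (e.g. the file is sorted).
        with open(infilepath, 'r') as infile, open(outfilepath, 'w') as outfile:
            prev_index = None
            for line in infile:
                index, val = line.rstrip('\n').split('\t')
                if index == prev_index:
                    outfile.write(';%s' % val)
                else:
                    # End the previous group's line before starting a new one.
                    if prev_index is not None:
                        outfile.write('\n')
                    outfile.write('%s\t%s' % (index, val))
                    prev_index = index
            if prev_index is not None:
                outfile.write('\n')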

Untested, but it should work. Please leave a comment if there is any problem.
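
For what it's worth, the same adjacent-rows idea can also be sketched with itertools.groupby from the standard library. This is only a sketch under assumptions: the name merge_adjacent and the sample rows are made up for illustration, every non-blank line is assumed to hold a key and a value separated by a tab, and rows sharing a key must already be adjacent (for example, because the file is sorted).

```python
from itertools import groupby


def merge_adjacent(lines, sep=";"):
    """Yield one merged line per run of adjacent rows sharing a key.

    Hypothetical helper: assumes each non-blank line is "key<TAB>value".
    """
    # Split each non-blank line into [key, value] at the first tab.
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only groups *consecutive* rows with equal keys.
    for key, group in groupby(rows, key=lambda row: row[0]):
        yield key + "\t" + sep.join(row[1] for row in group)


# Illustrative data taken from the question's example.
sample = ["a\tfoo\n", "a\tbar\n", "a\tfoo2\n", "b\tbar\n", "c\tbar2\n"]
for merged in merge_adjacent(sample):
    print(merged)
```

Since it is a generator working on one group at a time, it can be fed an open file object directly, which may matter for the very large file mentioned in the question.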

0

inspectorG4dget Jun 15 '11 at 15:00


A Perl way to do this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    open my $fh, '<', 'path/to/file' or die "unable to open file:$!";
    my %res;
    while(<$fh>) {
        my ($k, $v) = split;
        push @{$res{$k}}, $v;
    }
    print Dumper \%res;

Output:

    $VAR1 = {
        'c' => [ 'bar2' ],
        'a' => [ 'foo', 'bar', 'foo2' ],
        'b' => [ 'bar' ]
    };

0

Toto Jun 15 '11 at 15:20


    #! /usr/bin/env perl

    use strict;
    use warnings;

    # for demo only
    *ARGV = *DATA;

    my %record;
    my @order;
    while (<>) {
        chomp;
        my($key,$combine) = split;
        push @order, $key unless exists $record{$key};
        push @{ $record{$key} }, $combine;
    }

    print $_, "\t", join(";", @{ $record{$_} }), "\n" for @order;

    __DATA__
    a foo
    a bar
    a foo2
    b bar
    c bar2

Output (with tabs converted to spaces because Stack Overflow does not preserve them):

    a foo;bar;foo2
    b bar
    c bar2
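
Since the question also asks about Python, here is a rough sketch of the same order-preserving idea. It assumes Python 3.7 or later, where a plain dict keeps insertion order, so no separate order list is needed. The name merge_in_first_seen_order and the sample data are made up for illustration, and lines are split on any whitespace, as in the Perl above.

```python
def merge_in_first_seen_order(lines, sep=";"):
    """Merge values per key, keeping keys in the order they first appear.

    Hypothetical helper: relies on dict insertion order (Python 3.7+).
    """
    merged = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        key, value = fields[0], fields[1]
        merged.setdefault(key, []).append(value)
    return ["%s\t%s" % (key, sep.join(values)) for key, values in merged.items()]


# Illustrative data: keys are deliberately unsorted and not adjacent.
demo = ["b\tbar\n", "a\tfoo\n", "c\tbar2\n", "a\tbar\n", "a\tfoo2\n"]
for out_line in merge_in_first_seen_order(demo):
    print(out_line)
```

Unlike approaches that depend on adjacent rows, this does not need sorted input, at the cost of holding every key and value in memory.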

0

Greg Bacon Jun 15 '11 at 15:52


Sai · Accepted Answer · 2011-06-15T15:07:55+0000

With awk, you can try this:

    { a[$1] = a[$1] ";" $2 }
    END { for (item in a) print item, a[item] }

So, if you save this awk script in a file named awkf.awk and your input file is ifile.txt, run the script like this:

 awk -f awkf.awk ifile.txt | sed 's/ ;/ /'

The sed command removes the leading ";" that the awk script puts in front of the first value for each key.
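
To see where that leading ";" comes from, here is a small Python sketch of the same accumulation next to a variant that avoids it. The function names and sample rows are made up for illustration; this is not a translation of everything the awk and sed pipeline does, just the concatenation step.

```python
def merge_like_awk(rows):
    """Mirror a[$1] = a[$1] ";" $2: every value, even the first, gets a ';' prefix."""
    merged = {}
    for key, value in rows:
        merged[key] = merged.get(key, "") + ";" + value
    return merged


def merge_without_leading_sep(rows):
    """Only add ';' when the key already has a value, so no cleanup pass is needed."""
    merged = {}
    for key, value in rows:
        merged[key] = merged[key] + ";" + value if key in merged else value
    return merged


# Illustrative (key, value) pairs from the question's example.
rows = [("a", "foo"), ("a", "bar"), ("a", "foo2"), ("b", "bar"), ("c", "bar2")]
print(merge_like_awk(rows)["a"])             # ;foo;bar;foo2
print(merge_without_leading_sep(rows)["a"])  # foo;bar;foo2
```

The second form does in one pass what the sed step does afterwards.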

Hope this helps.
