Add Tab Separator to Grep

Question


I am new to grep and awk, and I would like to separate the values in the "frequency.txt" file with tabs. (The script reads a large corpus and lists each individual word together with how many times it is used in the corpus; I modified it for the Khmer language.) I looked at "grep tab on UNIX", but I cannot find an example that makes sense to me for this bash script (I'm too much of a newbie).

I use this bash script in Cygwin:

#!/bin/bash
# Create a tally of all the words in the corpus.
#
echo Creating tally of word frequencies...
#
sed -e 's/[a-zA-Z]//g' -e 's// /g' -e 's/\t/ /g' \
    -e 's/[«|»|:|;|.|,|(|)|-|?|។|"|"]//g' -e 's/[0-9]//g' \
    -e 's/ /\n/g' -e 's/០//g' -e 's/១//g' -e 's/២//g' \
    -e 's/៣//g' -e 's/៤//g' -e 's/៥//g' -e 's/៦//g' \
    -e 's/៧//g' -e 's/៨//g' -e 's/៩//g' dictionary.txt | \
  tr [:upper:] [:lower:] | \
  sort | \
  uniq -c | \
  sort -rn > frequency.txt
grep -Fwf dictionary.txt frequency.txt | awk '{print $2 "," $1}'

The awk command prints the term and the frequency separated by a comma, but only on the screen. How can I put a tab (a comma would also work) between the frequency and the term in the file itself?

Sample of dictionary.txt (Khmer):

ព្រ ះ វិញ្ញាណ នឹង ប្រពន្ធ ថ្មោង ថ្មី ពោល ថា អញ្ជើញ មក ហើយ អ្នក ណា ដែល ឮ ក៏ ថា អញ្ជើញ មក ដែរ អ្នក ណា ដែល ស្រេក ន ោះ មាន តែ មក ហើយ អ្នក ណា ដែល ចង់ បាន មាន តែ យក ទឹក ជីវិត ន ោះ ច ុះ ឥត ចេញ ថ្លៃ ទេ.

Current output in frequency.txt:

25605 នឹង
25043 ជា
22004 បាន
20515 ន ោះ

Desired output in frequency.txt (with a TAB between the count and the word):

25605TAB នឹង
25043TAB ជា
22004TAB បាន
20515TAB ន ោះ

Thanks!


bash grep awk cygwin

Nathan Feb 01 '11 at 0:06


awk "<"?


Ratinho Feb 01 '11 at 0:16

The following script should get you where you need to go. The pipe to tee lets you see the output on the screen while also writing it to ./outfile.

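#!/bin/sh

# GNU sed: join all lines and delete Latin letters, digits, Khmer digits,
# punctuation and newlines; gawk then counts each word and prints
# "word<TAB>count", one pair per line; tee shows it and saves it to ./outfile.
sed ':a;N;s/[a-zA-Z0-9។០១២៣៤៥៦៧៨៩\n«»:;.,()?""-]//g;ta' < dictionary.txt | \
gawk '{$0=toupper($0);for(i=1;i<=NF;i++)a[$i]++}
   END{for(item in a)printf "%s\t%d\n", item, a[item]}' | \
tee ./outfile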
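To see what the tee stage does without the Khmer corpus, here is a small sketch. The inline sample, the temporary file (used instead of ./outfile) and the extra sort are assumptions for illustration, not part of the answer; the sort makes the order deterministic, because awk's "for (w in count)" visits keys in an unspecified order.

```shell
#!/bin/sh
# Sketch: inline sample instead of dictionary.txt, temp file instead of ./outfile.
out=$(mktemp)

printf 'one two two\nthree three three\n' |
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
  sort -t "$(printf '\t')" -k2,2nr |   # most frequent first
  tee "$out"                           # print to the screen AND save to $out
```

The same three "word<TAB>count" lines appear on the screen and in the file, starting with "three" and 3.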


Siegex Feb 01 '11 at 1:25


Dennis Williamson · Accepted Answer · 2011-02-01T01:06:44+0000

Most of the sed command can be replaced by tr:

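# GNU tr works on bytes, not multibyte characters, so keep it to ASCII here;
# leave «», ។ and the Khmer digits ០-៩ to sed (in a UTF-8 locale).
tr -d 'a-zA-Z0-9:;.,()?"|-'
tr '\t' ' '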
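A minimal sketch of that tr-based cleanup inside a complete tally pipeline. It assumes a small inline Khmer sample instead of dictionary.txt and limits tr to ASCII characters, since GNU tr works on bytes and every Khmer character is a multi-byte UTF-8 sequence. It also uses tr -s ' ' '\n' in place of sed 's/ /\n/g'; the squeeze is an assumption of this sketch, so that runs of spaces do not produce empty lines in the tally.

```shell
#!/bin/sh
# Byte-wise locale: tr, sort and uniq compare raw bytes, so identical
# Khmer words always end up next to each other for uniq.
export LC_ALL=C

# Hypothetical sample line (Khmer words mixed with Latin letters, digits,
# punctuation and a tab) standing in for dictionary.txt.
counts=$(
  printf 'នឹង abc ជា, 12 នឹង\tបាន (ជា) នឹង.\n' |
    tr -d 'a-zA-Z0-9:;.,()?"|-' |   # drop ASCII letters, digits, punctuation
    tr '\t' ' ' |                   # tabs become spaces
    tr -s ' ' '\n' |                # one word per line, no empty lines
    sort | uniq -c | sort -rn       # tally, most frequent first
)
printf '%s\n' "$counts"
```

It prints three lines: 3 for នឹង, 2 for ជា and 1 for បាន (uniq -c pads the counts with leading spaces).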

A few notes on the original sed commands:

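's// /g' - an empty regex reuses the last regex, [a-zA-Z], and everything it matches has already been deleted, so this is a no-op.
's/[«|»|:|;|.|,|(|)|-|?|។|"|"]//g' - a bracket expression already matches any one of the characters it lists, so the | separators are not needed (keep a single | only if pipe characters should be deleted too). It can be shortened to 's/[«»:;.,()?។""|-]//g', with the hyphen last so it is taken literally rather than as a range.
's/ /\n/g' - replaces each space with a newline, so every word ends up on its own line.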
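The three points above can be checked with a short GNU sed sketch. The sample strings are made-up ASCII input, chosen only to make each effect visible.

```shell
#!/bin/sh
# 1) An empty regex in s//.../ reuses the last regex used, here [a-zA-Z].
#    Everything it matches is already gone, so s// /g changes nothing.
a=$(printf 'ab cd\n' | sed -e 's/[a-zA-Z]//g' -e 's// /g')
printf '[%s]\n' "$a"      # prints "[ ]": only the original space is left

# 2) In a bracket expression each listed character matches on its own, so no
#    "|" separators are needed; a hyphen placed last is literal, not a range.
b=$(printf 'x:y;z.w,(v)-u?\n' | sed 's/[:;.,()?-]//g')
printf '%s\n' "$b"        # prints "xyzwvu"

# 3) s/ /\n/g (GNU sed) puts each space-separated word on its own line.
printf 'one two three\n' | sed 's/ /\n/g'
```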

To put a tab between the count and the word, add this after uniq -c:

sed 's/^ *\([0-9]\+\) /\1\t/'
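A minimal sketch of where that sed command goes: right after uniq -c and before the final sort -rn, with the result redirected to a file. It relies on GNU sed (\+ and \t are GNU extensions), and it uses a made-up word list instead of the Khmer corpus and a temporary file instead of frequency.txt, so nothing real is overwritten.

```shell
#!/bin/sh
# Sketch: a made-up word list stands in for the cleaned Khmer corpus, and a
# temporary file stands in for frequency.txt so no real file is overwritten.
export LC_ALL=C                    # byte-wise sort, so uniq sees duplicates together
out=$(mktemp)

printf 'beta\nalpha\nbeta\ngamma\nbeta\nalpha\n' |
  sort | uniq -c |                  # count, then word: "      2 alpha", ...
  sed 's/^ *\([0-9]\+\) /\1\t/' |   # GNU sed: "      3 beta" -> "3<TAB>beta"
  sort -rn > "$out"                 # most frequent first, saved to the file

cat "$out"
```

The file then holds "3<TAB>beta", "2<TAB>alpha" and "1<TAB>gamma", one per line.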

Or, with AWK:

awk 'BEGIN{OFS="\t"} {print $2, $1}'
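Here is a small sketch of that awk command writing to a file rather than only to the screen. The "count word" lines and the temporary files are made up for illustration. The \t is written in double quotes inside the single-quoted program; with single quotes around \t the shell would end the program early and awk would never see a tab.

```shell
#!/bin/sh
# Sketch: made-up "count word" lines like those in frequency.txt, written to
# temporary files (assumption) so no real file is touched.
freq=$(mktemp)
out=$(mktemp)
printf '      3 beta\n      2 alpha\n      1 gamma\n' > "$freq"

# The \t sits in double quotes inside the single-quoted program, so awk
# itself sees OFS = "\t" and joins the printed fields with a tab.
awk 'BEGIN { OFS = "\t" } { print $2, $1 }' "$freq" > "$out"   # save, not just show

cat "$out"
```

To get the count first, as in the desired output in the question, use print $1, $2 instead; the question's grep -Fwf dictionary.txt frequency.txt can be piped into the same awk command, ending with a > redirect to a file.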

