File with pipe separators and empty entries; convert to tab-delimited with "<empty>" marking the blanks
Problem
I was given a line-delimited text file containing file names and some indexed information from each file. My goal is to make this a tab-delimited file. However, I want to know where the blank entries are. For example, lorem||dolor becomes lorem '\t' <empty> '\t' dolor.
Let me give a couple more examples of what I have been given and what I need:
Example with multiple lines: (NB Each line has the same number of entries.)
Given:
||dolor|sit
amet,||adipiscing|
sed|do|eiusmod|tempor
Desired:
<empty> '\t' <empty> '\t' dolor '\t' sit '\n'
amet, '\t' <empty> '\t' adipiscing '\t' <empty> '\n'
sed '\t' do '\t' eiusmod '\t' tempor '\n'
Blank entries at the beginning and end.
Given:
|ut|labore||dolore||
Desired:
<empty> '\t' ut '\t' labore '\t' <empty> '\t' dolore '\t' <empty> '\t' <empty>
(I don't need the spaces; I just thought they would make the desired format easier to read.)
. , , 1 36 ( 0 37 .)
Tools available to me include sed, awk, grep, tr, etc. A Perl or Python script is also an option.
, . sed ( ): ref. ; (, , <empty> ): ref. ; : ref ;
$ uname -a
CYGWIN_NT-10.0 A-1052207 2.5.2(0.297/5/3) 2016-06-23 14:29 x86_64 Cygwin
$ bash --version
GNU bash, version 4.3.42(4)-release (x86_64-unknown-cygwin) ...
$
I am using Cygwin on Windows 10.
Edit1
, .
, , :
( , , , enter, , enter .. /, > , Enter .)
$ cat > myfile.txt<<EOF
> ||foo|||bar||
> EOF
$ <**command-to-be-used**> myfile.txt | cat -A
<empty>^I<empty>^Ifoo^I<empty>^I<empty>^Ibar^I<empty>^I<empty>$
^I bash a '\t'. , , , , <empty> , labore (. ). , ( @Neil_McGuigan @Ed_Morton) '\t' labore, <empty>. , . .
@Neil_McGuigan. : " ", , \ .
$ echo "||lorem|ipsum||sit|amet,||||eiusmod|tempor|||labore|" |
awk '
{
$1=$1; n_empty=0;
for(i=1; i<=NF; i++)
{
if($i=="") {$i="<empty>"; n_empty++;}
};
print
}
END {print n_empty" entries are empty" | "cat 1>&2";}
' FS='|' OFS=$'\t' | cat -A
Output:
<empty>^I<empty>^Ilorem^Iipsum^I<empty>^Isit^Iamet,^I<empty>^I<empty>^I<empty>^Ieiusmod^Itempor^I<empty>^I<empty>^Ilabore^I<empty>$
9 entries are empty
, , , :
<empty>^I<empty>^Ilorem^Iipsum^I<empty>^Isit^Iamet,^I<empty>^I<empty>^I<empty>^Ieiusmod^Itempor^I<empty>^I<empty>^Ilabore^I<empty>$
9 entries are empty
( , , stderr, , .)
, , .
@Neil_McGuigan @Ed_Morton, , . :
$ awk '{$1=$1; n_empty=0; for(i=1; i<=NF; i++) {if($i=="") {$i="<empty>"; n_empty++;}}; print;} END {print n_empty" entries are empty" | "cat 1>&2";}' FS='|' OFS=$'\t' file_pipe-delim.txt > file_tab-delim.txt
$
, , :
$ awk '{$1=$1; for(i=1; i<NF; i++){ if($(i)=="")$(i)="<empty>" }; print}' FS='|' OFS=$'\t' file_pipe-delim.txt | sed 's/\t$/\t<empty>/g' > file_tab-delim.txt
$
, , :
( , , , enter, , enter .. /, > , Enter .)
$ cat > file_pipe-delim.txt<<EOF
> ||dolor|sit
> amet,||adipiscing|
> sed|do|eiusmod|tempor
> |||
> |aliqua.|Ut|
> EOF
$ awk '{$1=$1; n_empty=0; for(i=1; i<=NF; i++) {if($i=="") {$i="<empty>"; n_empty++;}}; print;} END {print n_empty" entries are empty" | "cat 1>&2";}' FS='|' OFS=$'\t' file_pipe-delim.txt > file_tab-delim.txt
$ cat -A file_tab-delim.txt
<empty>^I<empty>^Idolor^Isit$
amet,^I<empty>^Iadipiscing^I<empty>$
sed^Ido^Ieiusmod^Itempor$
<empty>^I<empty>^I<empty>^I<empty>$
<empty>^Ialiqua.^IUt^I<empty>$
$
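One thing to watch in the command above: n_empty is set back to 0 for every input line, so on a multi-line file the END message reports only the last line's count. A minimal sketch that keeps a running total instead is below. The function name pipe_to_tab_count and the variable total are made up for illustration, and OFS is passed with -v rather than as a trailing assignment.

```shell
# Hypothetical helper: convert pipe-delimited input to tab-delimited output,
# marking blank entries, and report the total number of blanks on stderr.
pipe_to_tab_count() {
    awk -F'|' -v OFS='\t' '
        {
            $1 = $1                          # rebuild the record using OFS
            for (i = 1; i <= NF; i++)
                if ($i == "") { $i = "<empty>"; total++ }
            print
        }
        # total is never reset, so it accumulates over all lines
        END { printf("%d entries are empty\n", total) | "cat 1>&2" }
    ' "$@"
}
```

Called as pipe_to_tab_count file_pipe-delim.txt > file_tab-delim.txt, this should report 10 empty entries for the five-line sample file (2 + 2 + 0 + 4 + 2).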
, , . :
$ echo "||lorem|ipsum||sit|amet,||||eiusmod|tempor|||labore|" | awk '{$1=$1; n_empty=0; for(i=1; i<=NF; i++) {if($i=="") {$i="<empty>"; n_empty++;}}; print;} END {print n_empty" entries are empty" | "cat 1>&2";}' FS='|' OFS=$'\t' | cat -A
<empty>^I<empty>^Ilorem^Iipsum^I<empty>^Isit^Iamet,^I<empty>^I<empty>^I<empty>^Ieiusmod^Itempor^I<empty>^I<empty>^Ilabore^I<empty>$
9 entries are empty
With cat -A, each ^I stands for a '\t'; without it, the tabs simply show up as whitespace:
$ echo "||lorem|ipsum||sit|amet,||||eiusmod|tempor|||labore|" | \
awk '{$1=$1; n_empty=0; for(i=1; i<=NF; i++) \
{if($i=="") {$i="<empty>"; n_empty++;}}; print;} END \
{print n_empty" entries are empty" | "cat 1>&2";}' \
FS='|' OFS=$'\t'
<empty> <empty> lorem ipsum <empty> sit amet, <empty> <empty> <empty>eiusmod tempor <empty> <empty> labore <empty>
9 entries are empty
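Since every line is supposed to have the same number of entries, a quick sanity check on the converted file is to count the tab-separated fields per line. This is only a sketch; field_counts is a hypothetical helper name.

```shell
# Hypothetical helper: print how many lines have each field count.
# A single output row means every line has the same number of entries.
field_counts() {
    awk -F'\t' '{ print NF }' "$@" | sort -n | uniq -c
}
```

For example, field_counts file_tab-delim.txt prints one row per distinct field count, with the number of lines in the first column and the field count in the second.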
Replace each || with |<empty>| (twice, so that overlapping matches are caught), then turn every | into a tab:
$ sed 's/||/|<empty>|/g; s/||/|<empty>|/g; s/|/\t/g' file
lorem ipsum <empty> sit amet, <empty> <empty> <empty> eiusmod tempor <empty> <empty> labore
awk:
$ awk '{while(gsub(/\|\|/,"|<empty>|")); gsub(/\|/,"\t")} 1' file
lorem ipsum <empty> sit amet, <empty> <empty> <empty> eiusmod tempor <empty> <empty> labore
If your sed does not understand \t, use '$'\t'' in place of \t.
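The sed and awk commands above only fill the gap between two adjacent pipes, so an empty entry at the very start or end of a line is left unmarked. Below is a sketch that also anchors on the line boundaries. It assumes nothing beyond POSIX sed; pipe_to_tab is a hypothetical name, and the tab is produced with printf so the script does not depend on sed understanding \t.

```shell
# Hypothetical helper: pipe-delimited to tab-delimited, marking every blank
# entry, including leading and trailing ones.
pipe_to_tab() {
    tab=$(printf '\t')                       # literal tab character
    sed -e 's/^$/<empty>/' \
        -e 's/^|/<empty>|/' \
        -e 's/|$/|<empty>/' \
        -e ':a' -e 's/||/|<empty>|/' -e 'ta' \
        -e "s/|/${tab}/g" "$@"
}
```

The :a ... ta loop repeats the || substitution until no adjacent pipes remain, which plays the same role as running the substitution twice in the sed command above.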