simple awk script
awk -F'\t' '{OFS="\t"; if ($4=="" || $4!=old) print; old=$4}' input.txt
Result
chr1  12226559  12227059  TNFRSF1B
chr1  17051560  17052060
chr1  17053279  17053779
chr1  17338423  17338923  ATP13A2
chr1  19577574  19578074  EMC1
                          MRTO4
chr1  19578046  19578546  EMC1
                          MRTO4
chr1  19638239  19638739  AKR7A2
                          PQLC2
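To make it clearer why only adjacent repeats disappear, here is a minimal sketch of the same filter, spelled out with comments and run on a small invented sample. The sample rows and the shell variable names `sample` and `out` are assumptions for illustration only. `OFS` is left out here because a plain `print` emits the record unchanged.

```shell
# Small tab-separated sample (made up): rows 2 and 3 repeat GENE2 in column 4,
# row 4 has no column 4, row 5 repeats GENE1 but is not adjacent to row 1.
sample=$(printf 'chr1\t100\t200\tGENE1\n\t\t\tGENE2\n\t\t\tGENE2\nchr1\t300\t400\n\t\t\tGENE1\n')

out=$(printf '%s\n' "$sample" | awk -F'\t' '
    {
        # Print the row when column 4 is empty or differs from the previous row.
        if ($4 == "" || $4 != old) print
        # Remember column 4 for the comparison on the next row.
        old = $4
    }')

printf '%s\n' "$out"
```

The second GENE2 row is dropped, while the final GENE1 row survives because the row just before it has an empty column 4.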
cleaning
To prepare input.txt, I copied the text from the question. That meant replacing the runs of spaces with tabs and removing some trailing spaces at the ends of lines. Lines that start with spaces (the ones carrying only a gene name) get three leading tabs, so the name lands in column 4. In the end I used the following sed command to clean up the input file:
sed 's/  *$//;/^[^ ]/s/  */\t/g;/^ /s/  */\t\t\t/g;' copy-fron-so.txt > input.txt
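As a rough illustration of this clean-up, the sketch below runs the same three substitutions on a tiny invented sample. The sample text and the variable names `tab`, `raw` and `cleaned` are assumptions. The tab is passed in through a shell variable so the example does not rely on sed understanding `\t` in the replacement, and each run of spaces is matched as one space followed by ` *`.

```shell
# A literal tab, kept in a variable so the sed script stays portable.
tab=$(printf '\t')

# Space-separated text as it might look after copy and paste (made up):
# line 1 has trailing spaces, line 2 is a continuation line with only a gene name.
raw=$(printf 'chr1 100 200 GENE1  \n             GENE2 \nchr1 300 400\n')

# 1. strip trailing spaces
# 2. lines starting with a non-space: each run of spaces becomes one tab
# 3. lines starting with a space: each run of spaces becomes three tabs
cleaned=$(printf '%s\n' "$raw" |
    sed "s/  *\$//;/^[^ ]/s/  */${tab}/g;/^ /s/  */${tab}${tab}${tab}/g")

# Show the column count and column 4 for every cleaned line.
printf '%s\n' "$cleaned" | awk -F'\t' '{print NF ": " $4}'
```

After cleaning, the gene name sits in column 4 on both full lines and continuation lines, which is what the awk filter relies on.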
input file from @dogbane comment
chr1  12226559  12227059  TNFRSF1B
chr1  17051560  17052060
chr1  17053279  17053779
chr1  17338423  17338923  ATP13A2
                          ATP13A2
                          ATP13A2
chr1  19577574  19578074  EMC1
                          MRTO4
chr1  19578046  19578546  EMC1
                          MRTO4
chr1  19638239  19638739  AKR7A2
                          PQLC2
                          PQLC2
                          PQLC2
                          AKR7A2
(last line added)
cleaning and treatment
$> sed 's/  *$//;/^[^ ]/s/  */\t/g;/^ /s/  */\t\t\t/g;' copypaste.txt > input.txt
$> awk -F'\t' '{OFS="\t"; if ($4=="" || $4!=old) print; old=$4}' input.txt
chr1  12226559  12227059  TNFRSF1B
chr1  17051560  17052060
chr1  17053279  17053779
chr1  17338423  17338923  ATP13A2
chr1  19577574  19578074  EMC1
                          MRTO4
chr1  19578046  19578546  EMC1
                          MRTO4
chr1  19638239  19638739  AKR7A2
                          PQLC2
                          AKR7A2
change of requirements
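The last line with AKR7A2 should not be printed. The awk filter only drops repeats that are adjacent, so input.txt has to be sorted on column 4 first; the -s option keeps rows with equal keys in their original order. Caution: the -t option needs a literal tab as its argument. In bash or vi, press [CTRL-V], then [TAB], and put quotation marks around the tab. In bash, $'\t' is an equivalent way to write it.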
$> LANG=C sort -k 4 -s -t $'\t' input.txt > sorted.txt
$> awk -F'\t' '{OFS="\t"; if ($4=="" || $4!=old) print; old=$4}' sorted.txt
chr1  17051560  17052060
chr1  17053279  17053779
chr1  19638239  19638739  AKR7A2
chr1  17338423  17338923  ATP13A2
chr1  19577574  19578074  EMC1
                          MRTO4
                          PQLC2
chr1  12226559  12227059  TNFRSF1B
Note that there is now only a single line ending in MRTO4, and only one EMC1 line!
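If the original row order matters, one possible alternative to sorting is to let awk remember every column 4 value it has already printed. This is a different technique from the one used above, shown only as a sketch: the array name `seen` and the sample rows are made up, and it should keep the same rows as the sort-based version, just in input order.

```shell
# Made-up sample: GENE1 and GENE2 each appear twice, never on adjacent rows.
sample=$(printf 'chr1\t100\t200\tGENE1\n\t\t\tGENE2\nchr1\t300\t400\n\t\t\tGENE1\n\t\t\tGENE2\n')

# Keep a row when column 4 is empty, or when its value has not been seen yet.
# seen[$4]++ is 0 (false) the first time a value appears, so !seen[$4]++ is true once.
out=$(printf '%s\n' "$sample" | awk -F'\t' '$4 == "" || !seen[$4]++')

printf '%s\n' "$out"
```

Here the first GENE1 and GENE2 rows and the row without a gene are printed, and both later repeats are dropped without reordering anything.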