Bash script to remove redundant lines

Good afternoon,

I am trying to create a bash script that cleans data output files. Files are as follows:

    /path/
    /path/to
    /path/to/keep
    /another/
    /another/path/
    /another/path/to
    /another/path/to/keep

I would like to end up with:

    /path/to/keep
    /another/path/to/keep

I want to cycle through the lines of a file, checking the next line to see if it contains the current line, and if so, delete the current line from the file. Here is my code:

    for LINE in $(cat bbutters_data2.txt)
    do
        grep -A1 ${LINE} bbutters_data2.txt
        if [ $? -eq 0 ]
        then
            sed -i '/${LINE}/d' ./bbutters_data2.txt
        fi
    done
2 answers

Assuming your input file is sorted the way you showed:

    $ awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' file
    /path/to/keep
    /another/path/to/keep
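If the file is not already in that order, one option is to sort it before filtering. This is only a sketch under a few assumptions: the temporary file and the helper name `sort_then_filter` are made up for illustration, byte-wise ordering is forced with `LC_ALL=C`, and it is acceptable for the result to come out in sorted order rather than in the original order.

```shell
#!/bin/bash
# Hypothetical unsorted input; the temporary file exists only for this sketch.
tmp="$(mktemp)"
printf '%s\n' \
    /path/to/keep /another/path/ /path/ \
    /another/path/to/keep /path/to /another/ /another/path/to > "$tmp"

# sort_then_filter is a hypothetical helper name, not from the answer.
# LC_ALL=C forces byte-wise ordering, so every line that starts with a
# given line is grouped directly after that line.
sort_then_filter() {
    LC_ALL=C sort "$1" |
        awk 'NR>1 && substr($0,1,length(last))!=last {print last} {last=$0} END{print last}'
}

sort_then_filter "$tmp"
rm -f "$tmp"
```

With the sample data this should print `/another/path/to/keep` followed by `/path/to/keep`, since the sort changes the order of the groups.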

How it works

awk reads the input file line by line. Each time it reads a new line, it compares it with the previous one. If the new line does not begin with the previous line, the previous line is printed. More details:

  • NR>1 && substr($0,1,length(last))!=last {print last;}

    If this is not the first line, and the previous line (stored in last) is not a leading substring of the current line ($0), then print the previous line.

  • last=$0

    Update the last variable to the current line.

  • END{print last}

    After reading the whole file, print the final line, which is still held in last and has no following line to be compared against.
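The three pieces described above can be laid out as a commented, multi-line version of the same one-liner. This is a sketch for reading purposes: the wrapper name `drop_prefix_lines` is hypothetical, and the input is the sample data from the question, fed through a pipe instead of a file.

```shell
#!/bin/bash
# drop_prefix_lines is a hypothetical helper name, not from the answer.
# It reads lines on stdin and prints each line that is not a leading
# substring of the line that follows it.
drop_prefix_lines() {
    awk '
        # From the second line on: if the current line does not start
        # with the previous line, the previous line is a keeper.
        NR > 1 && substr($0, 1, length(last)) != last { print last }

        # Remember the current line for the next comparison.
        { last = $0 }

        # The final line has no successor, so it is always printed.
        END { print last }
    '
}

# Sample data from the question, already in parent-before-child order.
printf '%s\n' \
    /path/ /path/to /path/to/keep \
    /another/ /another/path/ /another/path/to /another/path/to/keep |
    drop_prefix_lines
```

Run as is, this should print `/path/to/keep` and `/another/path/to/keep`, matching the output shown for the one-liner.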


I like the awk solution, but bash itself can handle the task. Note: both solutions (awk and bash) require that each shorter parent path appear directly before the longer paths that contain it, that is, in ascending order as in your sample. Here is an alternative solution in bash (bash specifically, because of the glob-matching operation):

    #!/bin/bash

    fn="${1:-/dev/stdin}"               ## accept filename or stdin

    [ -r "$fn" ] || {                   ## validate file is readable
        printf "error: file not found: '%s'\n" "$fn"
        exit 1
    }

    declare -i cnt=0                    ## flag for 1st iteration

    while read -r line; do              ## for each line in file

        ## if 1st iteration, fill 'last', increment 'cnt', continue
        [ $cnt -eq 0 ] && { last="$line"; ((cnt++)); continue; }

        ## while 'line' is a child of 'last', continue, else print
        [[ $line = "${last%/}"/* ]] || printf "%s\n" "$last"

        last="$line"                    ## update last=$line
    done <"$fn"

    [ ${#line} -eq 0 ] &&               ## print last line (updated for non POSIX line end)
    printf "%s\n" "$last" ||
    printf "%s\n" "$line"

    exit 0

Output

    $ bash path_uniql.sh < dat/incpaths.txt
    /path/to/keep
    /another/path/to/keep
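The parent/child test in the script is the `[[ $line = "${last%/}"/* ]]` glob match. To see what it does in isolation, here is a small sketch; the function name `is_child` is hypothetical and only wraps that same test.

```shell
#!/bin/bash
# is_child is a hypothetical helper, not part of the script above.
# It succeeds when path $2 lies underneath directory $1.
is_child() {
    local parent="${1%/}"          # strip one trailing slash, if any
    [[ $2 = "$parent"/* ]]         # quoted part is literal, * is a glob
}

is_child /path/ /path/to      && echo "child"      # prints "child"
is_child /path/to /path/tools || echo "not child"  # prints "not child"
```

Because the pattern requires a `/` right after the parent, `/path/tools` is not treated as a child of `/path/to`. A plain leading-substring comparison would treat those two lines as related, so the two approaches can differ on input like that.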

Source: https://habr.com/ru/post/986838/

