Need to parse the log file in bash

I have a log file that contains a lot of text, some of which is useless. A few lines in this log are important to me. They follow this template:

    0x00000001 (NEEDED) Shared library: [libm.so.6]
    0x00000001 (NEEDED) Shared library: [libc.so.6]
    0x00000001 (NEEDED) Shared library: [ld.so.1]
    0x00000001 (NEEDED) Shared library: [libgcc_s.so.1]

The NEEDED keyword appears on every line that is important to me, and the name between [] is the part I need. I want to build a list of these names without duplicates.

I did this in Python, but there seems to be no Python on the machine where I want to run the script, so I need to rewrite it in bash. I know only the basics of bash and cannot find a solution to my problem.

Python script I used:

    import sys
    import re

    def testForKeyword(keyword, line):
        findStuff = re.compile(r"\b%s\b" % keyword, \
                               flags=re.IGNORECASE)

        if findStuff.search(line):
            return True
        else:
            return False

    # Get filename argument
    if len(sys.argv) != 2:
        print("USAGE: python libraryParser.py <log_file.log>")
        sys.exit(-1)

    file = open(sys.argv[1], "r")

    sharedLibraries = []

    for line in file:
        if testForKeyword("NEEDED", line):
            libraryNameStart = line.find("[") + 1
            libraryNameFinish = line.find("]")
            libraryName = line[libraryNameStart:libraryNameFinish]

            # No duplicates, only add if it does not exist
            try:
                sharedLibraries.index(libraryName)
            except ValueError:
                sharedLibraries.append(libraryName)

    for library in sharedLibraries:
        print(library)

Could you help me solve this problem? Thanks in advance.

7 answers

One way using awk, assuming infile contains the data from the question:

    awk '
        $2 ~ /NEEDED/ {
            lib = substr( $NF, 2, length($NF) - 2 );
            libs[ lib ] = 1;
        }
        END {
            for (lib in libs) {
                printf "%s\n", lib;
            }
        }
    ' infile

Output:

    libc.so.6
    libgcc_s.so.1
    ld.so.1
    libm.so.6
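The `for (lib in libs)` loop visits keys in an unspecified order, which is why the output above does not follow the input order. If first-seen order matters, as it does in the question's Python script, something like the following should work. This is only a sketch: the sample file content is made up for the demo, and `seen` is just an illustrative array name.

```shell
#!/bin/sh
# Made-up sample input modelled on the lines in the question.
infile=$(mktemp)
cat > "$infile" <<'EOF'
some unrelated log text
 0x00000001 (NEEDED) Shared library: [libm.so.6]
 0x00000001 (NEEDED) Shared library: [libc.so.6]
 0x00000001 (NEEDED) Shared library: [libm.so.6]
 0x00000001 (NEEDED) Shared library: [ld.so.1]
EOF

# Split on "[" or "]" so that $2 is the library name.
# seen[$2]++ is 0 (false) only the first time a name appears,
# so each name is printed once, in input order.
result=$(awk -F'[][]' '/NEEDED/ && !seen[$2]++ { print $2 }' "$infile")
printf '%s\n' "$result"

rm -f "$infile"
```

Because the filtering and the duplicate check happen in one awk pass, no `sort` is needed and the original order is kept.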

awk with sort and uniq:

    $ awk -F'[][]' '/NEEDED/ {print $2}' data.txt | sort | uniq
    ld.so.1
    libc.so.6
    libgcc_s.so.1
    libm.so.6

awk:

    $ awk -F'[][]' '/NEEDED/ {save[$2]++} END { for (i in save) print i }' data.txt
    libc.so.6
    libm.so.6
    libgcc_s.so.1
    ld.so.1

Simplifying your Python code:

    #!/usr/bin/env python
    libs = []
    with open("data.txt") as fd:
        for line in fd:
            if "NEEDED" in line:
                # The fifth whitespace-separated field is "[name]"; strip the brackets
                libs.append(line.split()[4].strip("[]"))
    for i in set(libs):
        print(i)

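Bash solution (does not remove duplicates):

    #!/bin/bash
    while IFS='][' read -r -a array
    do
        # array[0] is the text before "[", array[1] is the library name
        if [[ ${array[0]} == *NEEDED* ]]
        then
            echo "${array[1]}"
        fi
    done < data.txt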
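The loop above prints every match, duplicates included. A minimal sketch that also drops duplicates without any external tools could look like this. It swaps `read -a` for parameter expansion and `case`, so it should run in bash and in plain POSIX shells alike. `list_needed` is a hypothetical helper name, the sample data is made up, and the code assumes library names never contain a `|` character.

```shell
#!/bin/bash
# list_needed is a hypothetical helper name; it relies only on shell built-ins.
list_needed() {
    seen='|'                        # "|"-delimited names printed so far
    while IFS= read -r line; do
        case $line in
            *NEEDED*'['*']'*) ;;    # keep only NEEDED lines that contain [name]
            *) continue ;;
        esac
        name=${line#*\[}            # drop everything up to the first "["
        name=${name%%\]*}           # drop everything from the first "]" on
        case $seen in
            *"|$name|"*) continue ;;  # already printed
        esac
        seen="$seen$name|"
        printf '%s\n' "$name"
    done < "$1"
}

# Made-up sample data in a temporary file.
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
some unrelated log text
 0x00000001 (NEEDED) Shared library: [libm.so.6]
 0x00000001 (NEEDED) Shared library: [libc.so.6]
 0x00000001 (NEEDED) Shared library: [libm.so.6]
EOF

result=$(list_needed "$logfile")
printf '%s\n' "$result"
rm -f "$logfile"
```

Unlike `sort -u`, this keeps the names in the order in which they first appear.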

With grep and coreutils:

    grep NEEDED infile | grep -o '\[[^]]*\]' | tr -d '][' | sort | uniq

Output:

    ld.so.1
    libc.so.6
    libgcc_s.so.1
    libm.so.6

    awk -F '[' '/NEEDED/ { print $NF }' file_name | sed 's/]//' | sort | uniq

  awk '/NEEDED/ {gsub("[][]", ""); print $5}' < /tmp/1.txt | sort -u 

If your logs are in a file called "log.txt", you can extract the names with:

    grep "(NEEDED)" log.txt | awk -F"\[" '{print substr($2, 1, length($2) - 1);}' - | sort -u

The substr call drops the trailing ], and sort -u removes duplicate lines.


A sed solution could be:

    sed -e '/(NEEDED)/!d' -e 's/\(.*\[\)\|\(\]$\)//g' INPUTFILE

Please note that if you are on Windows (the file has CRLF line endings), the following is correct:

    sed -e '/(NEEDED)/!d' -e 's/\(.*\[\)\|\(\].$\)//g' INPUTFILE

  • the first -e deletes every line that does not match (NEEDED)
  • the second -e deletes everything up to the last [ and the trailing ] (in the Windows variant, the . also removes the carriage return \r before the newline, so the output is printed correctly)
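The `\|` alternation used above is a GNU sed extension, and the command does not remove duplicates. As a sketch of an alternative that, as far as I know, sticks to POSIX sed syntax, one substitution with a capture group can replace the whole line with the bracketed name, and `sort -u` can handle the duplicates. The sample file below is made up and deliberately uses CRLF line endings.

```shell
#!/bin/sh
# Made-up sample input with Windows (CRLF) line endings.
input=$(mktemp)
printf ' 0x00000001 (NEEDED) Shared library: [libm.so.6]\r\n' >  "$input"
printf 'unrelated line\r\n'                                   >> "$input"
printf ' 0x00000001 (NEEDED) Shared library: [libc.so.6]\r\n' >> "$input"
printf ' 0x00000001 (NEEDED) Shared library: [libm.so.6]\r\n' >> "$input"

# -n suppresses the default output. On lines containing (NEEDED), the whole
# line is replaced by the text between the brackets and printed (p flag).
# The trailing .* also swallows a carriage return if one is present.
result=$(sed -n '/(NEEDED)/s/.*\[\([^]]*\)].*/\1/p' "$input" | sort -u)
printf '%s\n' "$result"

rm -f "$input"
```

The same command should work unchanged on files with Unix line endings, so a separate Windows variant is not needed here.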

Source: https://habr.com/ru/post/1436383/

