Can grep remove lines that look like semi-duplicates?

Question

I am reading a file like this:

cat access_logs | grep Ruby

I do this to determine which IP addresses are accessing one of my files. It returns a huge list. I want to remove semi-duplicates, i.e. lines that are technically the same except for different time and date stamps. In a massive list with thousands of repeated entries, is there a way to get only the unique IP addresses?

1.2.3.4 - - [13/Apr/2014:14:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
1.2.3.4 - - [13/Apr/2014:14:20:38 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
1.2.3.4 - - [13/Apr/2014:15:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
1.2.3.4 - - [13/Apr/2014:15:20:38 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"

So, for example, can these 4 lines be trimmed down to only one line?

+4

linux unix regex

Jason Apr 13 '14 at 18:34

2 answers

You can do:

awk '/Ruby/{print $1}' file | sort -u

A grep + cut pipeline would work as well.
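For reference, the grep + cut variant might look like the sketch below. It assumes the IP address is the first space-delimited field, as in the sample lines from the question. The temporary file, the 5.6.7.8 and 9.9.9.9 entries, and the variable names are made up for illustration.

```shell
# Hypothetical sample log: only the 1.2.3.4 lines come from the question.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
1.2.3.4 - - [13/Apr/2014:14:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
1.2.3.4 - - [13/Apr/2014:14:20:38 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
5.6.7.8 - - [13/Apr/2014:15:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
9.9.9.9 - - [13/Apr/2014:15:20:38 -0400] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla"
EOF

# grep keeps the lines containing "Ruby", cut extracts the first
# space-delimited field (the IP), and sort -u collapses repeated IPs.
unique_ips=$(grep Ruby "$log_file" | cut -d ' ' -f 1 | sort -u)
printf '%s\n' "$unique_ips"

rm -f "$log_file"
```

Like the awk command above, this prints only the IP addresses, one per line, rather than the full log lines.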

+3

P.P. Apr 13 '14 at 18:39

anubhava · Accepted Answer · 2014-04-13T18:40:08+0000

You can use awk:

awk '/Ruby/ && !seen[$1]++' access_logs

This prints only the first matching line for each IP address, no matter how the timestamps on that address's later lines differ.

For your input, it prints:

1.2.3.4 - - [13/Apr/2014:14:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
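The `!seen[$1]++` idiom can be hard to read at first, so here is a longer sketch that should behave the same way: `seen` is an awk associative array keyed by the first field, and a line is printed only while the counter for its IP is still zero. The temporary file, the 5.6.7.8 and 9.9.9.9 entries, and the variable names are assumptions added for illustration.

```shell
# Hypothetical sample log: only the 1.2.3.4 lines come from the question.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
1.2.3.4 - - [13/Apr/2014:14:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
1.2.3.4 - - [13/Apr/2014:14:20:38 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
5.6.7.8 - - [13/Apr/2014:15:20:17 -0400] "GET /color.txt HTTP/1.1" 404 207 "-" "Ruby"
9.9.9.9 - - [13/Apr/2014:15:20:38 -0400] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla"
EOF

# Spelled-out equivalent of: awk '/Ruby/ && !seen[$1]++'
first_lines=$(awk '
  /Ruby/ {
    # An unset array element compares equal to 0, so this is true
    # only the first time a given IP ($1) shows up.
    if (seen[$1] == 0) print
    seen[$1]++   # count this IP so later lines for it are skipped
  }
' "$log_file")
printf '%s\n' "$first_lines"

rm -f "$log_file"
```

With this sample, the output keeps the 14:20:17 line for 1.2.3.4 and the single line for 5.6.7.8. The Mozilla line never matches `/Ruby/`.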
