Parsing an Apache Log with Perl

Question

UPDATED 5-10-2013

OK, I can now filter IP addresses without any problems. There are three more things I would like to do. Sorting, which I thought could be done easily with sort($keys), turned out not to be that simple; the slightly more complex approach I tried below did not work either. Beyond that, I need to collect the dates and the browser versions. Below are a sample line from my log file and my current code.

APACHE LOG

 24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"

My code

    #!usr/bin/perl -w
    use strict;

    my %seen = ();
    open(FILE, "< access_log") or die "unable to open file $!";

    while( my $line = <FILE>) {
        chomp $line;

        # regex for ip address.
        if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ) {
            $seen{$1}++;
        }

        #regex for date an example is [09\Mar\2009:05:30:23]
        if( $line =~ /\[[\d]{2}\\.*[\d]{4}\:[\d]{2}\:[\d]{2}\]*/) {
            print "\n\n $line matched : $_\n";
        }
    }
    close FILE;

    my $i = 0;

    # program bugs out if I uncomment the below line,
    # but to my understanding this is essentially what I'm trying to do.
    # for my $key ( keys %seen ) (keys %date) {
    for my $key ( keys %seen ) {
        my ($ip) = sort {$a cmp $b}($key);
        # also I'd like to be able to sort the IP addresses and if
        # I do it the proper numeric way it generates errors saying contents are not numeric.
        print @$ip->[$i] . "\n";
        # print "The IPv4 address is : $key and has accessed the server $seen{$key} times. \n";
        $i++;
    }

string logging perl parsing apache

user1739860 May 10, '13 at 5:28

1 answer

chrsblck · Accepted Answer · 2013-05-10T05:41:57+0000

You are pretty close. And yes, I would use a hash. This idiom is usually called a "seen hash."

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $log = "web.log";
    my %seen = ();

    open (my $fh, "<", $log) or die "unable to open $log: $!";

    # read one line at a time and count each IP address
    while( my $line = <$fh> ) {
        chomp $line;

        if( $line =~ /(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/ ){
            $seen{$1}++;
        }
    }
    close $fh;

    for my $key ( keys %seen ) {
        print "$key: $seen{$key}\n";
    }

Here is an example log file with some output:

    $ cat web.log
    [Mon Sep 21 02:35:24 1999] some msg blah blah
    [Mon Sep 21 02:35:24 1999] 192.1.1.1
    [Mon Sep 21 02:35:24 1999] 1.1.1.1
    [Mon Sep 21 02:35:24 1999] 10.1.1.9
    [Mon Sep 21 02:35:24 1999] 192.1.1.1
    [Mon Sep 21 02:35:24 1999] 10.1.1.5
    [Mon Sep 21 02:35:24 1999] 10.1.1.9
    [Mon Sep 21 02:35:24 1999] 192.1.1.1

    $ test.pl
    1.1.1.1: 1
    192.1.1.1: 3
    10.1.1.9: 2
    10.1.1.5: 1

A few things I could think of:

my @array = <FH>; pulls the entire file into memory, which is not a great idea. That is especially true for log files, which can grow quite large, particularly if they are not rotated correctly. Looping over the filehandle with for or foreach has the same problem, because it also reads every line up front. A while loop, which reads one line at a time, is the best practice for reading from a file.

You should get in the habit of using the three-argument form of open with a lexically scoped filehandle, as in my example above.

Your die message shouldn't be so "exact". See my die message above: the open can fail because of permissions, because the file does not exist, because it is locked, and so on, and $! reports the actual reason.

UPDATE

This will work for your dates.

    my $line = '[09\Mar\2009:05:30:23]: plus some message';

    #example is [09\Mar\2009:05:30:23]
    if( $line =~ /(\[[\d]{2}\\.*\\[\d]{4}:[\d]{2}:[\d]{2}:[\d]{2}\])/ ){
        print "$line matched: $1\n";
    }
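Note that the sample line in the question uses forward slashes and a timezone offset in the timestamp, so a pattern that expects backslashes would not match it as-is. The following is a minimal sketch, assuming the Apache combined log format shown in the question; the sub name parse_line and the field names ip, date and agent are made up for illustration, and a user agent containing escaped quotes would need a more careful pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the IP, timestamp and user agent out of one
# line in Apache "combined" format. Returns nothing if the line does not match.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^\s*(\d{1,3}(?:\.\d{1,3}){3})   # client IP
        \s+\S+\s+\S+\s+                 # identity and user fields
        \[([^\]]+)\]                    # timestamp between square brackets
        \s+"[^"]*"                      # request line
        \s+\d{3}\s+\S+                  # status code and response size
        \s+"[^"]*"                      # referer
        \s+"([^"]*)"                    # user agent (browser string)
    }x;
    return { ip => $1, date => $2, agent => $3 };
}

# The sample line from the question.
my $sample = ' 24.235.131.196 - - [10/Mar/2004:00:57:48 -0500] "GET http://www.google.com/iframe.php HTTP/1.0" 500 414 "http://www.google.com/iframe.php" "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"';

if ( my $hit = parse_line($sample) ) {
    print "$hit->{ip} at $hit->{date} using $hit->{agent}\n";
}
```

Capturing the whole user agent string leaves the decision about what counts as the "browser version" (here, MSIE 6.0) to a later step.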

UPDATE2

There are a few things you did wrong.

I don't see you storing anything in a date hash.

 print "\n\n $line matched : $_\n";

It should look like your seen hash, although a bare counter doesn't make much sense for dates. What are you trying to do with the stored date data?

 $data{$1} = "some value, which is up to you";
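One possible use for such a hash, sketched under the assumption that the goal is a hit count per day: key the hash on the day portion of the timestamp. The hash name %per_day and the sample lines are invented for illustration, and the pattern expects the forward-slash timestamp format of the real log line in the question.

```perl
use strict;
use warnings;

# Hypothetical sample lines, trimmed to the part that matters here.
my @lines = (
    '1.1.1.1 - - [10/Mar/2004:00:57:48 -0500] "GET / HTTP/1.0" 200 10',
    '1.1.1.2 - - [10/Mar/2004:13:02:11 -0500] "GET / HTTP/1.0" 200 10',
    '1.1.1.1 - - [11/Mar/2004:08:15:00 -0500] "GET / HTTP/1.0" 200 10',
);

my %per_day;
for my $line (@lines) {
    # Capture only dd/Mon/yyyy; the time of day is ignored.
    if ( $line =~ m{\[(\d{2}/[A-Za-z]{3}/\d{4}):} ) {
        $per_day{$1}++;
    }
}

# Note: this is a string sort, so it is not chronological across months.
for my $day ( sort keys %per_day ) {
    print "$day: $per_day{$day}\n";
}
```

If the per-day values need to be more than a count (for example, the list of IPs seen that day), the value could be an array reference instead of a number.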

You cannot loop over two hashes in one for loop like this.

    for my $foo (keys %h)(keys %h2) {
        # do stuff
    }
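If the intent is to visit the keys of both hashes, two forms that do parse are a single loop over a list built from both key sets, or one loop per hash. A small sketch with made-up contents for %h and %h2 (the placeholder names used above):

```perl
use strict;
use warnings;

my %h  = ( a => 1, b => 2 );
my %h2 = ( c => 3 );

# Option 1: one loop over the combined key list.
# A key present in both hashes would be visited twice.
my @all_keys = ( keys %h, keys %h2 );
my @visited;
for my $key ( sort @all_keys ) {
    push @visited, $key;
}
print "@visited\n";

# Option 2: a separate loop per hash, when each needs different handling.
for my $key ( sort keys %h ) {
    print "h:  $key => $h{$key}\n";
}
for my $key ( sort keys %h2 ) {
    print "h2: $key => $h2{$key}\n";
}
```

Which option fits depends on whether the two hashes share keys and whether their values are handled the same way.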

And for the last bit, sorting, you should just sort the keys:

 for my $key (sort keys %seen ) {
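A plain sort compares the keys as strings, so 10.1.1.9 sorts before 2.0.0.1, while a numeric <=> on the whole dotted string warns that the argument isn't numeric, which is likely the error mentioned in the question. One common approach, sketched here with made-up data, is to compare the addresses as packed octets; the helper name by_ip is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical counts, shaped like the %seen hash above.
my %seen = (
    '192.1.1.1' => 3,
    '10.1.1.9'  => 2,
    '1.1.1.1'   => 1,
    '10.1.1.5'  => 1,
);

# Pack each address into 4 bytes so that a string comparison of the
# packed values orders the addresses octet by octet.
sub by_ip {
    return pack( 'C4', split /\./, $a ) cmp pack( 'C4', split /\./, $b );
}

my @sorted = sort by_ip keys %seen;
for my $key (@sorted) {
    print "$key: $seen{$key}\n";
}
```

For very large key sets, packing each key once up front (a Schwartzian transform) would avoid repeating the split and pack on every comparison.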
