How to count hits per hour from entries in a log file?

I have a log file in which each line contains an IP address, access time and URL. I want to count hits per hour.

The access times in the data look like this:

    [01/Jan/2017:14:15:45 +1000]
    [01/Jan/2017:14:15:45 +1000]
    [01/Jan/2017:15:16:05 +1000]
    [01/Jan/2017:16:16:05 +1000]

How can I improve the code below so that I don't need to add a variable and an if statement for every hour?

    twoPM = 0
    thrPM = 0
    fouPM = 0

    timeStamp = line.split('[')[1].split(']')[0]
    formated_timeStamp = datetime.datetime.strptime(timeStamp, '%d/%b/%Y:%H:%M:%S %z').strftime('%H')

    if formated_timeStamp == '14':
        twoPM += 1
    if formated_timeStamp == '15':
        thrPM += 1
    if formated_timeStamp == '16':
        fouPM += 1
3 answers
  • You can include the brackets in the strptime format:

     datetime.datetime.strptime(line.strip(),'[%d/%b/%Y:%H:%M:%S %z]') 
  • You can extract the hour using the .hour attribute for any datetime.datetime object:

     timestamp = datetime.datetime.strptime(…)
     hour = timestamp.hour
  • You can count the number of items using collections.Counter :

     import datetime
     from collections import Counter

     def read_logs(filename):
         # Yield the hour (0-23) of the timestamp on each line
         with open(filename) as log_file:
             for line in log_file:
                 timestamp = datetime.datetime.strptime(
                     line.strip(), '[%d/%b/%Y:%H:%M:%S %z]')
                 yield timestamp.hour

     def count_access(log_filename):
         # Counter tallies how many times each hour was yielded
         return Counter(read_logs(log_filename))

     if __name__ == '__main__':
         print(count_access('/path/to/logs/'))
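
The strptime call above expects each line to hold nothing but the bracketed timestamp. The question mentions an IP address and a URL on the same line, so here is a minimal sketch that first pulls the bracketed field out with a regular expression. The sample lines, `TIMESTAMP_RE` and `hours_from_lines` are made up for illustration; the real log layout may differ.

```python
import datetime
import re
from collections import Counter

# Captures the text between the first '[' and the next ']' on a line
TIMESTAMP_RE = re.compile(r'\[([^\]]+)\]')

def hours_from_lines(lines):
    """Yield the hour (0-23) of every line that holds a parsable timestamp."""
    for line in lines:
        match = TIMESTAMP_RE.search(line)
        if match is None:
            continue  # no bracketed field on this line
        try:
            stamp = datetime.datetime.strptime(
                match.group(1), '%d/%b/%Y:%H:%M:%S %z')
        except ValueError:
            continue  # the bracketed text was not a timestamp
        yield stamp.hour

# Hypothetical sample lines, assumed layout: IP [timestamp] URL
sample = [
    '10.0.0.1 [01/Jan/2017:14:15:45 +1000] /index.html',
    '10.0.0.2 [01/Jan/2017:14:15:45 +1000] /about.html',
    '10.0.0.1 [01/Jan/2017:15:16:05 +1000] /index.html',
    '10.0.0.3 [01/Jan/2017:16:16:05 +1000] /contact.html',
]

hits_per_hour = Counter(hours_from_lines(sample))
print(hits_per_hour)  # Counter({14: 2, 15: 1, 16: 1})
```

To run it on a real file, pass the open file object to `hours_from_lines` instead of the sample list.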

You don't say whether you want hits per hour for each day separately or aggregated across all days, so there are several ways to do this. Here is a simple version:

    import collections
    import io

    log_data = '''
    [01/Jan/2017:14:15:45 +1000]
    [01/Jan/2017:14:15:45 +1000]
    [01/Jan/2017:15:16:05 +1000]
    [01/Jan/2017:16:16:05 +1000]
    '''

    def filter_lines(file):
        for line in file:
            if line.startswith('['):
                yield line

    def extract_hour_from_line(seq):
        for line in seq:
            yield line.split(':')[1]

    def access_per_hour(file):
        aph = collections.Counter(extract_hour_from_line(filter_lines(file)))
        return aph

    if __name__ == '__main__':
        logfile = io.StringIO(log_data)
        aph = access_per_hour(logfile)
        print(aph)

This uses StringIO to turn the example strings you provided into an in-memory "file" that can be read. With real data, simply open your log file, as you are no doubt already doing, and pass it in instead.

The collections.Counter class takes a sequence and creates a dictionary-like object in which the keys are the elements of the sequence and the values are counts: the number of times each element appeared in the sequence.
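
As a small illustration of that behaviour, here is a sketch with a hand-written list of hour strings (the list is invented for the example):

```python
from collections import Counter

hours = ['14', '14', '15', '16']  # made-up hour values
counts = Counter(hours)

print(counts['14'])  # 2
print(counts['09'])  # 0, a missing key counts as zero instead of raising KeyError

# Print the hours in order, one per line
for hour, hits in sorted(counts.items()):
    print(hour, hits)
```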

This version of the code simply counts each hour value, regardless of the date on which the access took place. That is, 12:00 on Tuesday and 12:00 on Wednesday are counted as the same hour. This is useful if you just want to plot a bar chart of hits by hour of day.

If you want to do more complex grouping, you can use the filter_lines function to restrict which lines are counted at all. For example, only lines within a date range, or only lines that access a specific URL.
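
A rough sketch of such a filter, keeping only hits on one URL. It assumes each line looks like `[timestamp] URL`, which is a guess at the layout; `filter_lines_for_url` and the sample data are hypothetical.

```python
import collections
import io

# Assumed layout: '[timestamp] URL'. Adjust to the real log format.
log_data = '''\
[01/Jan/2017:14:15:45 +1000] /index.html
[01/Jan/2017:14:15:45 +1000] /about.html
[01/Jan/2017:15:16:05 +1000] /index.html
[01/Jan/2017:16:16:05 +1000] /about.html
'''

def filter_lines_for_url(file, url):
    """Yield only timestamped lines whose last field equals url."""
    for line in file:
        if line.startswith('[') and line.split()[-1] == url:
            yield line

def extract_hour_from_line(seq):
    for line in seq:
        # The hour is the text between the first and second ':'
        yield line.split(':')[1]

index_hits = collections.Counter(
    extract_hour_from_line(
        filter_lines_for_url(io.StringIO(log_data), '/index.html')))
print(index_hits)
```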

If you want to treat different days as different, you can change the extract_hour_from_line function to build a different key, for example one that combines the date and the hour.
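
One possible way to do that, as a sketch: the hypothetical `extract_date_hour_from_line` below keeps the date in the key, so the same hour on two different days is counted separately. It assumes lines that start with the bracketed timestamp, as in the example data.

```python
import collections

def extract_date_hour_from_line(seq):
    """Yield a key such as '01/Jan/2017:14' for each line."""
    for line in seq:
        # '[01/Jan/2017:14:15:45 +1000]' splits into
        # '[01/Jan/2017', '14', '15', '45 +1000]'
        date_part, hour = line.split(':')[:2]
        yield date_part.lstrip('[') + ':' + hour

# Made-up sample covering two days
lines = [
    '[01/Jan/2017:14:15:45 +1000]',
    '[01/Jan/2017:14:20:00 +1000]',
    '[02/Jan/2017:14:16:05 +1000]',
]

per_day_hour = collections.Counter(extract_date_hour_from_line(lines))
print(per_day_hour)  # Counter({'01/Jan/2017:14': 2, '02/Jan/2017:14': 1})
```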

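You can use a dictionary:

    per_hour = {}
    # .get() supplies 0 the first time an hour is seen, avoiding a KeyError
    per_hour[formated_timeStamp] = per_hour.get(formated_timeStamp, 0) + 1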

so you get something like

    {'00': 12, '01': 8, '02': 41, ...}

where the key represents the hour.
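
A key has to have a starting value before it can be incremented. One way to get that automatically is collections.defaultdict(int), which starts every new key at 0. A minimal sketch, using a hand-written list in place of the hour strings parsed from the log:

```python
from collections import defaultdict

per_hour = defaultdict(int)  # unseen keys start at 0

# Stand-in for the '%H' strings parsed from each log line
for formated_timeStamp in ['14', '14', '15', '16']:
    per_hour[formated_timeStamp] += 1

print(dict(per_hour))  # {'14': 2, '15': 1, '16': 1}
```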

Source: https://habr.com/ru/post/1271093/

