You do not say whether you need access counts per day, per hour, or something else, so there are several possibilities. Here is a simple version:
import collections
import io

log_data = '''
[01/Jan/2017:14:15:45 +1000]
[01/Jan/2017:14:15:45 +1000]
[01/Jan/2017:15:16:05 +1000]
[01/Jan/2017:16:16:05 +1000]
'''

def filter_lines(file):
    # Keep only lines that look like log entries (starting with a timestamp)
    for line in file:
        if line.startswith('['):
            yield line

def extract_hour_from_line(seq):
    # '[01/Jan/2017:14:15:45 +1000]'.split(':')[1] -> '14'
    for line in seq:
        yield line.split(':')[1]

def access_per_hour(file):
    aph = collections.Counter(extract_hour_from_line(filter_lines(file)))
    return aph

if __name__ == '__main__':
    logfile = io.StringIO(log_data)
    aph = access_per_hour(logfile)
    print(aph)
This uses StringIO to turn the example strings you provided into a "file" in memory that can be read. In your own code you can simply open your actual log file, as you no doubt already do.
The collections.Counter class takes a sequence and creates a dictionary-like object, where the keys are elements from the sequence and the values are counts: the number of times each element appeared in the sequence.
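A quick illustration of Counter on its own, using hour strings like the ones the generator above yields:

```python
import collections

# Counter maps each distinct element to how many times it occurred
counts = collections.Counter(['14', '14', '15', '16'])
print(counts['14'])  # 2
print(counts['15'])  # 1
```

Counter also supports handy extras such as .most_common(), which returns (element, count) pairs sorted by frequency.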
This version of the code simply counts each clock hour, regardless of the date on which the access took place. That is, 12:00 on Tuesday and 12:00 on Wednesday count toward the same hour. This is useful if you just want a bar chart of accesses per hour of the day.
If you want to do more complex grouping, you can modify the filter_lines function to restrict which lines are considered at all. For example, only lines within a date range, or only lines that access a specific URL.
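As a sketch of that idea, here is a hypothetical variant of filter_lines that also checks for a URL. This assumes your real log lines contain the requested path after the timestamp (the example data above has only timestamps); the function name and the url parameter are illustrative, not part of the original code:

```python
def filter_lines_for_url(file, url):
    # Yield only log entries that mention the given URL.
    # Assumes lines look like: '[01/Jan/2017:14:15:45 +1000] GET /index.html ...'
    for line in file:
        if line.startswith('[') and url in line:
            yield line

lines = [
    '[01/Jan/2017:14:15:45 +1000] GET /index.html',
    'not a log line',
    '[01/Jan/2017:15:16:05 +1000] GET /about',
]
print(list(filter_lines_for_url(iter(lines), '/index.html')))
# ['[01/Jan/2017:14:15:45 +1000] GET /index.html']
```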
If you want to treat different days as different, you can change the extract_hour_from_line function to yield a different key, for example the date and hour combined.
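A minimal sketch of that change, assuming the same timestamp format as the example data (the function name extract_day_and_hour is my own):

```python
def extract_day_and_hour(seq):
    # '[01/Jan/2017:14:15:45 +1000]' -> '01/Jan/2017:14'
    # so the same hour on different days produces different keys
    for line in seq:
        parts = line.split(':')
        yield parts[0].lstrip('[') + ':' + parts[1]

lines = ['[01/Jan/2017:14:15:45 +1000]', '[02/Jan/2017:14:00:00 +1000]']
print(list(extract_day_and_hour(lines)))
# ['01/Jan/2017:14', '02/Jan/2017:14']
```

Feeding these keys into collections.Counter instead of the plain hours then counts each (day, hour) combination separately.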