Say I have a text file (not necessarily containing every possible character), and I would like to calculate the frequency of each character. After calculating the frequencies, I need to access each character and its frequency in order from most to least frequent. The characters are not necessarily ASCII; they can be arbitrary byte sequences, though all of the same length.
I was considering doing something like this (in pseudocode):
function add_to_heap (symbol)
    freq = heap.find(symbol).frequency
    if (freq.exists? == true)
        freq++
    else
        symbol.freq = 1
        heap.insert(symbol)

MaxBinaryHeap heap
while somefile != EOF
    symbol = read_byte(somefile)
    heap.add_to_heap(symbol)
heap.sort_by_frequency()
while heap.root != empty
    root = heap.extract_root()
    do_stuff(root)
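To make the pseudocode above concrete, here is a minimal runnable sketch in Python. It is only one possible reading of the plan, with a few assumptions that are not in the original: the names `count_symbols`, `by_descending_frequency`, and `symbol_size` are made up for illustration, the input is read as a binary stream, and a trailing partial symbol is ignored. One deliberate change: a plain binary heap has no efficient lookup by symbol, so the `heap.find` step is replaced by a dictionary for counting, and the heap is built only once at the end to extract symbols from most to least frequent.

```python
import heapq
import io


def count_symbols(stream, symbol_size=1):
    """Count fixed-length symbols read from a binary stream.

    symbol_size is a hypothetical parameter: the number of bytes per symbol.
    """
    counts = {}
    while True:
        symbol = stream.read(symbol_size)
        if len(symbol) < symbol_size:
            break  # end of input (a trailing partial symbol is ignored)
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def by_descending_frequency(counts):
    """Yield (symbol, frequency) pairs, most frequent first."""
    # heapq implements a min-heap, so negate the counts to pop the largest first.
    heap = [(-freq, symbol) for symbol, freq in counts.items()]
    heapq.heapify(heap)
    while heap:
        neg_freq, symbol = heapq.heappop(heap)
        yield symbol, -neg_freq


if __name__ == "__main__":
    # In-memory stand-in for a file opened with open(path, "rb").
    data = io.BytesIO(b"abracadabra")
    for symbol, freq in by_descending_frequency(count_symbols(data)):
        print(symbol, freq)
```

Counting this way takes one dictionary update per symbol, and building the heap afterwards depends only on the number of distinct symbols, not on the file size. Symbols with equal counts come out in byte order here, which is a side effect of the tuple comparison rather than a requirement.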
I was wondering: is there a better, easier way to calculate and save how many times each character appears in the file?