Top 10 frequencies in a hash table with linked lists

The code below will display the highest frequency that it can find in my hash table (from which is a bunch of link lists) 10 times. I need my code to print the top 10 frequencies in my hash table. I don’t know how to do this (code examples would be great, simple English logic / pseudo code is just as great).

  • I am creating a temporary hash list called 'tmp' which points to my hashtable hash table
  • Then the while loop goes through the list and searches for the highest frequency, which is int 'tmp-> freq'
  • The loop will continue this process of duplicating the highest frequency it finds, with the variable 'topfreq', until it reaches the end of the linked lists in the hash table.

My 'node' is a structure consisting of the variables "freq" (int) and "word" (128 char). When the loop has nothing else to look for, it prints these two values ​​on the screen.

The problem is that I cannot wrap my head in figuring out how to find the next lowest number from the number I just found (and this may include another node with the same frequency value, so I have to check that it is not same).

void toptenwords()
{
    int topfreq = 0;
    int minfreq = 0;
    char topword[SIZEOFWORD];

    for(int p = 0; p < 10; p++) // We need the top 10 frequencies... so we do this 10 times
    {
        for(int m = 0; m < HASHTABLESIZE; m++) // Go through the entire hast table
        {
            node* tmp;
            tmp = hashtable[m];

            while(tmp != NULL) // Walk through the entire linked list
            {
                if(tmp->freq > topfreq) // If the freqency on hand is larger that the one found, store...
                {
                    topfreq = tmp->freq;
                    strcpy(topword, tmp->word);
                }
                tmp = tmp->next;
            }
        }
        cout << topfreq << "\t" << topword << endl;
    }
}

Any help would be USED :)

+3
source share
8 answers

I finally figured it out ...

void toptenwords()
{
    int topfreq = 0;
    char topword[SIZEOFWORD];
    int counter = 0;

    cout << "\nTop Ten Words" << endl;

    for(int m = 0; m < HASHTABLESIZE; m++) // We need to find the highest frequency first...
    {
        node* tmp;
        tmp = hashtable[m];

        while(tmp != NULL) // Walk through the entire linked list
        {
            if(tmp->freq > topfreq) // If the freqency on hand is larger that the one found, store...
            {
                topfreq = tmp->freq;
                strcpy(topword, tmp->word);
            }
            tmp = tmp->next;
        }
    }

    while(topfreq > 0 && counter < 10) // This will now find the top 10 frequencies
    {       
        for(int m = 0; m < HASHTABLESIZE; m++) // Go through the entire hast table
        {
            node* tmp;
            tmp = hashtable[m];

            while(tmp != NULL) // Walk through the entire linked list
            {
                if(tmp->freq == topfreq) // If the freqency we're on is equal to the frequency we're keeping count of...
                {
                    counter++;
                    if(counter > 10) // We only want the top 10 words
                        break;
                    topfreq = tmp->freq; // Store the node details...
                    strcpy(topword, tmp->word);
                    cout << topfreq << "\t" << topword << endl;
                }
                tmp = tmp->next;
            }
        }

        topfreq--; // If counter is never incremented again, this will surely kill the loop... eventually.
    }
}

30-60 , . (, , ).

-1

10 node node , . node .

void toptenwords()
{
        int topfreq = 0;
        int minfreq = 0;
        node *topwords[11];
        int current_topwords = 0;

        for(int m = 0; m < HASHTABLESIZE; m++) // Go through the entire hast table
        {
                node* tmp;
                tmp = hashtable[m];

                while(tmp != NULL) // Walk through the entire linked list
                {
                        topwords[current_topwords] = tmp;
                        current_topwords++;
                        for(int i = current_topwords - 1; i > 0; i--)
                        {
                                if(topwords[i]->freq > topwords[i - 1]->freq)
                                {
                                        node *temp = topwords[i - 1];
                                        topwords[i - 1] = topwords[i];
                                        topwords[i] = temp;
                                }
                                else break;
                        }
                        if(current_topwords > 10) current_topwords = 10;
                        tmp = tmp->next;
                }
        }
}
+2

, , , tmp- > word .

0

- ( , ) (std:: set) . , , , 10 . , ( ) , .

-, .

0

1 ():

, (, ) 10 , .

2 ():

, 1, , .

0

-, , , , .

, /, ​​ , O (1) O (n). , - , , O (n × O (1) + 10 × O (n)) ≡ O (n).

0

, n , k ( k = 10).

n k, , , (.. ). , k + 1 , . , k , k , . k .

:. n : k . O (log k), O (nlog k). , k k, O (klog k) O ((n + k) log k). , k < n, O (klog k) O (nlog k), O (nlog k).

0

SoftHeap. SoftHeap, 10 O (n) , , , O (n lg n).

http://en.wikipedia.org/wiki/Soft_heap

This wikipedia article shows how to find the median O (n) times using softheap, and the top 10 is just a subset of the median problem. Then you could sort the elements that were in the top ten, if you need them in order, and since you always do not sort 10 elements, this is still O (n) time.

0
source

Source: https://habr.com/ru/post/1708019/


All Articles