Word Frequency Statistics

In a preliminary interview, I came across this question:

If a line consists of words separated by a single space, print the words in descending order, sorted by the number of times they appear in the line.

For example, the input line "abb" generates the following output:

b : 2 a : 1 

First, I would say that it is not so clear that the input line consists of single-letter words or letters with several letters. If so, it could be simple.

Here is my thought:

 int c[26] = {0}; char *pIn = strIn; while (*pIn != 0 && *pIn != ' ') { ++c[*pIn]; ++pIn; } /* how to sort the array c[26] and remember the original index? */ 

I can get the frequency statistics of each one-letter word in the input string, and I can sort it (using QuickSort or something else). But after the count array is sorted, how do I get the one-letter word associated with the counter so that I can print them later in pairs?

If the input string consists of a multi-letter word, I plan to use map<const char *, int> to track the frequency. But then again, how to sort a key-value pair of a card?

The question is in C or C ++, and any suggestion is welcome.

Thanks!

+4
source share
3 answers

I would use std::map<std::string, int> to store words and their counters. Then I would use something like this to get the words:

 while(std::cin >> word) { // increment map count for that word } 

Finally, you just need to figure out how to print them in frequency, I will leave this as an exercise for you.

+1
source

You are definitely mistaken in believing that you only need 26 options, because your employer will also want to allow polysemantic words (and maybe even numbers?).

This means that you will need an array with a variable length. I highly recommend using a vector or, even better, map.

To find sequences of characters in a string, find the current position (beginning at 0) and the position of the next space. Then this is the word. Set the current position in space and do it again. Keep repeating this until you're done.

Using the card, you will already have an available word / number.

If the job you are applying for requires university skills, I highly recommend optimizing the map by adding some hashing functionality. However, judging by the complexity of the issue, I assume that this is not so.

0
source

Taking the C-language case:

I like brute forces, simple algons, so I would do it like this:

  • Label the input string to give an unsorted array of words. I have to actually physically move each word (because each of them has a variable length); and I think I will need a char * array, which I will use as an argument for qsort ().

  • qsort () (decrease) of this array of words. (The qsort () function in the COMPAR function pretends that large words are smaller words, so the array takes on a descending sort order.)

3.a. Go through an array sorted in order, look for subarrays of identical words. The end of the subarray, as well as the beginning of the next, is signaled by the first non-identical word that I see. 3.b. When I get to the end of a subarray (or to the end of a sorted array), I know (1) the word and (2) the number of identical words in the subarray.

EDIT new step 4: Save in another array (call it array2), a char * word in Subar and the number of identical words in the subarray.

  • When there are no more words in the sorted array, I am done. this is print time. A.

  • qsort () array2 word frequency.

  • go through array2, print each word and its frequency.

I DONE! Release for lunch.

0
source

Source: https://habr.com/ru/post/1388628/


All Articles