This can be done in O (n) if you use the correct data structure.
Consider a Node consisting of two things:
- Counter (initially set to 0).
- An array of 255 (or any number of characters) points to
Node . First, all pointers are set to NULL .
Create the root node. Define the "current" Node pointer, first set it to the root node. Then go through all the characters of the document and do the following:
- If the following characters are not spaces, select the appropriate pointer from the array of the current node. If it is
NULL , select it. The current Node pointer has been updated. - If it is a space (or any word delimiter) - increase the counter "current"
Node . Then, reset the "current" Node pointer to the root of the node.
So you build a tree in O (n). Each element (both node and leave) designates a specific word along with its counter.
Then move the tree to find the node with the largest counter. This is also O (n), since the number of elements in the tree is not more than O (n).
Update:
The last step is optional. In fact, the most common word can be updated during character processing. Below is the pseudo code:
struct Node { size_t m_Counter; Node* m_ppNext[255]; Node* m_pPrev; Node(Node* pPrev) :m_Counter(0) { m_pPrev = pPrev; memset(m_ppNext, 0, sizeof(m_ppNext)); } ~Node() { for (int i = 0; i < _countof(m_ppNext) i++) if (m_ppNext[i]) delete m_ppNext[i]; } }; Node root(NULL); Node* pPos = &root; Node* pBest = &root; char c; while (0 != (c = GetNextDocumentCharacter())) { if (c == ' ') { if (pPos != &root) { pPos->m_Counter++; if (pBest->m_Counter < pPos->m_Counter) pBest = pPos; pPos = &root; } } else { Node*& pNext = pPos->m_ppNext[c - 1]; if (!pNext) pNext = new Node(pPos); pPos = pNext; } }
valdo source share