Naively, I would just build up a hash table until it reaches a certain memory limit, then sort it in memory and write it out to disk. Finally, you can do an n-way merge of the chunks. At most you will have 100/4 chunks or so, but probably far fewer if some words are more common than others (and depending on how they cluster).
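A rough sketch of that spill-and-merge idea, assuming the goal is to count word occurrences and that words contain no tabs or newlines. The names `spill`, `read_run`, `count_words` and the `max_entries` limit are made up for illustration; a real version would cap memory by bytes rather than by number of keys.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby
from operator import itemgetter


def spill(counts, tmpdir):
    """Write one in-memory table to disk as a run sorted by word."""
    fd, path = tempfile.mkstemp(dir=tmpdir, suffix=".run")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for word in sorted(counts):
            f.write(f"{word}\t{counts[word]}\n")
    return path


def read_run(path):
    """Yield (word, count) pairs from a sorted run file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, count = line.rstrip("\n").split("\t")
            yield word, int(count)


def count_words(words, max_entries, tmpdir):
    """Yield (word, total) in sorted order, keeping at most max_entries keys in memory."""
    runs = []
    counts = Counter()
    for w in words:
        counts[w] += 1
        if len(counts) >= max_entries:  # hypothetical memory limit
            runs.append(spill(counts, tmpdir))
            counts = Counter()
    if counts:
        runs.append(spill(counts, tmpdir))
    # n-way merge: each run is sorted by word, so equal words come out adjacent
    merged = heapq.merge(*(read_run(p) for p in runs), key=itemgetter(0))
    for word, group in groupby(merged, key=itemgetter(0)):
        yield word, sum(c for _, c in group)
```

Because every run is sorted by word, the merge only needs one pending line per run in memory, and the partial counts for the same word are summed as they stream past.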
Another option is a trie: a 256-way tree in which each byte of the word selects a branch.
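For the trie, here is a minimal sketch, assuming words are handled as UTF-8 bytes and that each node keeps a counter for the words ending there. `TrieNode`, `add_word` and `get_count` are illustrative names, not from any library.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = [None] * 256  # one slot per possible byte value
        self.count = 0                # number of words that end at this node


def add_word(root, word):
    """Walk or create one branch per byte, then bump the counter at the end."""
    node = root
    for b in word.encode("utf-8"):
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.count += 1


def get_count(root, word):
    """Return how many times word was added (0 if never seen)."""
    node = root
    for b in word.encode("utf-8"):
        node = node.children[b]
        if node is None:
            return 0
    return node.count
```

A fixed 256-slot array per node is simple but memory-hungry; a dict of children per node is a common trade-off when most branches are empty.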