I have a simple Trie that I use to store about 80k words of length 2 to 15. This works great for checking if a string is a word; However, now I need a way to get a random word of a certain length. In other words, I need "getRandomWord (5)" to return a 5 letter word, with all 5 letter words having an equal probability of return.
The only way I can think of is to pick a random number and cross the width of the tree first, until I miss so many words of the desired length. Is there a better way to do this?
Perhaps not necessary, but here is the code for my game.
class TrieNode { private TrieNode[] c; private Boolean end = false; public TrieNode() { c = new TrieNode[26]; } protected void insert(String word) { int n = word.charAt(0) - 'A'; if (c[n] == null) c[n] = new TrieNode(); if (word.length() > 1) { c[n].insert(word.substring(1)); } else { c[n].end = true; } } public Boolean isThisAWord(String word) { if (word.length() == 0) return false; int n = word.charAt(0) - 'A'; if (c[n] != null && word.length() > 1) return c[n].isThisAWord(word.substring(1)); else if (c[n] != null && c[n].end && word.length() == 1) return true; else return false; } }
Edit: a prominent answer worked well; I will add my implementation here for posterity if this helps anyone who has a similar problem.
First, I created a helper class to store the metadata about TrieNodes that I use in the search:
class TrieBranch { TrieNode node; int letter; int depth; public TrieBranch(TrieNode n, int l, int d) { letter = l; node = n; depth = d; } }
This is a class that contains Trie and implements a random word search. I'm kind of a beginner, so there may be better ways to do this, but I experienced it a bit and it seems to work. There is no error handling, so caution emptor.
class Dict { final static int maxWordLength = 13; final static int lettersInAlphabet = 26; TrieNode trie; int lengthFrequencyByLetter[][]; int totalLengthFrequency[]; public Dict() { trie = new TrieNode(); lengthFrequencyByLetter = new int[lettersInAlphabet][maxWordLength + 1]; totalLengthFrequency = new int[maxWordLength + 1]; } public String getRandomWord(int length) { // Returns a random word of the specified length from the trie // First, pick a random number from 0 to [number of words with this length] Random r = new Random(); int wordIndex = r.nextInt(totalLengthFrequency[length]); // figure out what the first letter of this word would be int firstLetter = -1, totalSoFar = 0; while (totalSoFar <= wordIndex) { firstLetter++; totalSoFar += lengthFrequencyByLetter[firstLetter][length]; } wordIndex -= (totalSoFar - lengthFrequencyByLetter[firstLetter][length]); // traverse the (firstLetter)'th node of trie depth-first to find the word (wordIndex)'th word int[] result = new int[length + 1]; Stack<TrieBranch> stack = new Stack<TrieBranch>(); stack.push(new TrieBranch(trie.getBranch(firstLetter), firstLetter, 1)); while (!stack.isEmpty()) { TrieBranch n = stack.pop(); result[n.depth] = n.letter; // examine the current node if (n.depth == length && n.node.isEnd()) { wordIndex--; if (wordIndex < 0) { // search is over String sResult = ""; for (int i = 1; i <= length; i++) { sResult += (char)(result[i] + 'a'); } return sResult; } } // handle child nodes unless they're deeper than target length if (n.depth < length) { for (int i = 25; i >= 0; i--) { if (n.node.getBranch(i) != null) stack.push(new TrieBranch(n.node.getBranch(i), i, n.depth + 1)); } } } return "failure of some sort"; } }
Using a random dictionary (80k words max length 12), each call to getRandomWord () takes a value of .2ms, and using a more detailed dictionary (250K words, maximum length 24), each call takes about 1 ms.