How to get a random word of a certain length from Trie

I have a simple Trie that I use to store about 80k words of length 2 to 15. This works great for checking if a string is a word; However, now I need a way to get a random word of a certain length. In other words, I need "getRandomWord (5)" to return a 5 letter word, with all 5 letter words having an equal probability of return.

The only way I can think of is to pick a random number and cross the width of the tree first, until I miss so many words of the desired length. Is there a better way to do this?

Perhaps not necessary, but here is the code for my game.

class TrieNode { private TrieNode[] c; private Boolean end = false; public TrieNode() { c = new TrieNode[26]; } protected void insert(String word) { int n = word.charAt(0) - 'A'; if (c[n] == null) c[n] = new TrieNode(); if (word.length() > 1) { c[n].insert(word.substring(1)); } else { c[n].end = true; } } public Boolean isThisAWord(String word) { if (word.length() == 0) return false; int n = word.charAt(0) - 'A'; if (c[n] != null && word.length() > 1) return c[n].isThisAWord(word.substring(1)); else if (c[n] != null && c[n].end && word.length() == 1) return true; else return false; } } 

Edit: a prominent answer worked well; I will add my implementation here for posterity if this helps anyone who has a similar problem.

First, I created a helper class to store the metadata about TrieNodes that I use in the search:

 class TrieBranch { TrieNode node; int letter; int depth; public TrieBranch(TrieNode n, int l, int d) { letter = l; node = n; depth = d; } } 

This is a class that contains Trie and implements a random word search. I'm kind of a beginner, so there may be better ways to do this, but I experienced it a bit and it seems to work. There is no error handling, so caution emptor.

 class Dict { final static int maxWordLength = 13; final static int lettersInAlphabet = 26; TrieNode trie; int lengthFrequencyByLetter[][]; int totalLengthFrequency[]; public Dict() { trie = new TrieNode(); lengthFrequencyByLetter = new int[lettersInAlphabet][maxWordLength + 1]; totalLengthFrequency = new int[maxWordLength + 1]; } public String getRandomWord(int length) { // Returns a random word of the specified length from the trie // First, pick a random number from 0 to [number of words with this length] Random r = new Random(); int wordIndex = r.nextInt(totalLengthFrequency[length]); // figure out what the first letter of this word would be int firstLetter = -1, totalSoFar = 0; while (totalSoFar <= wordIndex) { firstLetter++; totalSoFar += lengthFrequencyByLetter[firstLetter][length]; } wordIndex -= (totalSoFar - lengthFrequencyByLetter[firstLetter][length]); // traverse the (firstLetter)'th node of trie depth-first to find the word (wordIndex)'th word int[] result = new int[length + 1]; Stack<TrieBranch> stack = new Stack<TrieBranch>(); stack.push(new TrieBranch(trie.getBranch(firstLetter), firstLetter, 1)); while (!stack.isEmpty()) { TrieBranch n = stack.pop(); result[n.depth] = n.letter; // examine the current node if (n.depth == length && n.node.isEnd()) { wordIndex--; if (wordIndex < 0) { // search is over String sResult = ""; for (int i = 1; i <= length; i++) { sResult += (char)(result[i] + 'a'); } return sResult; } } // handle child nodes unless they're deeper than target length if (n.depth < length) { for (int i = 25; i >= 0; i--) { if (n.node.getBranch(i) != null) stack.push(new TrieBranch(n.node.getBranch(i), i, n.depth + 1)); } } } return "failure of some sort"; } } 

Using a random dictionary (80k words max length 12), each call to getRandomWord () takes a value of .2ms, and using a more detailed dictionary (250K words, maximum length 24), each call takes about 1 ms.

+6
source share
1 answer

To make sure that you have a chance to get each 5-letter word, you need to know how many 5-letter words are in your tree. Since you are building a tree, you add the word length that you add to two counters: a common frequency counter and a frequency counter in letters:

 int lengthFrequencyByLetter[letterIndex][maxWordLength-1] int totalLengthFrequency[maxWordLength-1] 

So, if you have 4000 words with five letters, and 213 of them begin with "d", then

 lengthFrequencyByLetter[3][4] = 213 

and

 totalLengthFrequency[4] = 4000 

after you finish adding everything to your tree. (The letter "a" is 0, "b" is 1, ... "z" is 25.)

Here you can search for the n word of a given length , where n is a random integer selected from a uniform random distribution in the range (0, totalLengthFrequency[length-1] ).

Say you have 4,000 5-letter words in your structure. You pick a random number 1234. Now you can check

 lengthFrequencyByLetter[0][4] lengthFrequencyByLetter[1][4] lengthFrequencyByLetter[2][4] lengthFrequencyByLetter[3][4] 

in turn, until you exceed a total of 1234. Then you quickly know what the initial letter of the 1234th 5-letter word is, and then look there. You do not need to search every word in the tree every time from the very beginning.

+7
source

Source: https://habr.com/ru/post/947485/


All Articles