If you expect low prefix sharing, then a trie might not be your best bet.
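If the table is loaded once and then only searched, a sorted array with binary search for lookup is likely the better choice, since it keeps the memory footprint low while lookups stay fast.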
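First, you need to load the data into an array. Since you don't know the size up front, load it into an ArrayList, then extract the final array from the list.
Assuming 50 million 10 character strings, the memory required will be: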
10 character string:
    String: 12 byte header + 4 byte 'hash' + 4 byte 'value' ref = 24 bytes (aligned)
    char[]: 12 byte header + 4 byte 'length' + 10 * 2 byte 'char' = 40 bytes (aligned)
    Total:  24 + 40 = 64 bytes

Array of 50 million 10 character strings:
    String[]: 12 byte header + 4 byte 'length' + 50,000,000 * 4 byte 'String' ref = 200,000,016 bytes
    Values:   50,000,000 * 64 bytes = 3,200,000,000 bytes
    Total:    200,000,016 + 3,200,000,000 = 3,400,000,016 bytes ≈ 3.2 GB
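The arithmetic above can be reproduced with a small sketch. It assumes the object layout listed in the breakdown (12 byte headers, 4 byte references, 8 byte alignment, and a String backed by a char[]), which is typical of a 64-bit JVM with compressed references but is not guaranteed on every JVM or Java version. The class and method names are hypothetical and used only for illustration.

```java
public class StringArrayFootprint {
    // Rounds a raw object size up to the next multiple of 8 bytes (assumed alignment).
    public static long align(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated bytes for one String plus its backing char[], per the layout above.
    public static long stringBytes(int chars) {
        long stringObject = align(12 + 4 + 4);        // header + 'hash' + 'value' ref
        long charArray = align(12 + 4 + 2L * chars);  // header + 'length' + chars
        return stringObject + charArray;
    }

    // Estimated bytes for a String[] holding `count` strings of `chars` characters each.
    public static long totalBytes(long count, int chars) {
        long refArray = 12 + 4 + 4 * count;           // header + 'length' + refs
        return refArray + count * stringBytes(chars);
    }

    public static void main(String[] args) {
        System.out.println(stringBytes(10));              // 64
        System.out.println(totalBytes(50_000_000L, 10));  // 3400000016
    }
}
```

Changing the arguments to `totalBytes` gives a rough estimate for other table sizes under the same assumptions.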
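Once you have the final String[], discard the ArrayList<String> and sort the String[]. Arrays.sort() may use up to 50% of the array size (~100,000,000 bytes) as temporary storage, but if the ArrayList is released for GC before sorting, that space can be reused.
So the total requirement is ~3.5 GB, just for the symbol table.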
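The load, extract, sort, and search steps described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the class name `SortedStringTable` and the methods `build` and `contains` are hypothetical, and a small in-memory list stands in for the real input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedStringTable {
    // Builds a sorted String[] from an input whose size is not known up front.
    public static String[] build(Iterable<String> source) {
        List<String> list = new ArrayList<>();
        for (String s : source)
            list.add(s);
        String[] table = list.toArray(new String[0]);
        list = null;          // drop the list so it is eligible for GC before sorting
        Arrays.sort(table);   // natural String order
        return table;
    }

    // Looks a word up with binary search; the table must be sorted.
    public static boolean contains(String[] table, String word) {
        return Arrays.binarySearch(table, word) >= 0;
    }

    public static void main(String[] args) {
        String[] table = build(Arrays.asList("suit", "attack", "kittens", "cellar"));
        System.out.println(Arrays.toString(table));      // [attack, cellar, kittens, suit]
        System.out.println(contains(table, "kittens"));  // true
        System.out.println(contains(table, "car"));      // false
    }
}
```

Setting `list` to null only makes the list eligible for collection; whether the memory is actually reclaimed before the sort depends on the garbage collector.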
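Now, if space is truly at a premium, you can squeeze that further. As you can see, the String object itself accounts for 24 of the 64 bytes. You can make the symbol table use char[] directly.
Also, if all your strings are US-ASCII or ISO-8859-1, you can convert each char[] to a byte[], halving the space used by the characters.
Combined, that reduces the size of each value from 64 to 32 bytes, and the total symbol table from 3.2 GB to 1.8 GB, or roughly 2 GB during loading.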
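As a rough check of the reduced figures, the sketch below repeats the size estimate for byte[] values. It assumes the same layout as before (12 byte array header, 4 byte length, 4 byte references, 8 byte alignment); the class and method names are hypothetical. Under those assumptions the table comes to 1,800,000,016 bytes, which is about 1.8 GB in decimal units.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayFootprint {
    // Rounds a raw object size up to the next multiple of 8 bytes (assumed alignment).
    public static long align(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated bytes for a byte[] holding `chars` single-byte characters.
    public static long byteArrayBytes(int chars) {
        return align(12 + 4 + chars);             // header + 'length' + bytes
    }

    // Estimated bytes for a byte[][] holding `count` words of `chars` characters each.
    public static long totalBytes(long count, int chars) {
        long refArray = 12 + 4 + 4 * count;       // header + 'length' + refs
        return refArray + count * byteArrayBytes(chars);
    }

    public static void main(String[] args) {
        // A 10 character US-ASCII string encodes to exactly 10 bytes.
        byte[] encoded = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encoded.length);                // 10
        System.out.println(byteArrayBytes(10));            // 32
        System.out.println(totalBytes(50_000_000L, 10));   // 1800000016
    }
}
```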
UPDATE
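Assuming the input list of strings is already sorted, below is an example of how you might do it. As an MCVE, it uses a small static array as input, but you can easily read the words from a file instead.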
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class Test {
        public static void main(String[] args) {
            // Stand-in for words read from a file; must already be in sorted order
            String[] wordsFromFile = { "appear", "attack", "cellar", "copper",
                                       "erratic", "grotesque", "guitar", "guttural",
                                       "kittens", "mean", "suit", "trick" };

            // Encode each word as single-byte characters while loading
            List<byte[]> wordList = new ArrayList<>();
            for (String word : wordsFromFile)
                wordList.add(word.getBytes(StandardCharsets.US_ASCII));

            // Extract the final array from the list
            byte[][] symbolTable = wordList.toArray(new byte[wordList.size()][]);

            test(symbolTable, "abc");
            test(symbolTable, "attack");
            test(symbolTable, "car");
            test(symbolTable, "kittens");
            test(symbolTable, "xyz");
        }

        // Binary search for the word; a negative index means it is not in the table
        private static void test(byte[][] symbolTable, String word) {
            int idx = Arrays.binarySearch(symbolTable,
                                          word.getBytes(StandardCharsets.US_ASCII),
                                          Test::compare);
            if (idx < 0)
                System.out.println("Not found: " + word);
            else
                System.out.println("Found : " + word);
        }

        // Lexicographic comparison byte by byte; a shorter word sorts before its extensions
        private static int compare(byte[] w1, byte[] w2) {
            for (int i = 0, cmp; i < w1.length && i < w2.length; i++)
                if ((cmp = Byte.compare(w1[i], w2[i])) != 0)
                    return cmp;
            return Integer.compare(w1.length, w2.length);
        }
    }
Output:

    Not found: abc
    Found : attack
    Not found: car
    Found : kittens
    Not found: xyz
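One caveat if the words are ISO-8859-1 rather than pure US-ASCII: Byte.compare treats bytes as signed, so characters above 0x7F sort before all ASCII characters, which differs from String order. That only matters if the input was sorted by String order, as assumed above. The sketch below shows the difference, using a hypothetical `compareUnsigned` helper that masks each byte to 0..255; on Java 9 and later, `Arrays.compareUnsigned(byte[], byte[])` should serve the same purpose.

```java
import java.nio.charset.StandardCharsets;

public class UnsignedCompareDemo {
    // Signed byte-by-byte comparison, equivalent to the compare() method in the example.
    public static int compareSigned(byte[] w1, byte[] w2) {
        for (int i = 0; i < w1.length && i < w2.length; i++) {
            int cmp = Byte.compare(w1[i], w2[i]);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(w1.length, w2.length);
    }

    // Unsigned variant: masks each byte to 0..255 so the order matches the char values.
    public static int compareUnsigned(byte[] w1, byte[] w2) {
        for (int i = 0; i < w1.length && i < w2.length; i++) {
            int cmp = Integer.compare(w1[i] & 0xFF, w2[i] & 0xFF);
            if (cmp != 0)
                return cmp;
        }
        return Integer.compare(w1.length, w2.length);
    }

    public static void main(String[] args) {
        byte[] z = "z".getBytes(StandardCharsets.ISO_8859_1);
        byte[] e = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);  // e with acute accent, 0xE9
        System.out.println("z".compareTo("\u00e9") < 0);  // true: String order puts z first
        System.out.println(compareUnsigned(z, e) < 0);    // true: agrees with String order
        System.out.println(compareSigned(z, e) < 0);      // false: 0xE9 is negative as a signed byte
    }
}
```

Either comparator works as long as the same one is used for both sorting and searching; the unsigned form is only needed when the table order must match String order.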