Memory-efficient multi-value map

Hi, I have the following problem: I store strings and the corresponding lists of integer values in a MultiValueMap<String, Integer>. I store about 13 million strings, and one string can have up to 500 or more values. For each individual value, I will have random access on the map, so in the worst case there are 13,000,000 * 500 calls. The speed of the map is fine, but the memory overhead is getting pretty high. A MultiValueMap<String, Integer> is nothing but a HashMap/TreeMap<String, ArrayList<Integer>>, and both HashMap and TreeMap have a lot of memory overhead. I will not modify the map once it is built, but I need it to be as fast and as small as possible for random access in the program. (I store it on disk and load it at startup; the serialized map file takes about 600 MB, but in memory it is about 3 GB.)

The most memory-efficient approach would be to store the strings in a sorted string array and keep a corresponding two-dimensional int array for the values. Access would then be a binary search on the string array, followed by fetching the corresponding values.

Now I have three ways to get there:

  • I use a sorted MultiValueMap (TreeMap) during the creation phase to store everything. Once all values are collected, I get the string array by calling map.keySet().toArray(new String[0]), create a two-dimensional int array, and copy all the values from the multi-value map. Pro: It is easy to implement and still fast during creation. Con: It takes up even more memory while copying from the map to the arrays.

  • I use arrays or ArrayLists from the start and store everything in them. Pro: The least memory overhead. Con: It would be extremely slow, because I would have to sort/copy the array every time I add a new key. I would also need to implement my own (possibly slower) sort to keep the corresponding int arrays in the same order as the strings. Hard to implement.

  • I use arrays, with a MultiValueMap as a buffer. After the program has completed 10% or 20% of the creation phase, I add the buffered values to the arrays, keep them in order, and then start a new map. Pro: Probably still fast enough and memory-efficient enough. Con: Hard to implement.

None of these solutions feels right to me. Do you know any other solutions to this problem, perhaps a memory-efficient (MultiValue)Map implementation?

I know that I could use a database, so please don't post that as an answer. I want to know how to do this without a database.

+6
5 answers

If you switched to Guava's Multimap (I have no idea whether that is possible for your application), you could use Trove and get

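    // Imports needed (Guava, Trove 3.x package names, java.util):
    // com.google.common.base.Supplier, com.google.common.collect.ListMultimap,
    // com.google.common.collect.Multimaps, gnu.trove.decorator.TIntListDecorator,
    // gnu.trove.list.array.TIntArrayList, java.util.Collection, java.util.HashMap, java.util.List
    ListMultimap<String, Integer> multimap = Multimaps.newListMultimap(
        new HashMap<String, Collection<Integer>>(),
        new Supplier<List<Integer>>() {
            public List<Integer> get() {
                // Wrap a primitive list so the values live in an int[];
                // the no-arg decorator constructor has no backing list.
                return new TIntListDecorator(new TIntArrayList());
            }
        });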

which gives you a ListMultimap that uses a HashMap to map keys to List values backed by int[] arrays. That should be memory-efficient, although you will pay a slight speed penalty due to boxing. You may be able to do something similar for MultiValueMap, although I have no idea which library that is from.

+5

Depending on which Integer values are stored in your map, a large part of the heap overhead may be caused by having separate Integer instances, each of which takes up much more RAM than a primitive int value.

Consider using a Map from String to one of the many IntArrayList implementations floating around (e.g., in Colt or in Primitive Collections for Java), which basically implement a List backed by an int array rather than by an array of Integer instances.
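As a rough illustration of what such a list does internally, here is a minimal sketch of a growable list backed by an int[]. The class name IntList and its methods are hypothetical and are not the API of Colt or any other library.

```java
import java.util.Arrays;

/** Hypothetical minimal int list backed by an int[] (no Integer boxing). */
final class IntList {
    private int[] data = new int[4];
    private int size = 0;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    /** Returns the value at the given position. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Shrinks the backing array to the exact size; useful once the map is read-only. */
    void trimToSize() {
        data = Arrays.copyOf(data, size);
    }
}
```

A Map<String, IntList> would then replace the Map<String, ArrayList<Integer>>; calling trimToSize() on every list after the build phase removes the unused slack at the end of each array.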

+2

You can use compressed strings to significantly reduce memory usage.

In addition, there are other, more radical solutions (these would require some reimplementation):

+2

First, consider the memory taken by the integers. You said that the range will be around 0-4000000. 24 bits are enough to represent 16777216 different values. If that is acceptable, you could use byte arrays for the integers, with 3 bytes per integer, and save 25%. You would need to index into the array something like this:

    // Reads the 24-bit value stored at logical position index (3 bytes, big-endian).
    int getPackedInt(byte[] array, int index) {
        int i = index * 3;
        return ((array[i] & 0xFF) << 16) + ((array[i + 1] & 0xFF) << 8) + (array[i + 2] & 0xFF);
    }

    // Writes a value in the range 0..0xFFFFFF at logical position index.
    void storePackedInt(byte[] array, int index, int value) {
        assert value >= 0 && value <= 0xFFFFFF;
        int i = index * 3;
        array[i] = (byte) ((value >> 16) & 0xFF);
        array[i + 1] = (byte) ((value >> 8) & 0xFF);
        array[i + 2] = (byte) (value & 0xFF);
    }

Can you say something about the distribution of the integers? If many of them fit in 16 bits, you could use an encoding with a variable number of bytes per number (something like UTF-8 for representing characters).
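One possible variable-length scheme is a LEB128-style "varint": 7 payload bits per byte, with the high bit marking that more bytes follow. This is only a sketch of that particular encoding, not necessarily the one meant above, and the class name VarInts is made up for illustration.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: LEB128-style variable-length encoding of non-negative ints. */
final class VarInts {
    /** Packs the values, using 1 byte below 128, 2 bytes below 16384, and so on. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            if (v < 0) {
                throw new IllegalArgumentException("negative value: " + v);
            }
            // Emit the low 7 bits with the continuation bit set while more bits remain.
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    /** Decodes exactly count values from the packed bytes. */
    static int[] decode(byte[] packed, int count) {
        int[] result = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int value = 0;
            int shift = 0;
            int b;
            do {
                b = packed[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            result[i] = value;
        }
        return result;
    }
}
```

The trade-off is that the n-th value of a list can no longer be addressed directly; the list has to be decoded from its start, which is acceptable if each lookup reads the whole list for a key anyway.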

Next, consider whether it is possible to save memory on the strings. What are the characteristics of the strings? How long will they typically be? Will many strings share prefixes? A compression scheme tailored to the characteristics of your application could save a lot of space (as falsarella pointed out). Or, if many strings share prefixes, storing them in some type of trie may be more efficient. (There is a type of trie called "patricia" that might be suitable for this application.) As a bonus, note that looking up strings in a trie may be faster than looking them up in a hash map (although you would need to benchmark to see whether this is true in your application).

Will all the strings be ASCII? If so, 50% of the memory used for the strings will be wasted, since a Java char is 16 bits. Again, in this case, you might consider using byte arrays.
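A minimal sketch of keeping ASCII keys as byte arrays follows. The class name AsciiKeys is hypothetical; the comparison method is included because a sorted array of byte[] keys needs an ordering for binary search.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: ASCII-only keys stored as one byte per character. */
final class AsciiKeys {
    /** Encodes the string, rejecting anything outside 7-bit ASCII instead of silently replacing it. */
    static byte[] encode(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
        }
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** Turns the stored bytes back into a String. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Orders keys the same way String.compareTo does for ASCII input. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = a[i] - b[i]; // ASCII bytes are never negative
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```

On Java 9 and later, the JVM's compact strings feature already stores Latin-1-only strings with one byte per character, so this mainly matters on older runtimes.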

If you only need to look up strings and never iterate over the stored strings, you could also consider something quite unconventional: hash the strings and store only the hashes. Since different strings can hash to the same value, there is a chance that a string that was never stored will still be "found" by a lookup. But if you use enough bits for the hash value (and a good hash function), you can make this probability so vanishingly small that it will almost certainly never happen in the lifetime of the universe.
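The hash-only idea could look roughly like the sketch below. It assumes 64-bit FNV-1a as the hash purely to keep the example short; a wider hash lowers the collision odds further. The class name HashedKeyIndex is hypothetical, and the sketch does not detect two stored keys that hash to the same value.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical read-only index that keeps only a hash of each key, not the key itself. */
final class HashedKeyIndex {
    private final long[] hashes;  // sorted ascending
    private final int[][] values; // values[i] belongs to hashes[i]

    HashedKeyIndex(String[] keys, int[][] vals) {
        final long[] h = new long[keys.length];
        Integer[] order = new Integer[keys.length];
        for (int i = 0; i < keys.length; i++) {
            h[i] = fnv1a64(keys[i]);
            order[i] = i;
        }
        // Sort positions by hash so both arrays can be filled in the same order.
        Arrays.sort(order, (a, b) -> Long.compare(h[a], h[b]));
        hashes = new long[keys.length];
        values = new int[keys.length][];
        for (int i = 0; i < order.length; i++) {
            hashes[i] = h[order[i]];
            values[i] = vals[order[i]];
        }
    }

    /** Returns the values for the key, or null if its hash is not present. */
    int[] get(String key) {
        int i = Arrays.binarySearch(hashes, fnv1a64(key));
        return i >= 0 ? values[i] : null;
    }

    /** 64-bit FNV-1a over the UTF-8 bytes of the string. */
    static long fnv1a64(String s) {
        long hash = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x100000001b3L;
        }
        return hash;
    }
}
```

Each key then costs 8 bytes regardless of its length, at the price of no longer being able to list or verify the original strings.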

Finally, there is the memory for the structure itself, which holds the strings and integers. I already suggested using a trie, but if you decide not to do that, nothing will use less memory than parallel arrays: one sorted array of strings (on which, as you said, you can do a binary search) and a parallel array of integer arrays. After the binary search finds the index in the string array, you can use the same index to access the array of integer arrays.
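A minimal sketch of the parallel-array layout, filled from a TreeMap as in the first option of the question. The class name SortedArrayMultiMap is hypothetical, and the sketch assumes the TreeMap uses the natural String ordering, which is what Arrays.binarySearch relies on.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical read-only multi-value map: sorted keys plus a parallel array of value arrays. */
final class SortedArrayMultiMap {
    private final String[] keys;  // sorted by String.compareTo
    private final int[][] values; // values[i] belongs to keys[i]

    /** Copies a naturally ordered map into the arrays; the map can be discarded afterwards. */
    SortedArrayMultiMap(TreeMap<String, int[]> source) {
        keys = new String[source.size()];
        values = new int[source.size()][];
        int i = 0;
        for (Map.Entry<String, int[]> e : source.entrySet()) {
            keys[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
    }

    /** Binary search on the keys, then the same index into the values; null if the key is absent. */
    int[] get(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }
}
```

Per key this leaves one array slot for the String reference and one for the int[] reference, with no map entry objects and no boxed Integer instances.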

While you are building the structure: if you decide that a trie is a good choice, I would just use it directly. Otherwise, you could do 2 passes: one to build the set of strings (then put them into an array and sort them), and a second to add the arrays of integers.
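The two-pass build might be sketched as follows. It assumes the input can be read twice; here it is represented by two in-memory arrays of (key, value) pairs as a stand-in for re-reading a file. The class name TwoPassBuilder is hypothetical, and the counting step in the first pass is an addition so that each int array can be allocated at its final size.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical two-pass builder for sorted keys with parallel int arrays. */
final class TwoPassBuilder {
    final String[] keys;
    final int[][] values;

    /** pairKeys[p] and pairValues[p] together form one (key, value) input record. */
    TwoPassBuilder(String[] pairKeys, int[] pairValues) {
        // Pass 1: collect the distinct keys in sorted order, then count values per key.
        keys = new TreeSet<String>(Arrays.asList(pairKeys)).toArray(new String[0]);
        int[] counts = new int[keys.length];
        for (String k : pairKeys) {
            counts[Arrays.binarySearch(keys, k)]++;
        }
        values = new int[keys.length][];
        for (int i = 0; i < keys.length; i++) {
            values[i] = new int[counts[i]];
        }
        // Pass 2: write every value into the next free slot of its key's array.
        int[] filled = new int[keys.length];
        for (int p = 0; p < pairKeys.length; p++) {
            int i = Arrays.binarySearch(keys, pairKeys[p]);
            values[i][filled[i]++] = pairValues[p];
        }
    }
}
```

The temporary TreeSet holds only the keys, so the peak memory during the build stays well below that of a full map of boxed lists.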

+2

If your key strings follow patterns, especially common prefixes, then a Trie can be an effective way of storing significantly less data.

Here is the code for a working TrieMap.

Note: The usual advice to use entrySet to iterate over a Map does not apply to Tries. Entry sets are extremely inefficient in a Trie, so please avoid requesting them if at all possible.

    import java.util.AbstractMap;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.TreeMap;

    /**
     * Implementation of a Trie structure.
     *
     * A Trie is a compact form of tree that takes advantage of common prefixes
     * to the keys.
     *
     * A normal HashSet will take the key and compute a hash from it, this hash will
     * be used to locate the value through various methods but usually some kind
     * of bucket system is used. The memory footprint resulting becomes something
     * like O(n).
     *
     * A Trie structure essentially combines all common prefixes into a single key.
     * For example, holding the strings A, AB, ABC and ABCD will only take enough
     * space to record the presence of ABCD. The presence of the others will be
     * recorded as flags within the record of ABCD structure at zero cost.
     *
     * This structure is useful for holding similar strings such as product IDs or
     * credit card numbers.
     */
    public class TrieMap<V> extends AbstractMap<String, V> implements Map<String, V> {

        /**
         * Map each character to a sub-trie.
         *
         * Could replace this with a 256 entry array of Tries but this will handle
         * multibyte character sets and I can discard empty maps.
         *
         * Maintained at null until needed (for better memory footprint).
         */
        protected Map<Character, TrieMap<V>> children = null;

        /**
         * Here we store the map contents.
         */
        protected V leaf = null;

        /**
         * Set the leaf value to a new setting and return the old one.
         *
         * @param newValue the value to store in this node.
         * @return old value of leaf.
         */
        protected V setLeaf(V newValue) {
            V old = leaf;
            leaf = newValue;
            return old;
        }

        /**
         * I've always wanted to name a method something like this.
         */
        protected void makeChildren() {
            if (children == null) {
                // Use a TreeMap to ensure sorted iteration.
                children = new TreeMap<Character, TrieMap<V>>();
            }
        }

        /**
         * Finds the TrieMap that "should" contain the key.
         *
         * @param key The key to find.
         * @param grow Set to true to grow the Trie to fit the key.
         * @return The sub Trie that "should" contain the key or null if key was
         *         not found and grow was false.
         */
        protected TrieMap<V> find(String key, boolean grow) {
            if (key.length() == 0) {
                // Found it!
                return this;
            } else {
                // Not at end of string.
                if (grow) {
                    // Grow the tree.
                    makeChildren();
                }
                if (children != null) {
                    // Ask the kids.
                    char ch = key.charAt(0);
                    TrieMap<V> child = children.get(ch);
                    if (child == null && grow) {
                        // Make the child.
                        child = new TrieMap<V>();
                        // Store the child.
                        children.put(ch, child);
                    }
                    if (child != null) {
                        // Find it in the child.
                        return child.find(tail(key), grow);
                    }
                }
            }
            return null;
        }

        /**
         * Remove the head (first character) from the string.
         *
         * @param s The string.
         * @return The same string without the first (head) character.
         */
        private String tail(String s) {
            return s.substring(1, s.length());
        }

        /**
         * Add a new value to the map.
         *
         * Time footprint = O(s.length).
         *
         * @param key The key defining the place to add.
         * @param value The value to add there.
         * @return The value that was there, or null if it wasn't.
         */
        @Override
        public V put(String key, V value) {
            V old = null;
            // If empty string.
            if (key.length() == 0) {
                old = setLeaf(value);
            } else {
                // Find it.
                old = find(key, true).put("", value);
            }
            return old;
        }

        /**
         * Gets the value at the specified key position.
         *
         * @param o The key to the location.
         * @return The value at that location, or null if there is no value at
         *         that location.
         */
        @Override
        public V get(Object o) {
            V got = null;
            if (o != null) {
                String key = (String) o;
                TrieMap<V> it = find(key, false);
                if (it != null) {
                    got = it.leaf;
                }
            } else {
                throw new NullPointerException("Nulls not allowed.");
            }
            return got;
        }

        /**
         * Remove the value at the specified location.
         *
         * @param o The key to the location.
         * @return The value that was removed, or null if there was no value at
         *         that location.
         */
        @Override
        public V remove(Object o) {
            V old = null;
            if (o != null) {
                String key = (String) o;
                if (key.length() == 0) {
                    // Its me!
                    old = leaf;
                    leaf = null;
                } else {
                    TrieMap<V> it = find(key, false);
                    if (it != null) {
                        old = it.remove("");
                    }
                }
            } else {
                throw new NullPointerException("Nulls not allowed.");
            }
            return old;
        }

        /**
         * Count the number of values in the structure.
         *
         * @return The number of values in the structure.
         */
        @Override
        public int size() {
            // If I am a leaf then size increases by 1.
            int size = leaf != null ? 1 : 0;
            if (children != null) {
                // Add sizes of all my children.
                for (Character c : children.keySet()) {
                    size += children.get(c).size();
                }
            }
            return size;
        }

        /**
         * Is the tree empty?
         *
         * @return true if the tree is empty.
         *         false if there is still at least one value in the tree.
         */
        @Override
        public boolean isEmpty() {
            // I am empty if I am not a leaf and I have no children
            // (slightly quicker than the AbstractCollection implementation).
            return leaf == null && (children == null || children.isEmpty());
        }

        /**
         * Returns all keys as a Set.
         *
         * @return A HashSet of all keys.
         *
         * Note: Although it returns Set<S> it is actually a Set<String> that has been
         * home-grown because the original keys are not stored in the structure
         * anywhere.
         */
        @Override
        public Set<String> keySet() {
            // Roll them a temporary list and give them a Set from it.
            return new HashSet<String>(keyList());
        }

        /**
         * List all my keys.
         *
         * @return An ArrayList of all keys in the tree.
         *
         * Note: Although it returns List<S> it is actually a List<String> that has been
         * home-grown because the original keys are not stored in the structure
         * anywhere.
         */
        protected List<String> keyList() {
            List<String> contents = new ArrayList<String>();
            if (leaf != null) {
                // If I am a leaf, an empty string is in the set.
                contents.add("");
            }
            // Add all sub-tries.
            if (children != null) {
                for (Character c : children.keySet()) {
                    TrieMap<V> child = children.get(c);
                    List<String> childContents = child.keyList();
                    for (String subString : childContents) {
                        // All possible substrings can be prepended with this character.
                        contents.add(c + subString);
                    }
                }
            }
            return contents;
        }

        /**
         * Does the map contain the specified key.
         *
         * @param key The key to look for.
         * @return true if the key is in the Map.
         *         false if not.
         */
        public boolean containsKey(String key) {
            TrieMap<V> it = find(key, false);
            if (it != null) {
                return it.leaf != null;
            }
            return false;
        }

        /**
         * Represent me as a list.
         *
         * @return A String representation of the tree.
         */
        @Override
        public String toString() {
            List<String> list = keyList();
            StringBuilder sb = new StringBuilder();
            // Emit a comma before every entry except the first.
            String sep = "";
            sb.append("{");
            for (String s : list) {
                sb.append(sep).append(s).append("=").append(get(s));
                sep = ",";
            }
            sb.append("}");
            return sb.toString();
        }

        /**
         * Clear down completely.
         */
        @Override
        public void clear() {
            children = null;
            leaf = null;
        }

        /**
         * Return a list of key/value pairs.
         *
         * @return The entry set.
         */
        @Override
        public Set<Map.Entry<String, V>> entrySet() {
            Set<Map.Entry<String, V>> entries = new HashSet<Map.Entry<String, V>>();
            List<String> keys = keyList();
            for (String key : keys) {
                entries.add(new Entry<String, V>(key, get(key)));
            }
            return entries;
        }

        /**
         * An entry.
         *
         * @param <S> The type of the key.
         * @param <V> The type of the value.
         */
        private static class Entry<S, V> implements Map.Entry<S, V> {

            protected S key;
            protected V value;

            public Entry(S key, V value) {
                this.key = key;
                this.value = value;
            }

            public S getKey() {
                return key;
            }

            public V getValue() {
                return value;
            }

            public V setValue(V newValue) {
                V oldValue = value;
                value = newValue;
                return oldValue;
            }

            @Override
            public boolean equals(Object o) {
                if (!(o instanceof TrieMap.Entry)) {
                    return false;
                }
                Entry e = (Entry) o;
                return (key == null ? e.getKey() == null : key.equals(e.getKey()))
                        && (value == null ? e.getValue() == null : value.equals(e.getValue()));
            }

            @Override
            public int hashCode() {
                int keyHash = (key == null ? 0 : key.hashCode());
                int valueHash = (value == null ? 0 : value.hashCode());
                return keyHash ^ valueHash;
            }

            @Override
            public String toString() {
                return key + "=" + value;
            }
        }
    }

+2

Source: https://habr.com/ru/post/908664/

