What data structure should be used to store key-value pairs of type <String, String>? One key has many meanings
my application reads bigram collocation (pairs) from a txt file. they should be read as key-value pairs. one key can have several values ββ(therefore, any kind of Map as a data structure is excluded) ... I want them to be sorted in a natural alphabetical order.
the first word of collocation, i.e. the key will be a verb, and its meaning will contribute to the collocation type collocation. This way trees can be viewed
So, essentially, I'm trying to implement
SortedList <String, String> kind of thing.
I came across the following data structures that fit my requirements, although I cannot decide which one to use: (the MultiMap mentioned here is part of the google collections framework)
Tries - I only know the basics of this data structure. I found one implementation of it in Java here . It does not perform the delete () operation.
or any other data structure that you would like to recommend? I have not yet gone through the Java dictionary ... Please help me decide which one to choose ...
Thanks!
EDIT - it is expected that the list will contain about 100-200 entries
EDIT2: Operations: search if there is a key value for a given key .. As I said, dst will store a list of word-word pairs as key-value entries; it is initialized by reading records from a file ... the work goes something like this: we first get all the keys from dst ... read the file and mark it (run through OpenNLP, dst is not for that) .. and then search if any the token controls the key (i.e. is a verb) in dst ... after detection, we get all the values ββfor the given key and search for the next token within the set of values ββ... if the value is also found in dst, it means that a collocation is detected . the values ββare set then ... THIS IS AS A DST SHOULD WORK AT WORK ...
java.util.NavigableMap is an interface that provides a map display with full key ordering. JavaSE 6 provides java.util.TreeMap or java.util.concurrent.ConcurrentSkipListMap as implementations. The first is probably enough for you. To be clear, I would recommend using something like:
Map<String,Set<String>> with the following specific type TreeMap<String, ArraySet<String>> .
Not HashMap or HashMultiMap , because they do not allow iteration of keys in order.
Not FastTreeMap or ConcurrentSkipListMap ... unless your application is multithreaded.
Various implementations of TreeMap or TreeMultiMap are fine, although versions of TreeMap will entail their creation as Map<String,List<String>> and list management.
Tree compared to Trie bit complicated. I suspect that a well-designed / implemented Trie will give a faster search, but I also suspect that more memory will be required. (I make some assumptions. In reality, the complexity analysis will depend on the details of the trie implementation.)
FYI: The Google Collections project has been discontinued and is now part of Google Guava .
Guava ListMultimap ensures that the values ββinside a specific key remain in the same order in which they appear in the file. However, it will not store the keys in the same order as they appeared in the file.
I think that, as suggested above, you can use TreeMap (depending on the size of your colletion), this ensures that the map will be in ascending order of the key or if you want to configure sorting using your own comparator when TreeMap.
final Map<String, List<String> resultMap = new TreeMap<String, List<String>>(); After creating the map, will you then update, add and delete the map further? Besides the easy walkthrough? I think HashMap is perfect if you have many add-ons, updates, etc. Perhaps it might be faster to create HaspMap initially and then convert it to TreeMap to go through? anyone? But if you are creating HaspMap, think about a loadfactor, make sure you have a high initial capacity to minimize the number of rehash operations. The default load factor is 0.75, so if you have a map with an initial size of 100, then adding 75 elements will rephrase the map.
ah found another stackoverflow link with a HashMap loadfactor HashMap initialization parameters (load / initialcapacity) .