Rule of thumb for choosing a Java Collection implementation?

Does anyone have a good rule of thumb to choose between different implementations of the Java Collection interfaces, such as List, Map, or Set?

For example, as a rule, why or in what cases do I prefer to use Vector or ArrayList, Hashtable or HashMap?

+57
java collections heuristics
Sep 07 '08 at 13:46
source share
10 answers

I always made these decisions in each case, depending on the use case, for example:

  • Do I need an order?
  • Will I have a null key / values? Dups?
  • It will be accessed by several threads
  • I need a key / value pair
  • Do I need random access?

And then I'll rip out my handy fifth edition of Java in a nutshell and compare the options ~ 20 or so. The fifth chapter contains small tables to help you understand what is appropriate.

Well, maybe if I find out the cuff that a simple ArrayList or HashSet will do the trick, I won’t look at all of this.;) But if there is something remotely complicated in my requested use, you bet I’m in the book . By the way, I would like Vector to be an "old hat" - I have not used it for many years.

+17
Sep 07 '08 at 14:03
source share

I really like this cheat sheet from Sergey Kovalchuk blog entry :

Java Map / Collection Cheat Sheet

The block diagram of Alexander Zagniotov was more detailed, but, unfortunately, it is not online. However, Wayback Machine has a copy of the blog :

Alexander Zaniotov's flowchart for choosing Collection implementations

+84
Jul 02 '13 at 8:19
source share

I assume that you know the difference between a list, a set, and a map from the answers above. Why you choose between their executive classes is another matter. For example:

List

  • ArrayList is fast, but slower when pasting. This is good for an implementation that reads a lot but does not insert / delete a lot. It stores its data in one continuous block of memory, so every time it needs to expand, it copies the entire array.
  • LinkedList is slow but fast to insert. This is good for an implementation that inserts / deletes a lot, but does not read a lot. It does not support the entire array in one contiguous block of memory.

Set:

  • HashSet does not guarantee the iteration order and is therefore the fastest of the sets. It has a lot of overhead and is slower than an ArrayList, so you shouldn't use it except for a lot of data when its hash rate becomes a factor.
  • TreeSet stores ordered data, so it is slower than a HashSet.

Map: The performance and behavior of HashMap and TreeMap are parallel to the implementations of Set.

You cannot use Vector and Hashtable. They are synchronized implementations before the release of the new Collection hierarchy is thus slowed down. If synchronization is required, use Collections.synchronizedCollection ().

+24
Sep 07 '08 at 14:17
source share

There are theoretically useful Big-Oh trade - offs, but in practice they almost never matter.

In real tests, ArrayList performs a LinkedList even with large lists and with operations such as "many inserts near the front." Academics ignore the fact that real algorithms have constant factors that can suppress the asymptotic curve. For example, linked lists require an additional distribution of objects for each node, which means slower to create a node and significantly worse memory access characteristics.

My rule:

  • Always start with ArrayList and HashSet and HashMap (i.e. not LinkedList or TreeMap).
  • Type declarations should always be an interface (i.e. List, Set, Map), so if a profiler or code check proves otherwise, you can change the implementation without breaking anything.
+12
07 Sep '08 at 15:26
source share

About your first question ...

The List, Map and Set serve various purposes. I suggest reading about the Java Collections Framework at http://java.sun.com/docs/books/tutorial/collections/interfaces/index.html .

To be more specific:

  • use List if you need a data structure as an array and you need to iterate over the elements
  • use Map if you need something like a dictionary
  • use the Kit if you only need to decide if something belongs to the kit or not.

About your second question ...

The main difference between Vector and ArrayList is that the first is synchronized and the second is not synchronized. You can read more about synchronization in Java Concurrency in practice .

The difference between a Hashtable (note that T is not a capital letter) and a HashMap is similar, the former is synchronized, the latter is not synchronized.

I would say that there is no practical rule to prefer this or that implementation, it really depends on your needs.

+8
Sep 07 '08 at 14:03
source share

For an unsorted best choice, more than nine times out of ten will be: ArrayList, HashMap, HashSet.

Vector and Hashtable are synchronized and therefore can be a bit slower. It's rare that you need synchronized implementations, and when you make their interfaces are not rich enough to make their synchronization useful. In the case of Map, ConcurrentMap adds additional operations to make the interface useful. ConcurrentHashMap is a good implementation of ConcurrentMap.

LinkedList is almost never a good idea. Even if you do a lot of inserts and deletion, if you use an index to indicate the position, then to iterate through the list you need to find the correct node. ArrayList is almost always faster.

For Map and Set, hash options will be faster than tree / sorted. Hash algortihms tend to have O (1) performance, while trees will be O (log n).

+5
Sep 07 '08 at 15:18
source share

Lists allow duplicate elements, while Sets allow only one instance.

I will use the map when I need to do a search.

For specific implementations, there are options for saving maps and sets that preserve order, but basically it comes down to speed. I tend to use ArrayList for fairly small lists and a HashSet for fairly small sets, but there are many implementations (including everything you write yourself). HashMap is quite common for Maps. Something more than β€œreasonably small,” and you should start to worry about memory so that the algorithm is more specific.

This page contains lots of animated images along with test testing of LinkedList code against ArrayList if you are interested in hard numbers.

EDIT: Hope the following links demonstrate how these things are really just toolbar items, you just need to think about what your needs are: See Commons-Collections Map , List, and Set versions.

+2
Sep 07 '08 at 14:06
source share

As suggested in other answers, there are different scenarios for using the correct collection depending on the use case. I list a few points,

ArrayList:

  • Most of the times you just need to store or sort through a bunch of things and then scroll through them. Iteration is faster than its index.
  • Whenever you create an ArrayList, it is allocated a fixed amount of memory and is once crowded out, it copies the entire array

LinkedList

  • It uses a doubly linked list, so the insert and delete operation will be quick, since it will only add or remove a node.
  • The extraction is slow, as it must pass through the nodes.

Hashset:

  • Making other yes-no decisions regarding an element, for example. "is the item an English word," "is the item in the database?", "is the item in this category?" and etc.

  • Remembering "what elements that you have already processed", for example. When crawling web pages

Hashmap

  • Used when you need to say "for a given X, what is Y"? This is often useful for implementing caches or indexes in memory. Ie pairs of key values. For example: For a given user ID, what is its cached name / user object ?.
  • Always search with a HashMap.

Vector and Hashtable are synchronized and therefore a bit slower, and if synchronization is required, use Collections.synchronizedCollection (). Check It out for sorted collections. Hope this is counted.

+2
Jun 21 '17 at 17:01
source share

I found Bruce Eckel Thinking in Java was very helpful. He compares various collections very well. I used to use the chart he posted showing the heirachy inheritance on my cube wall as a quick reference. One thing I suggest you do is keep in mind thread safety. Performance usually means no streaming security.

+1
Sep 07 '08 at 14:30
source share

Well, it depends on what you need. General recommendations:

A list is a collection in which data is stored in insertion order, and each item gets an index.

A set is a package of elements without duplication (if you reinsert the same element, it will not be added). Data has no concept of order.

Map You get access to your data elements and write them by the key, which can be any possible object.

enter image description here Attribution: Stack Overflow

For more information on Java collections, check out this article .

+1
May 30 '19 at 8:04
source share



All Articles