Java: what is the overhead of using ConcurrentSkipList* when there is no concurrency?

I need a sorted collection in a scenario dominated by iteration (as opposed to insert/delete; there is no random access). For this reason I was thinking about using a skip list rather than a tree (the iterator should be faster).

The problem is that Java 6 only has a concurrent implementation of the skip list, so I was wondering whether it makes sense to use it in a non-concurrent scenario, or whether the overhead makes that the wrong decision.

From what I know, the ConcurrentSkipList* classes are basically lock-free and CAS-based, so they shouldn't carry much overhead, but I wanted to hear other opinions.

EDIT:

Some micro-benchmarking (repeated iteration over TreeSet, LinkedList, ConcurrentSkipListSet and ArrayList instances of various sizes) shows that there is quite some overhead. ConcurrentSkipListSet stores its items in a linked list internally, so the only reason it would be slower to iterate than a LinkedList is the overhead mentioned above.
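(For reference, a minimal sketch of the kind of iteration micro-benchmark described above; the class and method names are mine, and a serious measurement would need a proper harness to deal with JIT warm-up.)

import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class IterationBench {

    // Time one full pass over the collection, in nanoseconds.
    static long timeIteration(Collection<Integer> c) {
        long start = System.nanoTime();
        long sum = 0;
        for (Integer i : c) {
            sum += i;                           // touch every element so the loop isn't optimized away
        }
        long elapsed = System.nanoTime() - start;
        if (sum == 42) System.out.println(sum); // keep 'sum' observable
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 100000;
        List<Integer> data = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) data.add(i);

        Collection<Integer> tree   = new TreeSet<Integer>(data);
        Collection<Integer> skip   = new ConcurrentSkipListSet<Integer>(data);
        Collection<Integer> array  = new ArrayList<Integer>(data);
        Collection<Integer> linked = new LinkedList<Integer>(data);

        // Run several rounds so the later ones are measured with a warmed-up JIT.
        for (int round = 0; round < 5; round++) {
            System.out.println("TreeSet "     + timeIteration(tree)
                    + "  SkipListSet " + timeIteration(skip)
                    + "  ArrayList "   + timeIteration(array)
                    + "  LinkedList "  + timeIteration(linked));
        }
    }
}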

+4
source share
6 answers

Well, you could build a skip list on top of many other structures; it lives in the concurrency package because concurrent data structures are much more complicated, and because a concurrent skip list costs less than simulating a skip list with other concurrent data structures.

In the single-threaded world things are different: you can use a sorted set, a binary tree, or your own data structure, and it will perform better than the concurrent skip list.

The complexity of iterating over a tree or a skip list will always be O(n), but if you use a linked list or an array list instead you run into an insertion problem: inserting an item at the right position (to keep the list sorted) costs O(n), instead of O(log n) for a binary tree or a skip list.

You can iterate over TreeMap.keySet() to get all the inserted keys in order, and it won't be that slow.

There is also the TreeSet class, which is probably what you want, but since it is just a wrapper around a TreeMap, using TreeMap directly will be a bit faster.
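A trivial sketch of what that looks like (the Boolean value here is just a placeholder):

import java.util.TreeMap;

public class KeySetIteration {
    public static void main(String[] args) {
        TreeMap<String, Boolean> map = new TreeMap<String, Boolean>();
        map.put("banana", Boolean.TRUE);
        map.put("apple", Boolean.TRUE);
        map.put("cherry", Boolean.TRUE);

        // keySet() iterates in ascending key order: apple, banana, cherry
        for (String key : map.keySet()) {
            System.out.println(key);
        }
    }
}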

+1
source

If thread safety is not required, I would say skip the java.util.concurrent package altogether.

Interestingly, sometimes ConcurrentSkipList is slower than TreeSet on the same input, and I have not figured out why.

I mean, have you seen the source code of ConcurrentSkipListMap? :-) I always have to smile when I look at it. It is 3,000 lines of some of the craziest, scariest, and at the same time most beautiful code I have ever seen in Java. (Kudos to Doug Lea and co. for getting all the concurrency utils integrated so seamlessly with the collections framework!) That said, on modern CPUs the code and algorithmic complexity won't even matter that much. What usually makes more of a difference is having the data to iterate over laid out contiguously in memory, so that the CPU cache can do its job better.

So, in the end, I wrapped ArrayList with a new addSorted() method that performs a sorted insert into the ArrayList.

That sounds good. If you really need to squeeze every last drop of performance out of iteration, you could also try iterating over a raw array directly, rebuilding it on every change, e.g. by calling TreeSet.toArray() or by generating the array and then sorting it in place with Arrays.sort(T[], Comparator<? super T>). But the gain may be tiny (or even nothing, if the JIT does its job well), so it may not be worth the inconvenience.
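For what it's worth, a minimal sketch of such an addSorted() wrapper, assuming Comparable elements (the class name is made up; duplicates simply get inserted next to each other):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedArrayList<T extends Comparable<? super T>> {

    private final List<T> items = new ArrayList<T>();

    // Insert 'item' at the position that keeps the backing ArrayList sorted.
    public void addSorted(T item) {
        int index = Collections.binarySearch(items, item);
        if (index < 0) {
            index = -index - 1;      // binarySearch returns (-(insertion point) - 1) when not found
        }
        items.add(index, item);      // O(n) element shift, but iteration stays cache-friendly
    }

    // Read-only view for iteration.
    public List<T> asList() {
        return Collections.unmodifiableList(items);
    }
}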

+3
source

As measured with OpenJDK 6 on the typical production hardware my company uses, you can expect all add and query operations on a skip-list map to take roughly twice as long as the same operations on a tree map.

Examples:

63 µs versus 31 µs to create the map and add 200 entries. 145 ns versus 77 ns for a get() on this 200-entry map.

And the ratio doesn't change all that much for smaller and larger sizes.

(The code for this test will eventually be published so that you can review it and run it yourself; sorry, we are not ready to do that yet.)
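(While that code isn't available yet, here is a rough sketch of an equivalent comparison you could run yourself; this is not the benchmark referred to above, and the numbers will depend heavily on your hardware and JDK.)

import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MapBench {

    // Fill the map with n entries, then time n get() calls; returns ns per get().
    static long buildAndProbe(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += map.get(i);
        }
        long perGet = (System.nanoTime() - start) / n;
        if (sum < 0) System.out.println(sum);   // keep the loop from being optimized away
        return perGet;
    }

    public static void main(String[] args) {
        int n = 200;
        // Several rounds so the later numbers reflect a warmed-up JIT.
        for (int round = 0; round < 5; round++) {
            long tree = buildAndProbe(new TreeMap<Integer, Integer>(), n);
            long skip = buildAndProbe(new ConcurrentSkipListMap<Integer, Integer>(), n);
            System.out.println("TreeMap get(): " + tree
                    + " ns, ConcurrentSkipListMap get(): " + skip + " ns");
        }
    }
}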

+3
source

Without concurrency, it is generally more efficient to use a balanced binary search tree. In Java, that would be a TreeMap.

Skip lists are usually reserved for concurrent programming because of their ease of implementation and their speed in multi-threaded applications.

+1
source

You seem to understand the trade-off well here, so I doubt anyone can give you a definitive, principled answer. Fortunately, this is pretty easy to test.

I started by creating a simple Iterator<String> that cycles endlessly over a finite list of randomly generated strings. (That is: during initialization it generates an array _strings of a random strings of length b, drawn from a pool of c distinct characters. The first call to next() returns _strings[0], the next call returns _strings[1], ... and the (a+1)-th call returns _strings[0] again.) The strings returned by this iterator are what I used in all calls to SortedSet<String>.add(...) and SortedSet<String>.remove(...).

Then I wrote a test method that takes an empty SortedSet<String> and loops d times. On each iteration it adds e elements, then removes f elements, and then iterates over the entire set. (As a sanity check, it keeps track of the set's size using the return values of add() and remove(), and when iterating over the whole set it verifies that it finds the expected number of elements.) That's all there is in the body of the loop.
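A compact sketch of roughly that setup; the parameter names a through f match the description above, and this is my reconstruction rather than the code actually used:

import java.util.Iterator;
import java.util.Random;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListSet;

public class SortedSetBench {

    // Cycles forever over 'a' random strings of length 'b', built from 'c' distinct characters.
    static class CyclingStrings implements Iterator<String> {
        private final String[] strings;
        private int next = 0;

        CyclingStrings(int a, int b, int c, Random rnd) {
            strings = new String[a];
            for (int i = 0; i < a; i++) {
                StringBuilder sb = new StringBuilder(b);
                for (int j = 0; j < b; j++) {
                    sb.append((char) ('a' + rnd.nextInt(c)));
                }
                strings[i] = sb.toString();
            }
        }

        public boolean hasNext() { return true; }
        public String next() { String s = strings[next]; next = (next + 1) % strings.length; return s; }
        public void remove() { throw new UnsupportedOperationException(); }
    }

    // Loops d times; each pass adds e strings, removes f strings, then walks the whole set.
    static long run(SortedSet<String> set, Iterator<String> in, int d, int e, int f) {
        int expected = 0;
        long start = System.nanoTime();
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < e; j++) {
                if (set.add(in.next())) expected++;     // add() reports whether the element was new
            }
            for (int j = 0; j < f; j++) {
                if (set.remove(in.next())) expected--;  // remove() reports whether it was present
            }
            int seen = 0;
            for (String s : set) seen++;                // the iteration we actually care about
            if (seen != expected) throw new AssertionError("size mismatch");
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int a = 5000, b = 5, c = 40, d = 1000, e = 4, f = 2;
        System.out.println("TreeSet: "
                + run(new TreeSet<String>(), new CyclingStrings(a, b, c, new Random(1)), d, e, f) + " ns");
        System.out.println("ConcurrentSkipListSet: "
                + run(new ConcurrentSkipListSet<String>(), new CyclingStrings(a, b, c, new Random(1)), d, e, f) + " ns");
    }
}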

I don't think I need to explain what my main(...) method does.

I tried various values for the different parameters, and I found that sometimes ConcurrentSkipListSet<String> performed better and sometimes TreeSet<String> did, but the difference was never more than a factor of two. In general, ConcurrentSkipListSet<String> did better when:

  • a, b and/or c were relatively large. (I mean, within the ranges I tested. My a ranged from 1000 to 10000, my b from 3 to 10, my c from 10 to 80. Overall, the resulting set sizes ranged from about 450 to about 10000, with modes around 666 and 6666, because I usually used e = 2f.) This suggests that ConcurrentSkipListSet<String> does a bit better than TreeSet<String> with larger sets and/or with more expensive string comparisons. Trying specific values designed to tease these two factors apart, I got the impression that ConcurrentSkipListSet<String> is noticeably better than TreeSet<String> with larger sets, and slightly worse with more expensive string comparisons. (This is basically what you'd expect: TreeSet<String>'s binary-tree approach aims for the absolute minimum possible number of comparisons.)
  • e and f were small; that is, when I called add(...) and remove(...) only a small number of times per iteration. (This is just what you predicted.) The exact tipping point depended on a, b and c, but as a first approximation, ConcurrentSkipListSet<String> performed better when e + f was less than about 10, and TreeSet<String> performed better when e + f was greater than about 20.

Of course, this was on a machine that may be very different from yours, using a JDK that may be very different from yours, and using very artificial data that may be very different from yours. I recommend you run your own tests. Since Tree* and ConcurrentSkipList* both implement Sorted*, you can easily try your code both ways and see what you find.

From what I know, the ConcurrentSkipList* classes are basically lock-free and CAS-based, so they shouldn't carry much overhead [...]

I understand that this will depend on the machine. On some systems a lock-free implementation may not be possible, in which case these classes will have to fall back on locks. (But since you are not actually multi-threaded anyway, even locks may not be that expensive. Synchronization has some overhead, of course, but its main cost comes from lock contention and forced single-threading, and that's not a problem for you. Again, I think you just need to test and see how the two versions do.)

+1
source

As noted, ConcurrentSkipList* has quite a bit of overhead compared to TreeMap, and the TreeMap iterator is not a great fit for your use case either, because it just calls the successor() method over and over, which is quite slow.
So one alternative, which should be significantly faster than either of the above, is to write your own tree iterator. In fact, I would drop TreeMap entirely, since its 3,000 lines of code are a bit more than you probably need, and just write a clean AVL-tree implementation with only the methods you need. The core AVL logic is just a few hundred lines of code in any language; then add whatever iterator works best for your case.
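As an illustration only, here is a sketch of what a non-recursive in-order iterator over a hand-rolled tree node might look like; the Node class is hypothetical and the AVL balancing code is omitted entirely:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

class Node<T> {
    T value;
    Node<T> left, right;
}

// Walks the tree in sorted order using an explicit stack,
// instead of the successor()-based walk described above.
class InOrderIterator<T> implements Iterator<T> {
    private final Deque<Node<T>> stack = new ArrayDeque<Node<T>>();

    InOrderIterator(Node<T> root) {
        pushLeftSpine(root);
    }

    private void pushLeftSpine(Node<T> node) {
        while (node != null) {
            stack.push(node);
            node = node.left;
        }
    }

    public boolean hasNext() {
        return !stack.isEmpty();
    }

    public T next() {
        if (stack.isEmpty()) throw new NoSuchElementException();
        Node<T> node = stack.pop();
        pushLeftSpine(node.right);   // the successor is the leftmost node of the right subtree
        return node.value;
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}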

0
source

Source: https://habr.com/ru/post/1379409/

