Differences in performance ... so dramatic?

Question

Differences in performance ... so dramatic?

Right now I am reading a few posts about List<T> vs LinkedList<T> , so I decided to evaluate some structures myself. I compared Stack<T> , Queue<T> , List<T> and LinkedList<T> , adding data and removing data to / from the front / end. Here is the test result:

  Pushing to Stack... Time used: 7067 ticks Poping from Stack... Time used: 2508 ticks Enqueue to Queue... Time used: 7509 ticks Dequeue from Queue... Time used: 2973 ticks Insert to List at the front... Time used: 5211897 ticks RemoveAt from List at the front... Time used: 5198380 ticks Add to List at the end... Time used: 5691 ticks RemoveAt from List at the end... Time used: 3484 ticks AddFirst to LinkedList... Time used: 14057 ticks RemoveFirst from LinkedList... Time used: 5132 ticks AddLast to LinkedList... Time used: 9294 ticks RemoveLast from LinkedList... Time used: 4414 ticks

The code:

 using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Diagnostics; namespace Benchmarking { static class Collections { public static void run() { Random rand = new Random(); Stopwatch sw = new Stopwatch(); Stack<int> stack = new Stack<int>(); Queue<int> queue = new Queue<int>(); List<int> list1 = new List<int>(); List<int> list2 = new List<int>(); LinkedList<int> linkedlist1 = new LinkedList<int>(); LinkedList<int> linkedlist2 = new LinkedList<int>(); int dummy; sw.Reset(); Console.Write("{0,40}", "Pushing to Stack..."); sw.Start(); for (int i = 0; i < 100000; i++) { stack.Push(rand.Next()); } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "Poping from Stack..."); sw.Start(); for (int i = 0; i < 100000; i++) { dummy = stack.Pop(); dummy++; } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "Enqueue to Queue..."); sw.Start(); for (int i = 0; i < 100000; i++) { queue.Enqueue(rand.Next()); } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "Dequeue from Queue..."); sw.Start(); for (int i = 0; i < 100000; i++) { dummy = queue.Dequeue(); dummy++; } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "Insert to List at the front..."); sw.Start(); for (int i = 0; i < 100000; i++) { list1.Insert(0, rand.Next()); } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "RemoveAt from List at the front..."); sw.Start(); for (int i = 0; i < 100000; i++) { dummy = list1[0]; list1.RemoveAt(0); dummy++; } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "Add to List at the end..."); sw.Start(); for (int i = 0; i < 100000; i++) { list2.Add(rand.Next()); } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "RemoveAt from List at the end..."); sw.Start(); for (int i = 0; i < 100000; i++) { dummy = list2[list2.Count - 1]; list2.RemoveAt(list2.Count - 1); dummy++; } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "AddFirst to LinkedList..."); sw.Start(); for (int i = 0; i < 100000; i++) { linkedlist1.AddFirst(rand.Next()); } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "RemoveFirst from LinkedList..."); sw.Start(); for (int i = 0; i < 100000; i++) { dummy = linkedlist1.First.Value; linkedlist1.RemoveFirst(); dummy++; } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "AddLast to LinkedList..."); sw.Start(); for (int i = 0; i < 100000; i++) { linkedlist2.AddLast(rand.Next()); } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks); sw.Reset(); Console.Write("{0,40}", "RemoveLast from LinkedList..."); sw.Start(); for (int i = 0; i < 100000; i++) { dummy = linkedlist2.Last.Value; linkedlist2.RemoveLast(); dummy++; } sw.Stop(); Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks); } } }

The differences are so dramatic!

As you can see, the performance of Stack<T> and Queue<T> is fast and comparable, as expected.

For List<T> , the use of front and end has so many differences! And, to my surprise, the performance of adding / removing from the end is actually comparable to the performance of Stack<T> .

For LinkedList<T> , edge manipulation is fast (-er than List<T> ) ~~, but for the end it's incredibly slow to delete~~ when processing ends too.

So ... can any experts consider:

similarities in performance using Stack<T> and the end of List<T> ,
differences in the use of the front and end of List<T> and
Is the reason that the use of the end of LinkedList<T> so slow (not applicable as it is a coding error due to the use of Linq Last() , thanks to CodesInChaos )?

I think I know why List<T> doesn’t do so well with the front ... because List<T> has to move the whole list back and forth when this is done. Correct me if I am wrong.

PS My System.Diagnostics.Stopwatch.Frequency is 2435947 , and the program is for the .NET client profile and compiled with C # 4.0 in Windows 7 Visual Studio 2010.

+46

collections c # .net data-structures

Alvin Wong Nov 03

source share

5 answers

List<T> is a dynamic redirection array (a data structure that you will also see in the standard library of many other languages). This means that it internally uses a "static" array (an array that cannot be modified, known as an "array" in .NET), which can be often larger than the size of the list. Then adding simply increments the counter and uses the next, previously unused slot in the internal array. The array is redistributed (copying of all elements is required) if the internal array becomes small to accommodate all elements. When this happens, the size of the array increases by factors (not constants), usually 2.

This ensures that the average time complexity (basically, the average time per operation over a long sequence of operations) to add is O (1) even in the worst case. Such an optimization is impossible for adding to the front (at least, while not preserving both random access and adding O (1) at the end). You always need to copy all the elements in order to move them to their new slots (creating space for the added element in the first slot). Stack<T> does the same thing , you just don’t notice inconsistencies with the addition of the edge, because you only ever work on one end (fast).

~~Getting the end of a linked list depends a lot on the internal components of your list.~~ ~~You can maintain a link to the last element, but this makes all operations on the list more complicated and can (I have no example at hand) make some operations much more expensive.~~ ~~The absence of such a link, adding to the end requires going through all the elements of the linked list to find the last node, which, of course, is terribly slow for lists of non-trivial size.~~

As @CodesInChaos pointed out, your associated list manipulation was erroneous. A quick search for the end that you see now is most likely due to the fact that LinkedList<T> explicitly supports a link to the last node, as mentioned above. Note that getting the item from both ends is still slow.

+13

delnan Nov 03 '12 at 17:06

source share

The speed mainly depends on the number of operations required to insert, delete or search for an item. You have already noticed that this list requires memory transfers.

Stack is a list available only in the top element - and the computer always knows where it is.

A linked list is another one: the beginning of the list is known, so it’s very quick to add or remove from the very beginning, but finding the last item takes time. Caching the location of the last OTOH element is only useful for adding. To delete, you need to go through the full list minus one element to find the “hook” or pointer to the last one.

Just by looking at the numbers, you can make some reasonable assumptions about the internal components of each data structure:

Pop out of the stack quickly, as expected
push to stack is slower. and this is slower than adding to the end of the list. What for?
- Apparently, the size of the placement unit for the stack is smaller - it can only increase the size of the stack by 100, and the list can be increased in units of 1000.
The list seems to be a static array. To access the list on the front panel, data transfer is required, which takes time in proportion to the length of the list.
Operations with basic linked lists should not take much longer, usually only
- new_item.next = list_start; list_start = new_item; // add
- list_start = list_start.next; // delete
- however, since addLast is so fast, this means that when you add or remove in the linked list, you must also update the pointer to the last item. Thus, there is additional bookkeeping.
OTOH bidirectional lists make it relatively quick to insert and delete at both ends of the list (I was informed that the best code uses DLLs), however,
- references to the previous and next elements also double the job of accounting.

+5

Aki Suihkonen Nov 03 '12 at 17:03

source share

I have a Java background, and I think your question is more about general data structures than a specific language. Also, I apologize if my statements are incorrect.

1. similarities in the effectiveness of using Stack and the end of the list

2. differences in the use of the front and the end of the list, and

At least in Java, stacks are implemented using arrays (sorry if this is not the case with C #. You can refer to the source for implementation). The same goes for lists. Typical with an array, all inserts at the end take less time than at the beginning, because pre-existing values in the array must be moved down to place the insert at the beginning.

Link to the source Stack.java and its superclass Vector

3. Is the reason that using LinkedList end is so slow?

LinkedList does not allow random access and must go through nodes until it reaches the insertion point. If you find that performance for the last nodes is slower, then I assume that the LinkedList implementation should be a singly linked list . I think you would like to consider a doubly linked list for optimal performance when accessing items at the end.

http://en.wikipedia.org/wiki/Linked_list

+1

Arun Manivannan Nov 03 '12 at 17:03

source share

similarities in performance using Stack and end of list,

As delnan explains, they both use a simple array inside, so they behave very similar when working at the end. You can see that the stack is a list with only access to the last object.

differences in the use of the front and the end of the list

You already suspected that it was right. Manipulating the beginning of a list means that the underlying array must change. Adding an element usually means that you need to move all the other elements to one, the same with the deletion. If you know that you will be manipulating both ends of the list, it is better to use a linked list.

Is the reason that using the LinkedList end is so slow?

Typically, inserting and deleting items for linked lists at any position can be performed at a constant time, since you just need to change no more than two pointers. The only problem is to get to the position. A regular linked list has only a pointer to its first element. Therefore, if you want to go to the last element, you need to iterate over all the elements. A queue implemented with a linked list usually solves this problem by having an extra pointer to the last item, so adding items is possible in constant time. A more complex data structure would be a double linked list that has both pointers to the first and last element, and where each element also contains a pointer to the next and previous element.

What you need to know about this is that there are many different data structures created for one purpose that they can handle very efficiently. Choosing the right structure depends on what you want to do.

+1

poke Nov 03

source share

CodesInChaos · Accepted Answer · 2012-11-03 17:11

Relatively 1:

Stack<T> and List<T> performance similar is not surprising. I expect both of them to use double strategy arrays. This leads to depreciation of additives with a constant time.

You can use List<T> wherever you can use Stack<T> , but this leads to less expressive code .

Relatively 2:

I think I know why List<T> doesn’t do so well with the front ... because List<T> needs to move the whole list backward and at the same time.

It is right. Inserting / deleting elements at the beginning is expensive because it moves all the elements. On the other hand, getting or replacing items at the beginning is cheap.

Relatively 3:

Your slow LinkedList<T>.RemoveLast is a bug in your benchmarking code.

Removing or retrieving the last element of a doubly linked list is cheap. In the case of LinkedList<T> this means that RemoveLast and Last are cheap.

But you did not use the Last property, but the LINQ Last() extension method. In collections that do not implement IList<T> , iterates over the entire list, giving it O(n) runtime.

Differences in performance ... so dramatic?

More articles: