Why is bucket sort considered a linear-time sort?

I would like to check my analysis of bucket sort, shown below.
There are many ways to implement bucket sort. Some of them are as follows.
Type 1:
If we know the range of the items to be sorted, we can set up a bucket for each possible item and just drop the items into their respective buckets. We then empty the buckets in order, and the result is a sorted list. When implementing this algorithm, we can easily use an array to represent the buckets, where the value at each index of the array is the number of elements in the corresponding bucket. If we have integers in the range [0..max], we create an array of (max + 1) integers and initialize all the values to zero. We then iterate sequentially through the unsorted array; reading the value of each element, we go to the corresponding index in the bucket array and increment the value there.

Time: O (N)
Space: O (1)
Type 2:

Example: sorting an array of people by age
Age is somewhat different from arbitrary integers when sorting, because it has a small range [0-150] (all people are between 0 and 150 years old). So the fastest way to sort is to allocate 151 linked lists (let's call them buckets) and put each person's data structure into the bucket corresponding to his or her age:

Time: O (N + K)
Space: O (N + K)
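
A minimal sketch of Type 2, under a few assumptions: the `Person` record and the sample names are hypothetical, and plain Python lists stand in for the linked lists mentioned above. It shows where the N and the K in O(N + K) come from.

```python
from collections import namedtuple

# Hypothetical record type for illustration; not from the original post.
Person = namedtuple("Person", ["name", "age"])

MAX_AGE = 150  # the range [0, 150] assumed in the text above


def sort_by_age(people):
    """Return a new list ordered by age, keeping input order for equal ages."""
    # One bucket per possible age: K = MAX_AGE + 1 = 151 buckets.
    buckets = [[] for _ in range(MAX_AGE + 1)]

    # Distribution pass, O(N): the age is used directly as the bucket index.
    for person in people:
        buckets[person.age].append(person)

    # Collection pass, O(N + K): every bucket is visited, even the empty ones.
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result


people = [Person("Ann", 34), Person("Bob", 7), Person("Cid", 34), Person("Dee", 90)]
print([p.name for p in sort_by_age(people)])
```

Because each bucket holds only people of the same age, no bucket needs a second sorting step.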

Type 3 (a variation of Type 2, as shown on Wikipedia)

The nextSort function is the sorting function used to sort each bucket. If insertion sort is used, the worst case is O(n^2); alternatively, merge sort can be used so that I keep stability and get O(n lg n).
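
A rough sketch of Type 3, assuming non-negative numbers and a bucket index computed by scaling each value against the maximum. The names `bucket_sort`, `num_buckets` and `next_sort` are illustrative, and the scaling rule is one common choice rather than the only one. The per-bucket sort is passed in as a parameter, so insertion sort can be swapped for another in-place sort.

```python
def insertion_sort(items):
    """Sort a list in place; stable, O(m^2) in the worst case for m items."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot right; equal elements are not moved.
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current


def bucket_sort(values, num_buckets, next_sort=insertion_sort):
    """Bucket sort for non-negative numbers; next_sort sorts one bucket in place."""
    if not values:
        return []
    largest = max(values)
    buckets = [[] for _ in range(num_buckets)]

    # Scatter: map each value in [0, largest] to a bucket in [0, num_buckets - 1].
    for v in values:
        index = min(int(num_buckets * v / (largest + 1)), num_buckets - 1)
        buckets[index].append(v)

    # Sort each bucket, then gather the buckets in order.
    result = []
    for bucket in buckets:
        next_sort(bucket)
        result.extend(bucket)
    return result


print(bucket_sort([29, 25, 3, 49, 9, 37, 21, 43], num_buckets=5))
```

If all values land in one bucket, the running time is that of `next_sort` on the whole input, which is where the O(n^2) worst case with insertion sort comes from.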

  • Question:
    1> Why is it considered a linear sort: because of Type 1 or Type 2?
    2> If I use Type 3, as on Wikipedia, which sort is efficient for sorting each bucket?
    I know the reason insertion sort is used in practice: we expect the buckets to be small, and for small lists insertion sort is much faster than anything else. Even implementations of merge sort or quicksort switch to insertion sort when the list gets small enough (say, below 20 items or so).
    3> For Type 3, on what basis can I determine the range of the buckets?
    This is important because if you try to bucket sort with a huge number of buckets, for example far more than n, the runtime may be dominated by the time needed to scan all the buckets looking for the ones you actually used, even though most of them are empty.

I based my analysis on:
Wikipedia
How is the complexity of bucket sort O(n + k)?
Design and Analysis of Algorithms, lecture notes for January 23, 1996
http://www1bpt.bridgeport.edu/~dichter/lilly/bucketsort.htm
http://cs.nyu.edu/courses/fall02/V22.0310-002/lectures/lecture-23.html
How is the complexity of bucket sort O(n + k) if we implement the buckets using linked lists?
What is the worst-case complexity of bucket sort?

+4
2 answers

Type 1:
The first type you describe is not really bucket sort. It is actually counting sort, also known as key-indexed counting, although it is considered a variant of bucket sort. The reason is that you are just counting the occurrences of each key, rather than storing the keys themselves in buckets.

Link: http://en.wikipedia.org/wiki/Counting_sort
Link: http://www.cs.princeton.edu/courses/archive/spr13/cos226/demo/51DemoKeyIndexedCounting.pdf

"Space: O(1)"
"we can set up a bucket for each possible item,"

Isn't that a contradiction? You are going to declare a bucket for every possible element and still stay at O(1)? ;)

If you want the algorithm to be stable, you also cannot overwrite the input array. So in practice you need n + k space for:

  • An output array of length 'n' (the same size as your input array)
  • 'k' buckets

If you check the pseudocode for counting sort, you will notice that the last loop goes through the input array again to see where each element should go. By doing this in the order in which the elements appear in the input array, you get a stable sort.
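
The three passes can be sketched roughly as follows. This is a minimal key-indexed counting sketch, not the exact pseudocode from the links above; the `key` parameter and the function name are assumptions added for illustration. Note the `k + 1` counters and the separate output array of length n.

```python
def counting_sort(items, k, key=lambda x: x):
    """Stable sort of items whose key(item) is an integer in [0, k)."""
    # Pass 1: count occurrences of each key, shifted by one slot.
    count = [0] * (k + 1)
    for item in items:
        count[key(item) + 1] += 1

    # Pass 2: prefix sums, so count[i] is the first output index for key i.
    for i in range(k):
        count[i + 1] += count[i]

    # Pass 3: walk the input in its original order and place each item.
    # Items with equal keys keep their relative order, which makes it stable.
    output = [None] * len(items)
    for item in items:
        output[count[key(item)]] = item
        count[key(item)] += 1
    return output


print(counting_sort([3, 0, 2, 3, 1, 0], k=4))
```

Passing `key=lambda c: ord(c) - ord("A")` with `k=26` sorts uppercase letters the same way.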

PS: Keep in mind that you are not necessarily sorting integers. If the input is an array of characters between A and Z, you can also use this algorithm.

Type 2:

"So the fastest way to sort is to allocate 151 linked lists (let's call them buckets) and put each person's data structure into the bucket corresponding to his or her age:"

This may be the easiest way, because you can find the bucket you need quite simply, but it is not necessarily the fastest way ;). Another possibility, for example, is to create one bucket for every 10 years.

00 - 09
10 - 19
20 - 29
...

And when you want to insert something into a bucket, you can do:

  • Binary search in the bucket (e.g. LinkedList) to find the desired position.
  • Insert item

This way you also do not need to sort the buckets afterwards, because everything is already sorted. I am not saying this is a good idea, just pointing out the possibility. ;)
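
A small sketch of the decade-bucket idea, assuming plain integer ages in [0, 150] and using Python lists with the standard `bisect` module instead of a linked list. On a Python list the binary search takes O(log m) for a bucket of m items, but the insertion itself still shifts elements, so this only illustrates the idea and makes no performance claim.

```python
import bisect

BUCKET_WIDTH = 10  # one bucket per decade: 0-9, 10-19, 20-29, ...
MAX_AGE = 150      # assumed upper bound, as in the question


def sort_ages(ages):
    """Sort ages by keeping every decade bucket sorted at insertion time."""
    buckets = [[] for _ in range(MAX_AGE // BUCKET_WIDTH + 1)]
    for age in ages:
        bucket = buckets[age // BUCKET_WIDTH]
        # Binary search for the position, then insert; the bucket stays sorted.
        bisect.insort(bucket, age)

    # No per-bucket sort is needed here: just concatenate the buckets.
    result = []
    for bucket in buckets:
        result.extend(bucket)
    return result


print(sort_ages([34, 7, 150, 30, 39, 0, 7]))
```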

Questions:
1) Simply put: it is a linear sort because the sort takes linear time. Both Type 1 and Type 2 take O(n + k). Since bucket sort does not use comparisons between elements, unlike quicksort, bubble sort, etc., it is not bound by the n log n lower bound that applies to comparison sorts. Keep in mind that O-notation gives no guarantee about speed; it gives a guarantee about growth rate. If your input size doubles from N to 2N, your linear-time algorithm will cope with it better than, for example, an O(n^2) algorithm such as bubble sort. ;)

2) Insertion sort is indeed efficient for small arrays, and that is mainly why it is chosen, plus the fact that it is stable. If you do not use a stable algorithm to sort the buckets themselves, the whole algorithm (bucket sort) will not be stable.

3) It is hard to say. In my opinion, it depends on the data. If you need to sort 1 million 32-bit integers, you are not going to create 2^32 buckets for them. In that case it would be worth looking at other algorithms (for example, LSD radix sort), which would basically create 10 buckets (one for each decimal digit).
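
For reference, a minimal LSD radix sort sketch, assuming non-negative integers and base 10, so one bucket per digit value 0 to 9. The function name and the `base` parameter are illustrative. Each pass is a stable bucket distribution on a single digit, starting from the least significant one.

```python
def lsd_radix_sort(values, base=10):
    """Sort non-negative integers digit by digit, least significant first."""
    if not values:
        return []
    result = list(values)
    largest = max(result)
    place = 1  # 1, base, base^2, ... selects the digit for the current pass
    while place <= largest:
        buckets = [[] for _ in range(base)]
        for v in result:
            # Appending in scan order keeps each pass stable.
            buckets[(v // place) % base].append(v)
        result = [v for bucket in buckets for v in bucket]
        place *= base
    return result


print(lsd_radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

The number of buckets stays fixed at `base` no matter how large the keys are; larger keys only add more passes.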

+3

Bucket sort is linear time when the buckets are sorted in linear time. "Type 1" and "Type 2" are both linear because all values in each bucket compare pairwise equal and need no further sorting.

The answer to your last two questions is: whatever works in practice. Typically, the author of a standard library sort has determined an appropriate cutoff for switching to insertion sort. I believe bucket sort performance depends heavily on the data and on the memory subsystem.

+1

Source: https://habr.com/ru/post/1482378/

