Skip list for small datasets

I'm quite interested in using a skip list as the open list for A*. What bothers me, however, is its probabilistic nature: open lists can range from very small sets to huge node counts, and the structure should perform well across that whole range.

As I understand it, skip lists have a higher chance of accidentally performing poorly on small datasets, and with the number of small lists being generated I think this could become a problem.

My idea for fixing this: why not track the random numbers to some extent? Keep a count of the nodes at each level and, to maintain an ideal distribution of nodes across the levels, occasionally intervene and force a node to a specific level.
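The intervention I have in mind would look roughly like this (a hypothetical sketch in Python; `MAX_LEVEL`, `P`, and the per-level counting scheme are my own names, not from any skip list library):

```python
import random

MAX_LEVEL = 4   # hypothetical cap on tower height
P = 0.5         # standard promotion probability

level_counts = [0] * (MAX_LEVEL + 1)  # level_counts[k] = nodes whose top level is k

def ideal_count(total, level):
    # With P = 0.5 a "perfect" distribution halves at every level:
    # about total/2 nodes stop at level 1, total/4 at level 2, ...
    return max(1, total // (2 ** level))

def pick_level(total_nodes):
    """Coin-flip level generator, but demote a node whose chosen level
    is already over-populated relative to the ideal distribution."""
    level = 1
    while level < MAX_LEVEL and random.random() < P:
        level += 1
    # Intervene: push the node down while its level holds too many nodes.
    while level > 1 and level_counts[level] >= ideal_count(total_nodes, level):
        level -= 1
    level_counts[level] += 1
    return level
```

The demotion loop is the "force the node to a specific level" step; whether the extra bookkeeping pays for itself is exactly what I'm unsure about.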

I'm not sure how well this would work in my application, and maybe I should look at a different data structure for my open lists instead.

None of the articles I've read on skip lists mention such an optimization. Since I'm fairly new to the whole performance-profiling game, I hesitate to try to improve a well-documented data structure.

+4

3 answers
  • Yes, you are right: skip lists have a higher chance of degenerating when they are small.
    In general, according to this article, there exists a constant alpha such that the probability of the skip list exceeding size alpha * n is less than 1/2^Ω(sqrt(n)); as skip lists grow, the probability of exceeding this bound becomes smaller and smaller.

  • To avoid the worst cases of a skip list, you can use a variant of the original skip list: the deterministic skip list. It is discussed in this dissertation.

  • Other alternatives are balanced BSTs, such as the AVL tree or the red-black tree, or even B+-trees (which are usually preferred for file systems).

  • If your "open set" really is that small, the chances are your branching factor is also very small (close to 1), or more precisely B^d (d = depth of the solution, B = branching factor) will also be small. That leads to a fast search regardless of the skip list implementation, since only a few nodes are expected to be processed.
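As an aside, the most common open-list structure for A* is a plain binary heap; in Python that is just the standard `heapq` module. A minimal sketch (the node payloads here are placeholder strings, and the counter tie-breaker is my own convention to avoid comparing nodes with equal f-scores):

```python
import heapq

open_list = []  # binary heap of (f_score, tie_breaker, node) tuples
counter = 0     # tie-breaker so equal f-scores never compare the nodes themselves

def push(f_score, node):
    """Add a node to the open list: O(log n)."""
    global counter
    heapq.heappush(open_list, (f_score, counter, node))
    counter += 1

def pop_best():
    """Remove and return the node with the lowest f-score: O(log n)."""
    _, _, node = heapq.heappop(open_list)
    return node

push(7.0, "A")
push(3.5, "B")
push(5.2, "C")
```

For very small open lists the heap's constant factors are also tiny, which sidesteps the whole degenerate-skip-list concern.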

+1

I would suggest writing a program that randomly generates skip lists of the sizes you expect from your A* algorithm, then checking those lists to see how many turn out worse than optimal.

I would not recommend putting the monitoring-and-intervention skip list you propose into production. You will probably find that the monitoring and intervention code makes the skip list perform worse in the common case.
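A minimal version of that experiment might look like this (my own sketch, assuming the standard p = 0.5 coin-flip level scheme; it only counts the fully degenerate case, where every node stops at level 1):

```python
import random

def random_level(p=0.5, max_level=8):
    """Standard coin-flip level generator: level k with probability p ** (k - 1)."""
    level = 1
    while level < max_level and random.random() < p:
        level += 1
    return level

def fraction_degenerate(list_size, trials=10_000):
    """Fraction of random skip lists of `list_size` nodes in which every
    node lands at level 1, i.e. the list has no shortcuts at all and is
    just a plain linked list."""
    bad = 0
    for _ in range(trials):
        if all(random_level() == 1 for _ in range(list_size)):
            bad += 1
    return bad / trials

random.seed(1)
# The expected fraction is 0.5 ** list_size: about 6% for 4 nodes, 0.4% for 8.
print(fraction_degenerate(4), fraction_degenerate(8))
```

Extending the check from "fully degenerate" to "search path longer than some threshold" would tell you how often the lists are merely suboptimal rather than worst-case.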

+2

When you say "skip lists have a higher chance of accidentally providing poor results for small datasets", what exactly are you afraid of?

What you should not be afraid of is that a list of 10 items ends up without enough level-2 or level-3 nodes to speed up traversal. The difference between iterating a linked list of 2 elements and one of 10 elements (and a skip list with no level-2+ nodes is just a linked list) is essentially nonexistent, even in tight loops; the pointer dereferencing your data structure requires will probably have a greater influence.

Obviously, as you move to more elements, the effect of missing higher-level nodes grows. But, fortunately, the likelihood of getting higher-level nodes grows as well. For example, if you add 8 elements, the probability that they will all be level-1 nodes is 0.5^8 ≈ 0.39%, which is extremely unlikely.
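That figure is just the chance of eight "tails" in a row with the standard p = 0.5 promotion coin:

```python
# Probability that all 8 inserted elements stop at level 1
p_all_level_one = 0.5 ** 8
print(f"{p_all_level_one:.4%}")  # 0.3906%
```

Every extra element halves this probability again, so the degenerate case vanishes quickly as the list grows.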

0

Source: https://habr.com/ru/post/1441702/
