Data Structure / Partition Search Algorithm

What data structure / algorithm should be used to find which section a given position is in, given the list of endpoints for each section?

For example, if I have a web page with section headings and content,

  • Introduction (ends at 100 pixels)
  • Section 1 (ends at 350 pixels)
  • Section 2 (ends at 700 pixels)
  • Conclusion (ends at 1200 pixels)
  • Comments

and I'm currently at 130 pixels, it should tell me that I'm in the "Section 1" section.

Option 1

Binary search on an array of endpoints

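    from bisect import bisect_left

    arr = [100, 350, 700, 1200]  # end position (px) of each section
    names = ["Introduction", "Section 1", "Section 2", "Conclusion", "Comments"]
    pos = bisect_left(arr, 130)  # index of the first endpoint >= 130, i.e. 1
    section = names[pos]         # "Section 1"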

However, this still costs O(log n) for every change of position.
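A slightly fuller sketch of Option 1, wrapped in a function. The helper name `section_at` and the `NAMES` list are assumptions for illustration. Using `bisect_left` means a position exactly on an endpoint (e.g. 100) counts as part of the section that ends there, and anything past the last endpoint falls into the trailing "Comments" section.

```python
from bisect import bisect_left

# End position (in px) of each section, sorted ascending.
ENDS = [100, 350, 700, 1200]
# One more name than endpoints: the last section ("Comments") has no end.
NAMES = ["Introduction", "Section 1", "Section 2", "Conclusion", "Comments"]


def section_at(position: float) -> str:
    """Return the name of the section containing `position` (hypothetical helper)."""
    # bisect_left finds the first endpoint >= position in O(log n),
    # so a position equal to an endpoint stays in the section ending there.
    return NAMES[bisect_left(ENDS, position)]


for y in (0, 100, 130, 700, 1500):
    print(y, section_at(y))
```

Switching to `bisect_right` would instead assign a position equal to an endpoint to the following section; which one is right depends on whether "ends at 100 pixels" is meant inclusively.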

Option 2

A hash table (lookup table) keyed by the current position:

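    arr = [100, 350, 700, 1200]
    names = ["Introduction", "Section 1", "Section 2", "Conclusion"]
    # one entry per 10 px bucket: {0: "Introduction", ..., 10: "Section 1", ...}
    lookup = {b: names[sum(b * 10 >= end for end in arr)] for b in range(arr[-1] // 10)}
    section = lookup[130 // 10]  # "Section 1"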

It's fast (O(1) per lookup), but it uses a lot of space: one entry per bucket rather than one per section.


Is there a general data structure / algorithm that deals with this type of problem?

2 answers

I like your first option: binary search is very efficient for this kind of lookup, and, as you say, the second option is not space-efficient.

A traditional and very general solution that scales well, widely used in computer graphics, is the k-d tree (here a 2-d tree), which builds a tree you can search by coordinates without wasting memory. In particular, searching, inserting and deleting all take O(log n) on average for a reasonably balanced tree, and it needs O(n) space.
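For a single axis, the k-d tree mentioned above reduces (with k = 1) to an ordinary binary search tree keyed by position. The sketch below assumes that reading: each node stores one section's end position and name, and a lookup finds the node with the smallest end that is >= the query position. The class and method names are hypothetical, and the tree is not self-balancing, so the O(log n) bounds only hold if sections arrive in a reasonably random order; a real implementation would use a balanced tree. Deletion is omitted to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    end: int                      # end position (px) of this section
    name: str                     # section name
    left: Optional[Node] = None   # sections ending earlier
    right: Optional[Node] = None  # sections ending at or after this one


class SectionTree:
    """1-d k-d tree, i.e. an unbalanced binary search tree keyed by section end."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def insert(self, end: int, name: str) -> None:
        new = Node(end, name)
        if self.root is None:
            self.root = new
            return
        node = self.root
        while True:
            if end < node.end:
                if node.left is None:
                    node.left = new
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = new
                    return
                node = node.right

    def find(self, position: int) -> Optional[str]:
        """Name of the section with the smallest end >= position, or None."""
        best: Optional[Node] = None
        node = self.root
        while node is not None:
            if position <= node.end:
                best = node        # candidate; a smaller end may exist on the left
                node = node.left
            else:
                node = node.right  # this section ends before `position`
        return best.name if best is not None else None


tree = SectionTree()
for end, name in [(350, "Section 1"), (100, "Introduction"),
                  (1200, "Conclusion"), (700, "Section 2")]:
    tree.insert(end, name)
print(tree.find(130))   # Section 1
print(tree.find(1500))  # None (past the last endpoint, i.e. the comments)
```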

Given that you only need one axis, and that a web page will have somewhere between 1 and 100 sections (it is unlikely to have thousands, let alone millions or billions), I personally would start with a very simple array and only move on to a more complex k-d tree when there is a measurable benefit or need. If you write this in C or another language that gives you control over memory layout, an array of structs can be scanned extremely quickly thanks to the design of modern processors and memory hierarchies (in particular, prefetching and caching).
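A minimal sketch of the "very simple array" approach, assuming the sections are kept as a list of (end, name) pairs sorted by end. The function name `section_at_linear` and the "Comments" default are hypothetical. Python lists do not give you the contiguous array-of-structs layout described above, so this only shows the logic of the scan; the prefetching and caching benefits apply to the equivalent C code.

```python
from typing import List, Tuple

# (end position in px, section name), sorted by end position
SECTIONS: List[Tuple[int, str]] = [
    (100, "Introduction"),
    (350, "Section 1"),
    (700, "Section 2"),
    (1200, "Conclusion"),
]


def section_at_linear(position: int, trailing: str = "Comments") -> str:
    """Scan front to back and return the first section whose end >= position."""
    for end, name in SECTIONS:
        if position <= end:
            return name
    # Past the last endpoint: the open-ended trailing section.
    return trailing


print(section_at_linear(130))  # Section 1
```

With only a handful of sections, this O(n) scan does at most a few comparisons per lookup, which is why it can compete with binary search in practice.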


The simplest and most efficient way is to use binary search, with O(log n) complexity.

Your second option has better O(1) lookup complexity, but its drawback is the prepopulation it requires. Preparing the data for binary search is easier: you only need a sorted array of endpoints.

Both approaches work best if you do not update sections at run time.

A dedicated data structure is needed if you care about add/remove/update times. With your approaches, the prepopulated data has to be updated in O(n), which means that update/add/delete operations can take up to O(n).
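To illustrate the update cost for the binary-search option with a plain sorted Python list: finding the insertion point with `bisect_left` is O(log n), but `list.insert` and `del` shift every later element, so adding or removing a section is O(n) overall. The helper names `add_section` / `remove_section` and the parallel `ends` / `names` lists are assumptions for this sketch.

```python
from bisect import bisect_left
from typing import List

ends: List[int] = [100, 350, 700, 1200]
# names[i] is the section ending at ends[i]; the extra last entry is open-ended.
names: List[str] = ["Introduction", "Section 1", "Section 2", "Conclusion", "Comments"]


def add_section(end: int, name: str) -> None:
    """Insert a section that ends at `end`: O(log n) search + O(n) shift."""
    i = bisect_left(ends, end)
    ends.insert(i, end)    # list.insert shifts all later elements: O(n)
    names.insert(i, name)


def remove_section(end: int) -> None:
    """Remove the section that ends exactly at `end`: O(n) because of the shift."""
    i = bisect_left(ends, end)
    if i == len(ends) or ends[i] != end:
        raise KeyError(end)
    del ends[i]            # del also shifts all later elements: O(n)
    del names[i]


add_section(500, "Section 1b")        # a new section ending at 500
print(ends)                           # [100, 350, 500, 700, 1200]
print(names[bisect_left(ends, 450)])  # Section 1b
```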


Source: https://habr.com/ru/post/1234090/

