Reasonably Remove Points in Python List

Suppose I have two arrays representing the x and y coordinates of a calibration curve.

X = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20,30,40,50]
Y = [2,4,6,8,10,12,14,16,18,20,24,28,32,36,40,60,80,100]

In my examples, arrays are above 18 points. You will notice that the x values ​​are not linearly spaced; there are more points at lower x values.

Suppose I need to reduce the number of points in my calibration curve to 13 points. Obviously, I could just delete the first five or last five points, but this will reduce my overall range of x values. To maintain the range and minimize the space between the x values, I would prefer to remove the x = 2,4,6,8,10 values. Removing these x-points and their corresponding y values ​​would leave 13 points on the curve as needed.

How can I automatically select this item and delete it in Python? That is, is there an algorithm for choosing the best x points from the list, where the "best" is defined as keeping the points as close as possible, maintaining the overall range and adhering to the new number of points.

Please note that the remaining points must be in the source lists, so I cannot interpolate 18 points on a 13-point grid.

+4
source share
4 answers
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 30, 40, 50]
Y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 28, 32, 36, 40, 60, 80, 100]

assert len(X) == len(set(X)), "Duplicate X values found"

points = list(zip(X, Y))
points.sort()  # sorts by X

while len(points) > 13:
    # Find index whose neighbouring X values are closest together
    i = min(range(1, len(points) - 1), key=lambda p: points[p + 1][0] - points[p - 1][0])
    points.pop(i)

print(points)

Conclusion:

[(1, 2), (3, 6), (5, 10), (7, 14), (10, 20), (12, 24), (14, 28), (16, 32), (18, 36), (20, 40), (30, 60), (40, 80), (50, 100)]

If you want the original series again:

X, Y = zip(*points)
+1
source

This maximizes the square root distances between the selected points. In a way, this spreads the dots as far as possible.

import itertools
list(max(itertools.combinations(sorted(X), 13), i
         key=lambda l: sum((a - b) ** 2 for a, b in zip(l, l[1:]))))

, . k O(k * (len(X) choose k)), O(exp(len(X)). , , len(X) == 100 k == 10.

+3

, :

  • . , , MAX_INT. , 1 MAX_INT; 2 2, 10 3.
  • .
  • , 1.

2,4,6,8,10,3,...

0

, , :

def mostRedundantPoint(x):
    #returns the index, i, in the range 0 < i < len(x) - 1
    #that minimizes x[i+1] - x[i-1]
    #assumes len(x) > 2 and that x
    #is sorted in ascending order

    gaps = [x[i+1] - x[i-1] for i in range(1,len(x)-1)]
    i = gaps.index(min(gaps))
    return i+1

def reduceList(x,k):
    if len(x) <= k:
        return x
    else:
        i = mostRedundantPoint(x)
        return reduceList(x[:i]+x[i+1:],k)

X = [1,2,3,4,5,6,7,8,9,10,12,14,16,18,20,30,40,50]
print(reduceList(X,13))
#prints [1, 3, 5, 7, 10, 12, 14, 16, 18, 20, 30, 40, 50]

, 7 8 . , sorted([random.randint(1,10**6) for i in range(1000)]) 1000 100 . , , , , , , , , -, , . , , , .

0

Source: https://habr.com/ru/post/1673437/


All Articles