How to find a dense area in 1d

I have a set of points in 1D, with one region that is much denser than the rest. Is there a suitable method in scikit-learn (or any other library) for finding this dense region? It seems like it should be a clustering problem with a fixed number of clusters, but the method should also be robust to noise. Or maybe this is a detection problem of some kind? Here is a histogram of the kind of data I mean.

[Histogram image not available]

I can't upload the real data, but here is a simple simulation:

import random
import matplotlib.pyplot as plt

N = 100

start = 0
points = []
# sparse segment: rate 0.1, so gaps average 10
rate = 0.1
for i in range(N):
    points.append(start)
    start = start + random.expovariate(rate)
# dense segment: 10x as many points, rate 10, so gaps average 0.1
rate = 10
for i in range(N*10):
    points.append(start)
    start = start + random.expovariate(rate)
# sparse again
rate = 0.1
for i in range(N):
    points.append(start)
    start = start + random.expovariate(rate)
plt.hist(points, bins = 100)
plt.show()
1 answer

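I'd suggest a simple model-based approach. Assume the data comes from two sources: a uniform distribution over the whole range [a, b], which models the noise, plus a second uniform distribution over an unknown interval [c, d], which is the dense region. Finding the dense region then means estimating c and d.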

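scipy.stats has no ready-made distribution for this, so write the negative log-likelihood function nlf yourself (the sum of -log pdf over all points) and minimize it over c and d with scipy.optimize.fmin. The pdf is piecewise constant: a point inside [c, d] has density 1/(d-c) + 1/(b-a), and a point outside it has density 1/(b-a). This is twice the density of an equal-weight mixture of the two uniform distributions; the constant factor does not change where the minimum is. The background interval is fixed by taking a = min(points) and b = max(points).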
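
Written out, with points x_1, ..., x_n, a = min x_i, b = max x_i, and k the number of points strictly inside (c, d), the quantity nlf computes is:

$$
\operatorname{nlf}(c, d) = -k \log\!\left(\frac{1}{d-c} + \frac{1}{b-a}\right) - (n - k)\log\frac{1}{b-a},
\qquad k = \#\{\, i : c < x_i < d \,\}
$$

Shrinking [c, d] raises the density, and so lowers the first term, for the points that stay inside, while each point that drops out moves from the higher log-density to the lower one. The minimum sits where these two effects balance, which for data like the simulation above is near the edges of the dense region.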

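import numpy as np
from scipy.optimize import fmin

# `points` is the list built by the simulation in the question
points = np.array(points)           # should be a numpy array
a, b = points.min(), points.max()   # background interval [a, b]

def nlf(params):
    # negative log-likelihood of the dense interval [c, d]
    c, d = params
    if d <= c:                      # not a valid interval; reject it
        return np.inf
    within = ((points > c) & (points < d)).sum()
    return -np.log(1/(d-c) + 1/(b-a))*within - np.log(1/(b-a))*(len(points) - within)

res = fmin(nlf, (0.9*a + 0.1*b, 0.1*a + 0.9*b), disp=0)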

The result (res) is [1046.32119001, 1149.31175184] (for one run; the simulation is random, so the values will vary). That seems about right.
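
Because the simulation knows where the dense segment actually starts and ends, the fit can be checked against the truth. The sketch below is an illustrative addition, not part of the original answer: it reruns the question's simulation with a seeded random.Random (the seed and the helper name simulate are arbitrary choices), applies the same nlf (taking the points as an extra argument, with a guard against d <= c) and the same fmin call, and prints both intervals.

```python
import random

import numpy as np
from scipy.optimize import fmin


def simulate(n=100, seed=0):
    """The question's simulation (sparse, dense, sparse) with a fixed seed.

    Also returns the first and last point of the dense middle segment.
    """
    rng = random.Random(seed)                        # seeded: runs are repeatable
    start, points = 0.0, []
    segments = [(n, 0.1), (n * 10, 10.0), (n, 0.1)]  # (count, rate) per segment
    for idx, (count, rate) in enumerate(segments):
        first = start
        for _ in range(count):
            points.append(start)
            start += rng.expovariate(rate)
        if idx == 1:                                 # the dense middle segment
            dense = (first, points[-1])
    return np.array(points), dense


def nlf(params, points):
    """Negative log-likelihood from the answer, with a guard for d <= c."""
    a, b = points.min(), points.max()
    c, d = params
    if d <= c:                                       # reject invalid intervals
        return np.inf
    within = ((points > c) & (points < d)).sum()
    return (-np.log(1 / (d - c) + 1 / (b - a)) * within
            - np.log(1 / (b - a)) * (len(points) - within))


points, (true_c, true_d) = simulate()
a, b = points.min(), points.max()
x0 = (0.9 * a + 0.1 * b, 0.1 * a + 0.9 * b)          # the answer's starting point
c, d = fmin(nlf, x0, args=(points,), disp=0)
print(f"fitted [{c:.1f}, {d:.1f}], true dense segment [{true_c:.1f}, {true_d:.1f}]")
```

fmin never returns a point worse than its start, so the fitted nlf is at most the starting value; whether it lands on the dense segment still depends on where the search begins.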

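The starting values of c and d need to lie inside [a, b] and form a sensible interval. I used (0.9*a + 0.1*b, 0.1*a + 0.9*b); you may need to try other starting points.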
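
One way to reduce the dependence on the starting point, sketched here as an assumption rather than as part of the original answer, is to run fmin from several starting intervals and keep the result with the lowest nlf. The grid of fractions, the helper name fit_multistart, and the NumPy-based data generation are all illustrative choices.

```python
import numpy as np
from scipy.optimize import fmin


def nlf(params, points, a, b):
    """Negative log-likelihood of a dense interval [c, d] inside [a, b]."""
    c, d = params
    if d <= c:                                   # reject degenerate intervals
        return np.inf
    within = ((points > c) & (points < d)).sum()
    return (-np.log(1 / (d - c) + 1 / (b - a)) * within
            - np.log(1 / (b - a)) * (len(points) - within))


def fit_multistart(points, fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Run fmin from every start (c0, d0) with c0 < d0 on a grid inside [a, b].

    The grid is an arbitrary choice; the pair (0.1, 0.9) reproduces the
    answer's original starting point. Returns the best (c, d), its nlf
    value, and the list of starting points that were tried.
    """
    a, b = points.min(), points.max()
    starts = [(a + f1 * (b - a), a + f2 * (b - a))
              for f1 in fractions for f2 in fractions if f1 < f2]
    best_x, best_f = None, np.inf
    for x0 in starts:
        x = fmin(nlf, x0, args=(points, a, b), disp=0)
        f = nlf(x, points, a, b)
        if f < best_f:                           # keep the lowest nlf so far
            best_x, best_f = x, f
    return best_x, best_f, starts


# Roughly the question's data (sparse, dense, sparse), generated with NumPy.
rng = np.random.default_rng(1)
gaps = np.concatenate([rng.exponential(1 / 0.1, 100),   # sparse: mean gap 10
                       rng.exponential(1 / 10, 1000),   # dense: mean gap 0.1
                       rng.exponential(1 / 0.1, 100)])  # sparse again
points = np.concatenate([[0.0], np.cumsum(gaps)])

(c, d), best_f, starts = fit_multistart(points)
print(f"best interval [{c:.1f}, {d:.1f}], nlf = {best_f:.1f}")
```

Because the grid includes the original start, the multi-start result is never worse than the single run; the price is one fmin call per start (10 with the default grid).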

Source: https://habr.com/ru/post/1696119/

