Haskell to find the distance between two nearest points

Given a list of points in two-dimensional space, you want to execute a function in Haskell to find the distance between the two nearest points. example: Input: project [(1,5), (3,4), (2,8), (-1,2), (-8,6), (7,0), (1,5), (5.5), (4.8), (7.4)] Output: 2.0

Suppose that the distance between the two farthest points in the list does not exceed 10,000.

Here is my code:

import Data.List import System.Random sort_ :: Ord a => [a] -> [a] sort_ [] = [] sort_ [x] = [x] sort_ xs = merge (sort_ left) (sort_ right) where (left, right) = splitAt (length xs `div` 2) xs merge [] xs = xs merge xs [] = xs merge (x:xs) (y:ys)= if x <= y then x : merge xs (y:ys) else y : merge (x:xs) ys project :: [(Float,Float)] -> Float project [] = 0 project (x:xs)= if null (xs) then error "The list have only 1 point" else head(sort_(dstList(x:xs))) distance :: (Float,Float)->(Float,Float) -> Float distance (x1,y1) (x2,y2) = sqrt((x1 - x2)^2 + (y1 - y2)^2) dstList :: [(Float,Float)] -> [Float] dstList (x:xs)= if length xs == 1 then (dstBetween x xs):[] else (dstBetween x xs):(dstList xs) dstBetween :: (Float,Float) -> [(Float,Float)] -> Float dstBetween pnt (x:xs)= if null (xs) then distance pnt x else minimum ((distance pnt ):((dstBetween pnt xs)):[]) {- Calling generator to create a file created at random points -} generator = do putStrLn "Enter File Name" file <- getLine g <- newStdGen let pts = take 1000 . unfoldr (Just . (\([a,b],c)->((a,b),c)) . splitAt 2) $ randomRs(-1,1) g :: [(Float,Float)] writeFile file . show $ pts {- Call the main to read a file and pass it to the function of project The function of the project should keep the name 'project' as described in the statement -} main= do putStrLn "Enter filename to read" name <- getLine file <- readFile name putStrLn . show . project $ readA file readA::String->[(Float,Float)] readA = read 

I can run the program, as in the example, or using the generator as follows:

the haskell should type โ€œgeneratorโ€ in the interpreter, the program will request here a file name containing a thousand points. and after the file is generated in the Haskell interpreter, it should write main and ask for the name of the file, which is the name of the file created with the "generator".

The problem is that for 1000 points randomly created by my program, it takes a lot of time, about 3 minutes, on a computer with a dual-core processor. What am I doing wrong? How can I optimize my code to work faster?

+4
source share
2 answers

You are using a quadratic algorithm:

 project [] = error "Empty list of points" project [_] = error "Single point is given" project ps = go 10000 ps where go a [_] = a go a (p:ps) = let a2 = min a $ minimum [distance pq | q<-ps] in a2 `seq` go a2 ps 

You should use the best algorithm. Find a label for computational geometry on SO for a better algorithm.

See also http://en.wikipedia.org/wiki/Closest_pair_of_points_problem .


@maxtaldykin offers a nice, simple and effective algorithm change that should have a real difference for random data - pre-sort the points by the X coordinate, and never try more accurately than d units from the current point, in the X coordinate (where d is the known in present minimum distance):

 import Data.Ord (comparing) import Data.List (sortBy) project2 ps@ (_:_:_) = go 10000 p1 t where (p1:t) = sortBy (comparing fst) ps go d _ [] = d go d p1@ (x1,_) t = g2 dt where g2 d [] = go d (head t) (tail t) g2 d ( p2@ (x2,_):r) | x2-x1 >= d = go d (head t) (tail t) | d2 >= d = g2 dr | otherwise = g2 d2 r -- change it "mid-flight" where d2 = distance p1 p2 

In random data, g2 will work in O(1) time, so go will take O(n) , and all this will be limited to sorting ~ n log n .

Empirical growth orders show ~ n^2.1 for the first code (in the 1k / 2k range) and ~n^1.1 for the second, in the 10k / 20k range, testing it quick'n'dirty compiled in GHCi (the second code works 50 times faster than the first for 2000 points, and 80 times faster by 3000 points).

+11
source

Perhaps you can tweak your bruteforce search a bit to get better random data performance.

The main idea is to sort the points by the x coordinate and, comparing the distances in the loop, consider only points with a horizontal distance not exceeding the current minimum distance.

It may be an order of magnitude faster, but in the worst case, it is still O (n ^ 2).
In fact, 2000 points is 50 times faster on my car.

 project points = loop1 10000 byX where -- sort points by x coordinate -- (you need import Data.Ord to use `comparing`) byX = sortBy (comparing fst) points -- loop through all points from left to right -- threading `d` through iterations as a minimum distance so far loop1 d = foldl' loop2 d . tails -- `tail` drops leftmost points one by one so `x` is moving from left to right -- and `xs` contains all points to the right of `x` loop2 d [] = d loop2 d (x:xs) = let -- we take only those points of `xs` whose horizontal distance -- is not greater than current minimum distance xs' = takeWhile ((<=d) . distanceX x) xs distanceX (a,_) (b,_) = b - a -- then just get minimum distance from `x` to those `xs'` in minimum $ d : map (distance x) xs' 

Btw, please do not use so many parentheses. Haskell does not require the application of function arguments.

+6
source

Source: https://habr.com/ru/post/1446601/


All Articles