Performance issues

I work with time series type TSeries = [(Day, Double)] , but you need to translate the first Day element to Double for further processing (for example, charts, etc.).

Matching a date range with the corresponding double range [lobound, upbound], where the earliest date map is for binding and the last date map for ascending, is the main transformation. To implement it, I first need to get the minimum and maximum date range. I ran into a performance issue, but I don’t know exactly why and how to fix it.

Here is the code (the time series is not considered sorted):

 module Main where import Data.Time (Day, fromGregorian, diffDays) type TSeries = [(Day, Double)] -- time-series to (Double, Double) mapping function toDbl :: (Day -> Double) -> TSeries -> [(Double, Double)] toDbl mapX ts = map (\(d,x) -> (mapX d, x)) ts -- Day to Double mapping function - fast mapDays1 :: (Day, Double) -> (Day, Double) -> Day -> Double mapDays1 (d0,x0) (d1,x1) d = ((fromIntegral $ diffDays d d0) * x1 + (fromIntegral $ diffDays d1 d) * x0) / diff10 where diff10 = fromIntegral $ diffDays d1 d0 -- Day to Double mapping function - slow mapDays2 :: TSeries -> Double -> Double -> Day -> Double mapDays2 ts x0 x1 d = mapDays1 (d0,x0) (d1,x1) d where d0 = minimum $ map fst ts d1 = maximum $ map fst ts -- example time-series func :: Int -> Double func d = sin $ pi / 14 * (fromIntegral d) ts = [(fromGregorian ymd, func d) | y <- [2000..2016], m <- [1..12], d <- [1..28]] :: TSeries -- speed test main = do let mindate = minimum $ map fst ts maxdate = maximum $ map fst ts test1 = toDbl (mapDays1 (mindate,0.0) (maxdate,100.0)) ts test2 = toDbl (mapDays2 ts 0.0 100.0) ts -- print $ sum $ map fst test1 -- this is fast print $ sum $ map fst test2 -- this is slow 

The test that I am doing (summing the X axis, first of all the elements) does not matter, but it is simple and well illustrates the performance problem.

In essence, mapDays1 and mapDays2 are the same, except that in order to get the correct scaling, I need to calculate the minimum and maximum dates from the outside and pass them to mapDays1, while this is done β€œinside” in mapDays2.

The problem is that mapDays2 is very slow compared to the version of mapDays1. I suspect that the minimum and maximum calculations are called many times (as opposed to once), but I do not understand why, and I'm not sure how to fix mapDays2 to get a performance similar to mapDays1.

+5
source share
1 answer

The problem is really related to the memories. The problem is that you call mapDays1 and mapDays2 without passing all your arguments to them, so these calls only create tricks.

Problem

This means that thunks only ends inside the map , so different calls to mapDays2 cannot share their results for d0 = minimum $ map fst ts and d1 = maximum $ map fst ts and maximum and minimum values, evaluated each time . One can imagine a situation in which d0 and d1 , depending on the last argument of Day , in which case it would be wrong not to revise d0 and d1 each time.

In contrast, it should be very clear that mindate = minimum $ map fst ts and maxdate = maximum $ map fst ts need to be calculated only once.

Fixing mapDays2

Although we like to pretend that fxy = e matches fx = \y -> e , it is not under the hood. You want the GHC to not thunk when passing all but the last argument. Just move the d sign to the equal sign. Then the function you return will be evaluated only once d0 and d1 :

 -- Day to Double mapping function - slow mapDays2 :: TSeries -> Double -> Double -> Day -> Double mapDays2 ts x0 x1 = \d -> mapDays1 (d0,x0) (d1,x1) d where d0 = minimum $ map fst ts d1 = maximum $ map fst ts 
+4
source

Source: https://habr.com/ru/post/1260502/


All Articles