Deedle: grouping a series into the top 3 and the rest

I have a Deedle series with election data, for example:

   "Party A", 304
   "Party B", 25 
   "Party C", 570
   ....
   "Party Y", 2
   "Party Z", 258

I would like to create a new series like this:

   "Party C", 570
   "Party A", 304 
   "Party Z", 258
   "Others", 145

So, I want to keep the top 3 as they are and sum all the others into a single new entry. What is the best way to do this?

1 answer

I don't think we have anything in Deedle that would make this a one-liner (how disappointing...). The approach I can think of is to get the keys of the top three parties and then use Series.groupInto with a key selector that returns the name of the party (for the top 3) or "Other" (for all other parties).

open Deedle

// Sample data set with a bunch of parties
let election =
 [ "Party A", 304
   "Party B", 25 
   "Party C", 570
   "Party Y", 2
   "Party Z", 258 ]
 |> series

// Sort the data by -1 times the value (descending)
let byVotes = election |> Series.sortBy (~-)
// Create a set with top 3 keys (for efficient lookup)
let top3 = byVotes |> Series.take 3 |> Series.keys |> set

// Group the series using key selector that tries to find the party in top3
// and using an aggregation function that sums the values (for one or multiple values)
byVotes |> Series.groupInto 
    (fun k v -> if top3.Contains(k) then k else "Other")
    (fun k s -> s |> Series.mapValues float |> Stats.sum)
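
As a possible alternative to grouping, the same result can be sketched by splitting the sorted series into its first three entries and the remainder, then appending the remainder's sum under a new key. This is only a sketch: the helper name topAndOthers and the "Others" label are made up for illustration, the `#r "nuget: Deedle"` line assumes an F# script run with a recent `dotnet fsi`, and it assumes the series has at least n entries.

```fsharp
#r "nuget: Deedle"
open Deedle

// Same sample data as above
let election =
  [ "Party A", 304
    "Party B", 25
    "Party C", 570
    "Party Y", 2
    "Party Z", 258 ]
  |> series

// Hypothetical helper (not part of Deedle): keep the n largest entries
// and sum everything else into a single "Others" entry.
// Assumes the series contains at least n values.
let topAndOthers n (votes: Series<string, int>) =
  // Sort by negated value, i.e. descending by votes
  let sorted = votes |> Series.sortBy (~-)
  // The first n observations stay as they are
  let top = sorted |> Series.take n |> Series.observations |> List.ofSeq
  // Everything after the first n is summed into one number
  let rest = sorted |> Series.skip n |> Series.values |> Seq.sum
  series (top @ [ "Others", rest ])

let result = election |> topAndOthers 3
```

For the five-party sample the parties outside the top 3 are Party B (25) and Party Y (2), so the "Others" entry is 27. Unlike the groupInto version, this keeps the values as integers and puts the summed entry last because it is appended explicitly.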

Source: https://habr.com/ru/post/1568022/

