Frequency of counting each item in the list

I am trying to write a program that will count the frequency of each item in a list.

In: "aabbcabb" Out: [("a",3),("b",4),("c",1)] 

You can view my code at the following link: http://codepad.org/nyIECIT2 In this code, the output of a unique function will be like

  In: "aabbcabb" Out: "abc" 

Using the unique output, we will consider the frequency of the target list. You can also see the code:

  frequencyOfElt xs=ans where ans=countElt(unique xs) xs unique []=[] unique xs=(head xs):(unique (filter((/=)(head xs))xs)) countElt ref target=ans' where ans'=zip ref lengths lengths=map length $ zipWith($)(map[(=='a'),(==',b'),(==',c')](filter.(==))ref)(repeat target) Error:Syntax error in input (unexpected symbol "unique") 

But in ghci 6.13 another type of error also shows

Few people asked me why to use [(== 'a'), (== ', b'), (== ', c')]. What I expect: if ref = "abc" and target = "aabbaacc" then

  zipWith($) (map filter ref)(repeat target) 

will show ["aaaa", "bb", "cc"], then I can use the length of the map above this to get the frequency. Here, to filter the list according to ref, I use [(== 'a'), (== ' , b '), (==', c ')]

I assume that some logical error lies [(== 'a'), (== ', b'), (== ', c')] here ..

+4
source share
2 answers

You did not say whether you want to write it in its entirety yourself, or OK to compose it from some standard functions.

 import Data.List gs = map (\x -> ([head x], length x)) . group . sort $ s -- g = map (head &&& length) . group . sort -- without the [...] 

is the standard fast n-dirty way to encode it.


OK, so your original idea was Code it Point-Free Style (a specific tune playing in my head ...):

 frequencyOfElt :: (Eq a) => [a] -> [(a,Int)] frequencyOfElt xs = countElt (unique xs) xs -- change the result type where unique [] = [] unique (x:xs) = x : unique (filter (/= x) xs) countElt ref target = -- Code it Point-Free Style (your original idea) zip ref $ -- your original type would need (map (:[]) ref) here map length $ zipWith ($) -- ((filter . (==)) c) === (filter (== c)) (zipWith ($) (repeat (filter . (==))) ref) (repeat target) 

I changed the type here to a more reasonable [a] -> [(a,Int)] btw. note that

 zipWith ($) fs (repeat z) === map ($ z) fs zipWith ($) (repeat f) zs === map (f $) zs === map f zs 

therefore, the code is simplified to

  countElt ref target = zip ref $ map length $ map ($ target) (zipWith ($) (repeat (filter . (==))) ref) 

and then

  countElt ref target = zip ref $ map length $ map ($ target) $ map (filter . (==)) ref 

but map f $ map g xs === map (fg) xs , therefore

  countElt ref target = zip ref $ map (length . ($ target) . filter . (==)) ref -- (1) 

which is a little clear (to my liking), written with a list,

  countElt ref target = [ (c, (length . ($ target) . filter . (==)) c) | c <- ref] == [ (c, length ( ($ target) ( filter (== c)))) | c <- ref] == [ (c, length $ filter (== c) target) | c <- ref] 

This gives us the idea to rewrite (1) further as

  countElt ref target = zip <*> map (length . (`filter` target) . (==)) $ ref 

but this obsession with meaningless code becomes meaningless.


Therefore, returning to understandable readable lists using the standard nub function, which is equivalent to your unique , your idea becomes

 import Data.List frequencyOfElt xs = [ (c, length $ filter (== c) xs) | c <- nub xs] 

This algorithm is actually quadratic ( ~ n^2 ), therefore it is worse than the first version, sort is dominated over i.e. linearly limit ( ~ n log(n) ).


This code, although it can be manipulated further on the principle of equivalent transformations:

  = [ (c, length . filter (== c) $ sort xs) | c <- nub xs] 

... because the search in the list matches the search in the list sorted. Doing more work here will pay off? ..

  = [ (c, length . filter (== c) $ sort xs) | (c:_) <- group $ sort xs] 

... right? But now group has already grouped them with (==) , so there is no need to call filter repeat the work already done with group :

  = [ (c, length . get c . group $ sort xs) | (c:_) <- group $ sort xs] where get c gs = fromJust . find ((== c).head) $ gs = [ (c, length g) | g@ (c:_) <- group $ sort xs] = [ (head g, length g) | g <- group (sort xs)] = (map (head &&& length) . group . sort) xs 

is not it? And now this is the same linear algorithm from the beginning of this message, actually obtained from your code, decomposing its hidden general calculations, making them available for reuse and simplification of the code.

+14
source

Using multiset-0.1 :

 import Data.Multiset freq = toOccurList . fromList 
+8
source

Source: https://habr.com/ru/post/1447779/


All Articles