Julia DataFrame: create a new column summing :x values by :y

I have a DataFrame with columns :x and :y. I would like to calculate how often each (:x, :y) combination occurs in the DataFrame, and what percentage of the occurrences of each :y value that combination accounts for. I already have the first part, thanks to a previous question:

using DataFrames
mydf = DataFrame(y = rand('a':'h', 1000), x = rand('i':'p', 1000))
mydfsum = by(mydf, [:x, :y], df -> DataFrame(n = length(df[:x])))
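The by function and df[:x]-style column indexing come from older DataFrames.jl releases and no longer exist in DataFrames.jl 1.x. A minimal sketch of the same counting step in the current API, assuming DataFrames.jl 1.x, uses groupby plus combine with nrow:

```julia
using DataFrames  # assumes DataFrames.jl 1.x

mydf = DataFrame(y = rand('a':'h', 1000), x = rand('i':'p', 1000))

# one row per observed (x, y) pair; nrow => :n stores each group's row count in :n
mydfsum = combine(groupby(mydf, [:x, :y]), nrow => :n)
```

Since every row of mydf falls into exactly one (x, y) group, sum(mydfsum.n) equals 1000.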

This successfully creates a column n that counts how often each value of :x occurs with each value of :y. Now I need a new column that counts how often each value of :y occurs. I could create a new DataFrame using:

mydfsumy = by(mydf, [:y], df -> DataFrame(ny = length(df[:x])))

then join the two DataFrames together:

mydfsum = join(mydfsum, mydfsumy, on = :y)

and finally create the percentage column :yp:

mydfsum[:yp] = mydfsum[:n] ./ mydfsum[:ny]
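For reference, the whole count, join and divide sequence might look like this in DataFrames.jl 1.x (an assumption about the reader's version), where innerjoin replaces the old join and columns are accessed as mydfsum.n rather than mydfsum[:n]:

```julia
using DataFrames  # assumes DataFrames.jl 1.x

mydf = DataFrame(y = rand('a':'h', 1000), x = rand('i':'p', 1000))

# count each (x, y) pair, and each y value on its own
mydfsum  = combine(groupby(mydf, [:x, :y]), nrow => :n)
mydfsumy = combine(groupby(mydf, :y), nrow => :ny)

# innerjoin takes over the role of join(...; on = :y)
mydfsum = innerjoin(mydfsum, mydfsumy, on = :y)

# fraction of each y group's rows that fall on this (x, y) pair
mydfsum.yp = mydfsum.n ./ mydfsum.ny
```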

Is there a more direct way? In R, with dplyr, I would write:

mydf %>% group_by(x, y) %>% summarize(n = n()) %>% group_by(y) %>% mutate(yp = n / sum(n))
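A rough Julia analogue of that dplyr pipeline, sketched here assuming DataFrames.jl 1.x, chains the steps with Base's |> operator; transform on a grouped frame plays the role of mutate after group_by(y):

```julia
using DataFrames  # assumes DataFrames.jl 1.x

mydf = DataFrame(y = rand('a':'h', 1000), x = rand('i':'p', 1000))

# group_by(x, y) %>% summarize(n = n()) %>% group_by(y) %>% mutate(yp = n / sum(n))
mydfsum = mydf |>
    (df -> combine(groupby(df, [:x, :y]), nrow => :n)) |>
    (df -> transform(groupby(df, :y), :n => (n -> n ./ sum(n)) => :yp))
```

The parentheses around each anonymous function keep one stage's body from swallowing the following |>.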

One way is to nest two by calls:

mydfsum = by(mydf, :y, df -> by(df, :x, dd -> DataFrame(n = size(dd,1), yp = size(dd,1)/size(df,1))))

or, equivalently and more readably, with do-blocks:

mydfsum = by(mydf, :y) do df
    by(df, :x) do dd
        DataFrame(n = size(dd, 1), yp = size(dd, 1) / size(df, 1))
    end
end
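In DataFrames.jl 1.x (assumed here), the nested do-block structure carries over to combine, which accepts a function returning a DataFrame for each group. This is a minimal sketch, not the answerer's code; inner is a hypothetical local name:

```julia
using DataFrames  # assumes DataFrames.jl 1.x

mydf = DataFrame(y = rand('a':'h', 1000), x = rand('i':'p', 1000))

mydfsum = combine(groupby(mydf, :y)) do df
    # df holds every row for a single value of :y
    inner = combine(groupby(df, :x), nrow => :n)
    # divide by the size of the whole y group, like size(df, 1) above
    inner.yp = inner.n ./ nrow(df)
    inner
end
```

combine puts the :y key column first, followed by the :x, :n and :yp columns of each returned frame.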

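Alternatively, mirroring the R pipeline: group by both :x and :y first, then regroup the result by :y. The :yp column is created as a placeholder in the first by and filled in by the second by.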

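# count each (x, y) pair; yp is only a placeholder at this point
mydfsum = by(mydf, [:x, :y], df -> DataFrame(n = size(df, 1), yp = 0.))
# regroup by y and fill yp with each pair's share of its y group
by(mydfsum, :y, df -> (df[:yp] = df[:n] / sum(df[:n])))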
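The same two-pass idea can be sketched in DataFrames.jl 1.x (assumed version) with transform!, which adds :yp directly to mydfsum through the grouped view, so no placeholder column is needed:

```julia
using DataFrames  # assumes DataFrames.jl 1.x

mydf = DataFrame(y = rand('a':'h', 1000), x = rand('i':'p', 1000))

# first pass: count each (x, y) pair
mydfsum = combine(groupby(mydf, [:x, :y]), nrow => :n)

# second pass: regroup by y and add yp to mydfsum in place
transform!(groupby(mydfsum, :y), :n => (n -> n ./ sum(n)) => :yp)

# sanity check: within each y group the shares should add up to 1
totals = combine(groupby(mydfsum, :y), :yp => sum => :total)
```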

Query.jl


Source: https://habr.com/ru/post/1677278/

