Julia: function optimization

I am trying to learn how to write good Julia code. I would like to implement the following statistic.

(Note: 1{A} = 1 if A is true, 0 if A is false.)

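$$\kappa(k) = \frac{\sum_{j=1}^{m} \left( p_{jj}(k) - p_j^2 \right)}{\sum_{j=1}^{m} \left( \frac{1}{m} - p_j^2 \right)}$$

Where

$$p_j = \frac{1}{n} \sum_{i=1}^{n} 1\{x_i = j\}$$

and

$$p_{jj}(k) = \frac{1}{n-k} \sum_{i=k+1}^{n} 1\{x_i = j\} \, 1\{x_{i-k} = j\}$$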

function cohens_kappa(x::Vector{Int}, k::Int)
  support = unique(x)
  m = length(support)
  n = length(x)
  y = falses(n, m)  # n-by-m indicator matrix, one column per distinct value
  for j in eachindex(support)
    y[:,j] = (x .== support[j])
  end

  num = 0.0
  den = 0.0
  for j in eachindex(support)
    pjjk = sum(y[(1 + k):n, j] .& y[1:(n - k), j]) / (n - k)
    pj = sum(y[:, j]) / n

    num += pjjk - pj ^ 2
    den += (1 / m) - pj ^ 2
  end
  return (num / den)
end

Is this the most efficient way to implement this?

EDIT: Thanks for all the suggestions, everyone. Can you explain why your code is more efficient? I would like to know how to keep writing good code in the future.

Testing against @user3580870's two versions, we have:

@time [cohens_kappa(X, k) for k in 1:15]
  0.000507 seconds (1.58 k allocations: 269.016 KB)

@time [cohens_kappa2(X, k) for k in 1:15]
  0.000336 seconds (166 allocations: 12.375 KB)

@time [cohens_kappa3(X, k) for k in 1:15]
  0.000734 seconds (303 allocations: 84.109 KB)

It seems your second suggestion is not as fast, but it makes fewer allocations than my original version, so it may be faster for very large vectors.
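
To illustrate where allocations like these can come from, here is a minimal sketch, assuming Julia 1.x. The helper names `lag_matches_slices` and `lag_matches_loop` are made up for this illustration and are not part of any answer. Indexing with a range such as `x[(1 + k):end]` copies that part of the vector, and `.==` builds another temporary array, while an explicit loop reads the elements in place.

```julia
# Hypothetical helpers (not from the thread): count positions i with x[i] == x[i-k].

# Slicing version: each range index copies part of x, and .== builds a BitVector.
lag_matches_slices(x::Vector{Int}, k::Int) = sum(x[(1 + k):end] .== x[1:(end - k)])

# Loop version: reads elements in place, so no temporary arrays are needed.
function lag_matches_loop(x::Vector{Int}, k::Int)
    c = 0
    for i in (k + 1):length(x)
        c += (x[i] == x[i - k])   # a Bool adds as 0 or 1
    end
    return c
end

x = rand(1:5, 10_000)

# Call both once first so that compilation is not included in the measurement.
lag_matches_slices(x, 3); lag_matches_loop(x, 3)

@show @allocated lag_matches_slices(x, 3)
@show @allocated lag_matches_loop(x, 3)
```

Both functions return the same count. The exact byte counts depend on the Julia version, but the slicing version should report noticeably more allocated memory, and the gap grows with the length of `x`.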

+4
3 answers


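   function cohens_kappa2(x::Vector{Int}, k::Int)
     d = Dict{Int,Int}()   # value => index into c1/c2, in order of first appearance
     n = length(x)
     c1 = Int[]            # c1[p] = number of occurrences of the p-th distinct value
     pnew = 0              # number of distinct values seen so far
     for i=1:n
       p = get(d,x[i],0)
       if p>0
         c1[p] += 1
       else
         pnew += 1
         d[x[i]] = pnew
         push!(c1,1)
       end
     end
     c2 = zeros(Int,pnew)  # c2[p] = number of positions i with x[i-k] == x[i] == p-th value
     for i=(k+1):n
       if x[i-k]==x[i]
         c2[d[x[i]]] += 1
       end
     end
     num, dentmp = 0.0, 0.0
     for i=1:pnew
       pjjk = c2[i]/(n-k)
       pj = c1[i] / n
       num += pjjk - pj^2
       dentmp += pj^2
     end
     return (num / (1.0-dentmp))   # the 1/m terms of the denominator sum to 1
   end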
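
The first loop of `cohens_kappa2` can be looked at in isolation. The sketch below is my own illustration, not the answerer's code: `slot_counts` is a hypothetical helper, and it uses `get!` where the answer uses `get` with a `p > 0` check. The idea is the same: each distinct value gets a slot number in order of first appearance, so the counts live in a plain `Vector{Int}` and the vector is scanned only once.

```julia
# Hypothetical helper mirroring the first loop of cohens_kappa2:
# map each distinct value to a slot 1, 2, ... and count occurrences per slot.
function slot_counts(x::Vector{Int})
    slot = Dict{Int,Int}()   # value => slot index
    counts = Int[]           # counts[s] = occurrences of the value stored in slot s
    for v in x
        # Return the existing slot for v, or register the next free slot.
        s = get!(slot, v, length(counts) + 1)
        if s > length(counts)
            push!(counts, 1)   # first time v is seen
        else
            counts[s] += 1
        end
    end
    return slot, counts
end

slot, counts = slot_counts([7, 3, 7, 7, 3, 9])
# slot maps 7 => 1, 3 => 2, 9 => 3 and counts == [3, 2, 1]
```

Compared with building an n-by-m indicator matrix, this needs one pass over `x` and memory proportional to the number of distinct values.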


5 10 . ?

+3


function cohens_kappa_2(x::Vector{Int}, k::Int)

  ...

  # Autocorrelation dictionary
  dxx=Dict{Int,Int}()

  # k-step element-wise matches
  xx=(x[1:end-k])[x[1:end-k] .== x[1+k:end]]

  # Populate the dictionary
  for exx in xx
    dxx[exx] = get(dxx, exx, 0) + 1  # a missing key counts as 0
  end

  ...

end
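
The dictionary step above can also be written as a single loop that skips the intermediate array `xx`. This is only a sketch under my own naming (`lag_match_counts` is hypothetical) and it covers only the autocorrelation dictionary, not the elided parts of the function.

```julia
# Hypothetical helper: for each value v, count the positions i
# with x[i] == x[i-k] == v, without building temporary arrays.
function lag_match_counts(x::Vector{Int}, k::Int)
    dxx = Dict{Int,Int}()
    for i in (k + 1):length(x)
        if x[i] == x[i - k]
            # get(...) supplies 0 when the key is not in the dictionary yet
            dxx[x[i]] = get(dxx, x[i], 0) + 1
        end
    end
    return dxx
end

lag_match_counts([1, 1, 2, 1, 1, 2], 1)   # Dict(1 => 2)
lag_match_counts([1, 1, 2, 1, 1, 2], 3)   # Dict(1 => 2, 2 => 1)
```

Values that never match at lag `k` simply do not appear as keys, so a caller would read them with `get(dxx, v, 0)`.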

, "" , k-, .

, , . , , , O (N * lnN).

+2

Adding another version. This is a bit slower than the longer version given in my other answer. On the other hand, it is much cleaner and uses counter from the DataStructures package. In the code, cc counts all the elements and their frequencies, and cc2 counts the pairs of equal elements at distance k. The source:

using DataStructures     # install with Pkg.add("DataStructures")

function cohens_kappa3(x::Vector{Int}, k::Int)
      n = length(x)
      cc = counter(x)
      cc2 = counter(x[[i<=k ? false : x[i]==x[i-k] for i=1:n]])
      num, den = 0.0,1.0
      for (val,freq) in cc
          pj2 = (freq/n)^2
          num += cc2[val]/(n-k)-pj2
          den -= pj2
      end
      return num/den
end
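
When comparing several versions like these, a small hand-checkable case is useful. Below is a deliberately slow reference written with Base only. It is a sketch of my own (`kappa_reference` is a hypothetical name), following the same numerator and denominator as the functions above and assuming Julia 1.2 or later for `==(v)`.

```julia
# Hypothetical reference implementation: O(n * m), for testing only.
function kappa_reference(x::Vector{Int}, k::Int)
    n = length(x)
    num, den = 0.0, 1.0
    for v in unique(x)
        pj = count(==(v), x) / n
        # fraction of positions i > k where both x[i] and x[i-k] equal v
        pjjk = count(i -> x[i] == v && x[i - k] == v, (k + 1):n) / (n - k)
        num += pjjk - pj^2
        den -= pj^2
    end
    return num / den
end

x = [1, 1, 2, 1, 1, 2]
kappa_reference(x, 1)   # ≈ -0.35
kappa_reference(x, 3)   # ≈ 1.0, since x repeats with period 3
```

By hand for k = 1: p_1 = 4/6 and p_2 = 2/6, the lag-1 match fractions are 2/5 and 0, so the numerator is 2/5 - 5/9 = -7/45 and the denominator is 1 - 5/9 = 4/9, giving -7/20.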
+1

Source: https://habr.com/ru/post/1627232/

