Distance calculation and clustering in Julia

I have a set of n points with coordinates of the form (x, y, z). They are stored in an n × 3 matrix M.

Is there a built-in function in Julia to calculate the distance between each point and every other point? I am working with a small number of points, so computation time is not very important.

My overall goal is to run a clustering algorithm, so if there is a clustering algorithm that does not require me to compute these distances first, please suggest that too. Below is an example of the data I would like to cluster. Note that I only need to do this for the z-coordinate.

[Image: sample dataset I need to perform clustering on]

2 answers

To calculate distances, use the Distances package.

Given a matrix X, you can compute pairwise distances between its columns. This means your input points (your n objects) must be stored as the columns of the matrix. (Your question mentions an n × 3 matrix, so you need to transpose it, for example with the transpose() function.)

Here is an example of how to use it:

    julia> using Distances  # install with: using Pkg; Pkg.add("Distances")

    julia> x = rand(3, 2)
    3x2 Array{Float64,2}:
     0.27436   0.589142
     0.234363  0.728687
     0.265896  0.455243

    julia> pairwise(Euclidean(), x, x, dims=2)  # distances between columns
    2x2 Array{Float64,2}:
     0.0       0.615871
     0.615871  0.0

As you can see, this returns the matrix of distances between the columns of x. You can use other distance metrics if you need them; just check the package documentation.
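To make clear what pairwise computes, here is a package-free sketch in plain Julia. The names distance_matrix, euclidean and cityblock are hypothetical helpers defined only for this illustration; they are not part of the Distances API. The metric is passed as an ordinary function so it can be swapped, similar in spirit to passing a different metric object to pairwise. The matrix x reuses the values printed in the session above.

```julia
# Package-free sketch of a pairwise distance matrix (hypothetical helpers, not Distances.jl).
euclidean(a, b) = sqrt(sum(abs2, a .- b))   # straight-line distance
cityblock(a, b) = sum(abs, a .- b)          # Manhattan distance

# Distances between all pairs of columns of X (one point per column).
function distance_matrix(X::AbstractMatrix{<:Real}, dist=euclidean)
    n = size(X, 2)
    D = zeros(n, n)                          # diagonal stays 0
    for j in 1:n, i in 1:j-1
        d = dist(view(X, :, i), view(X, :, j))
        D[i, j] = d
        D[j, i] = d                          # distances are symmetric
    end
    return D
end

# The 3 × 2 matrix printed in the session above: two points, one per column.
x = [0.27436  0.589142;
     0.234363 0.728687;
     0.265896 0.455243]
D = distance_matrix(x)             # D[1, 2] ≈ 0.6159
C = distance_matrix(x, cityblock)  # same layout, Manhattan distances instead
```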


Just for completeness, in addition to @niczky12's answer: Julia has a package called Clustering which, as the name suggests, performs clustering.

Example using the k-means algorithm:

    using Clustering   # install with: using Pkg; Pkg.add("Clustering")

    X = rand(3, 100)   # data: each column is a sample
    k = 10             # number of clusters
    r = kmeans(X, k)   # run k-means

    # List the fields of the result. In the version used for this answer they were:
    # :centers, :assignments, :costs, :counts, :cweights, :totalcost, :iterations, :converged
    fieldnames(typeof(r))

The result returned by kmeans (r) contains the fields listed above. Perhaps the two most interesting are r.centers, which holds the cluster centers found by the k-means algorithm, and r.assignments, which gives the cluster each of the 100 samples belongs to.
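To illustrate what centers and assignments mean, here is a minimal sketch of Lloyd's algorithm (the classic k-means iteration) using only the standard library. It is not how Clustering.jl implements kmeans, which has its own initialization and options; simple_kmeans is a hypothetical name, samples are columns as in the example above, and the initial centers are simply k distinct random samples. The data is made up: two well-separated groups of 3-D points.

```julia
using Random, Statistics

# Minimal Lloyd's-algorithm k-means (hypothetical helper, not Clustering.jl's code).
# X holds one sample per column; returns the final centers and per-sample cluster indices.
function simple_kmeans(X::AbstractMatrix{<:Real}, k::Integer;
                       maxiter::Integer=100, rng::AbstractRNG=Random.default_rng())
    n = size(X, 2)
    # Initial centers: k distinct samples chosen at random.
    centers = float.(X[:, randperm(rng, n)[1:k]])
    assignments = zeros(Int, n)
    for _ in 1:maxiter
        # Assignment step: each sample goes to its nearest center (squared Euclidean distance).
        new_assignments = map(1:n) do i
            argmin([sum(abs2, X[:, i] .- centers[:, c]) for c in 1:k])
        end
        new_assignments == assignments && break   # no change: converged
        assignments = new_assignments
        # Update step: move each center to the mean of the samples assigned to it.
        for c in 1:k
            members = findall(==(c), assignments)
            if !isempty(members)                  # keep an empty cluster's old center
                centers[:, c] = vec(mean(X[:, members]; dims=2))
            end
        end
    end
    return (centers = centers, assignments = assignments)
end

# Usage on hypothetical data: two well-separated groups of 20 points each in 3-D.
rng = MersenneTwister(1)
X = hcat(randn(rng, 3, 20), randn(rng, 3, 20) .+ 10)
r = simple_kmeans(X, 2; rng = rng)
r.centers        # 3 × 2 matrix, one center per column
r.assignments    # cluster index (1 or 2) for each of the 40 samples
```

The result is returned as a named tuple so that r.centers and r.assignments read the same way as the fields described above.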

The same package offers several other clustering methods. Feel free to dig into its documentation and pick the one that best suits your needs.


In your case, since your data is an N x 3 matrix, you only need to transpose it:

    M = rand(100, 3)   # n × 3 data: one point per row
    kmeans(M', k)      # M' is 3 × n: one point per column
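The question only needs the z-coordinate. A minimal stdlib-only sketch of preparing both layouts, assuming M is the n × 3 matrix with z in the third column (random data stands in for the real dataset): permutedims gives a 3 × n copy with one point per column, and reshaping the third column gives a 1 × n matrix, so each z value becomes a one-dimensional sample. Either matrix could then be passed to kmeans(X, k) or kmeans(Z, k) from Clustering.

```julia
M = rand(100, 3)              # stand-in data: one (x, y, z) point per row

X = permutedims(M)            # 3 × 100 copy: one 3-D point per column
Z = reshape(M[:, 3], 1, :)    # 1 × 100 matrix: one z value per column

size(X), size(Z)              # ((3, 100), (1, 100))
```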

Source: https://habr.com/ru/post/1246908/

