I am using the mclust library for R ( http://www.stat.washington.edu/mclust ) to do some experimental EM-based GMM clustering. The package is wonderful and seems to usually find very good clusters for my data.
The problem is that I don't know R at all, and although I managed to muddle through the clustering process based on the help contents () and the extensive readme, I can't for the life of me figure out how to write the actual cluster results to a file. I use the following absurdly simple script to perform the clustering,
    library(mclust)
    myData <- read.csv("data.csv", sep=",", header=FALSE)
    attach(myData)
    myBIC <- mclustBIC(myData)
    mySummary <- summary( myBIC, data=myData )
At this point I have the clustering results and a summary. The data in data.csv is just a list of multidimensional points, one per line, so each row looks like "x, y, z" (in the case of three dimensions).
If I use 2D points (for example, only the x and y values), I can then use the built-in plotting function to get a very nice graph that displays the source points and color-codes each point based on the cluster to which it was assigned. So I know that all the information is somewhere in "myBIC", but the documentation and help do not seem to give any information on how to print this data!
I want to write a new file based on the results, which, it seems to me, are encoded in myBIC. Something like,
    CLUST x, y, z
    1 1.2, 3.4, 5.2
    1 1.2, 3.3, 5.2
    2 5.5, 1.3, 1.3
    3 7.1, 1.2, -1.0
    3 7.2, 1.2, -1.1
and then, hopefully, also print the parameters/centroids of the individual Gaussians/clusters that the clustering process found.
Surely this is absurdly easy to do, and I just don't know enough R to figure it out...
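For illustration, here is a minimal sketch of the kind of output file described above. It assumes the per-point cluster labels are available as a plain vector with one entry per input row. The data frame, the label vector, and the output path below are hypothetical stand-ins, so the snippet runs with base R alone and does not call mclust.

```r
# Stand-in for the points read from data.csv (hypothetical values).
myData <- data.frame(
  x = c(1.2, 1.2, 5.5, 7.1, 7.2),
  y = c(3.4, 3.3, 1.3, 1.2, 1.2),
  z = c(5.2, 5.2, 1.3, -1.0, -1.1)
)

# Stand-in for the cluster labels, one per row of myData (assumption).
classification <- c(1, 1, 2, 3, 3)

# Put the label in front of the coordinates, as in the desired layout.
clustered <- data.frame(CLUST = classification, myData)

# Hypothetical output path; row.names = FALSE drops the extra index column.
outFile <- file.path(tempdir(), "clustered.csv")
write.csv(clustered, file = outFile, row.names = FALSE)
```

With real data, the stand-in vector would be replaced by whatever component of the clustering result holds the labels.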
EDIT: It seems I've gotten a little further. Doing the following prints a somewhat cryptic matrix,
    > mySummary$classification
    [1] 1 1 2 1 3
    [6] 1 1 1 3 1
    [12] 1 2 1 3 1
    [18] 1 3
which, on reflection, I realized is actually a list of the samples and their classifications. I assumed it couldn't be written out directly with the write command, but a little more experimentation in the R console made me realize that I can do this:
    > newData <- mySummary$classification
    > write.csv( newData, file="class.csv" )
and that the result really looks pretty good!
    $ head class.csv
    "","x"
    "1",1
    "2",2
    "3",2
where the first column clearly corresponds to the index for the input, and the second column describes the assigned class identifier.
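As a side note, a two-column layout like the one above, with a header row and quoted row names, is what base R's write.csv() produces for a plain vector. Plain write() would instead emit the bare numbers, five per line by default. A small sketch that makes the two columns explicit and names them follows; the label vector, the column names, and the output path are hypothetical stand-ins.

```r
# Stand-in for the vector of cluster labels (hypothetical values).
classification <- c(1, 2, 2, 1, 3)

# Build the index/label table explicitly so the header is meaningful.
labels <- data.frame(
  index   = seq_along(classification),
  cluster = classification
)

# Hypothetical output path; row.names = FALSE avoids a duplicate index column.
classFile <- file.path(tempdir(), "class.csv")
write.csv(labels, file = classFile, row.names = FALSE)

# Read the file back to check that it round-trips.
readBack <- read.csv(classFile)
```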
The mySummary$parameters object appears to be nested, and has a bunch of sub-objects corresponding to the individual Gaussians, their parameters, and so on. The "write" function does not work when I try to just write it out, and writing each sub-object by name individually is a bit tedious. This leads me to a new question: how can I iterate over a nested object in R and print its elements sequentially to a file?
I have this object mySummary$parameters. It consists of several sub-elements, such as "mySummary$parameters$variance$sigma", etc. I would just like to iterate over everything and print it all to a file in the same way the console does automatically...
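One way this could be approached, sketched with base R only: capture.output() redirects what the console would print into a file, and a small recursive function can walk a named nested list leaf by leaf. The params list below is a hypothetical stand-in for mySummary$parameters (its shape and values are made up), and writeNested and both file paths are names invented for this example.

```r
# Hypothetical stand-in for mySummary$parameters; the real object comes from mclust.
params <- list(
  pro = c(0.5, 0.5),
  mean = matrix(c(1.2, 3.4, 5.5, 1.3), nrow = 2),
  variance = list(sigma = array(c(1, 0, 0, 1, 2, 0, 0, 2), dim = c(2, 2, 2)))
)

# Option 1: write exactly what the console would print for the whole object.
outFile <- file.path(tempdir(), "parameters.txt")   # hypothetical path
capture.output(print(params), file = outFile)

# Option 2: walk the nested list and write each leaf under its full name.
# Assumes every list element is named; unnamed elements would be skipped.
writeNested <- function(x, con, prefix = "params") {
  if (is.list(x)) {
    for (name in names(x)) {
      writeNested(x[[name]], con, paste0(prefix, "$", name))
    }
  } else {
    writeLines(prefix, con)                      # e.g. params$variance$sigma
    writeLines(capture.output(print(x)), con)    # the leaf as the console shows it
  }
}

leafFile <- file.path(tempdir(), "parameters_by_leaf.txt")   # hypothetical path
con <- file(leafFile, open = "w")
writeNested(params, con)
close(con)
```

Option 1 is the closest match to "print it the way the console does". Option 2 is only worth the extra code if each leaf needs its own formatting or its own file.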