I'm using MALLET for topic analysis, which writes its results to text files ("topics.txt") of several thousand rows and a hundred or so columns, where each row consists of tab-separated variables like this:
    Num1 text1 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
    Num2 text2 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
    Num3 text3 topic1 proportion1 topic2 proportion2 topic3 proportion3, etc.
Here is a snippet of the actual data:
    > dat[1:5,1:10]
      V1       V2 V3        V4 V5        V6 V7        V8 V9        V10
    1  0   10.txt 27 0.4560785 23 0.3040853 20 0.1315621 21 0.03632624
    2  1 1001.txt 20 0.2660085 12 0.2099153  8 0.1699586 13 0.16922928
    3  2 1002.txt 16 0.3341721  2 0.1747023 10 0.1360454 12 0.07507119
    4  3 1003.txt 12 0.5366148  8 0.2255179 18 0.1388561  0 0.01867091
    5  4 1005.txt 16 0.2363206  0 0.2214441 24 0.1914769  7 0.17760521
I am trying to use R to convert this output into a data table where the topics are the column headers and each topic column holds, for each value of "text", the value of the "proportion" variable that sits immediately to the right of that "topic" variable. Like this:
              topic1       topic2       topic3
    text1     proportion1  proportion2  proportion3
    text2     proportion1  proportion2  proportion3
or with the data fragment above, for example:
                      0          2          7          8         10         12         13         16         18         20         21         23         24         27
    10.txt            0          0          0          0          0          0          0          0          0  0.1315621 0.03632624  0.3040853          0  0.4560785
    1001.txt          0          0          0  0.1699586          0  0.2099153  0.1692292          0          0  0.2660085          0          0          0          0
    1002.txt          0  0.1747023          0          0  0.1360454  0.0750711          0  0.3341721          0          0          0          0          0          0
    1003.txt  0.0186709          0          0  0.2255179          0  0.5366148          0          0   0.138856          0          0          0          0          0
    1005.txt  0.2214441          0  0.1776052          0          0          0          0  0.2363206          0          0          0          0  0.1914769          0
This is the R code I have for the job, sent by a friend, but it doesn't work for me (and I don't know enough about it to fix it myself):
    dat<-read.table("topics.txt", header=F, sep="\t")
    datnames<-subset(dat, select=2)
    dat2<-subset(dat, select=3:length(dat))
    y <- data.frame(topic=character(0),proportion=character(0),text=character(0))
    for(i in seq(1, length(dat2), 2)){
      z<-i+1
      x<-dat2[,i:z]
      x<-cbind(x, datnames)
      colnames(x)<-c("topic","proportion", "text")
      y<-rbind(y, x)
    }
I would really appreciate any suggestions on how to get this code to work. My problem may be related to this and possibly this, but I don't yet have the skills to make immediate use of the answers to those questions.
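For reference, here is a minimal base R sketch of one way the reshaping could be done; it is not a fix of the friend's loop but an alternative built on the same idea (stack every topic/proportion pair into a long table, then cross-tabulate). It assumes that column 2 holds the text name and that the remaining columns strictly alternate topic, proportion. The names `topic_cols`, `prop_cols`, `long`, `wide` and `wide_df` are made up for illustration, and the five rows from the snippet above are typed in inline so the example runs without the real file.

```r
# Toy input: the five rows shown above, tab-separated.
# With the real file this would presumably be:
#   dat <- read.table("topics.txt", header = FALSE, sep = "\t")
rows <- c(
  "0\t10.txt\t27\t0.4560785\t23\t0.3040853\t20\t0.1315621\t21\t0.03632624",
  "1\t1001.txt\t20\t0.2660085\t12\t0.2099153\t8\t0.1699586\t13\t0.16922928",
  "2\t1002.txt\t16\t0.3341721\t2\t0.1747023\t10\t0.1360454\t12\t0.07507119",
  "3\t1003.txt\t12\t0.5366148\t8\t0.2255179\t18\t0.1388561\t0\t0.01867091",
  "4\t1005.txt\t16\t0.2363206\t0\t0.2214441\t24\t0.1914769\t7\t0.17760521"
)
dat <- read.table(text = rows, header = FALSE, sep = "\t",
                  stringsAsFactors = FALSE)

# Assumption: topic ids sit in columns 3, 5, 7, ... and each
# proportion sits in the column immediately to the right.
topic_cols <- seq(3, ncol(dat), by = 2)
prop_cols  <- topic_cols + 1

# Long format: one row per (text, topic, proportion) triple.
# unlist() stacks the selected columns one after another, so the
# text names are repeated once per topic column to stay aligned.
long <- data.frame(
  text       = rep(dat[[2]], times = length(topic_cols)),
  topic      = unlist(dat[topic_cols], use.names = FALSE),
  proportion = unlist(dat[prop_cols], use.names = FALSE)
)

# Wide format: texts as rows, topics as columns. Combinations that
# never occur become 0; a topic listed twice for one text would be summed.
wide    <- xtabs(proportion ~ text + topic, data = long)
wide_df <- as.data.frame.matrix(wide)

print(wide_df)
```

On the five sample rows this gives a 5 x 14 data frame whose row names are the text files and whose column names are the topic ids. `xtabs` was chosen here because it fills absent text/topic pairs with 0, matching the desired table; `reshape()` or the reshape2/tidyr packages could do the same job.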