Several events in TraMineR

Question

I am trying to analyze multiple sequences at once with TraMineR. I looked at seqdef, but I'm struggling to figure out how to create a TraMineR dataset when dealing with multiple variables. I think I'm working with something similar to the dataset used by Aassve et al. (as mentioned in the textbook), in which each wave has information about several conditions (for example, children, marriage, employment). All my variables are binary. Here is an example dataset with three waves (D, W2, W3) and three variables.

    D <- data.frame(ID=c(1:4), A1=c(1,1,1,0), B1=c(0,1,0,1), C1=c(0,0,0,1))
    W2 <- data.frame(A2=c(0,1,1,0), B2=c(1,1,0,1), C2=c(0,1,0,1))
    W3 <- data.frame(A3=c(0,1,1,0), B3=c(1,1,0,1), C3=c(0,1,0,1))
    L <- data.frame(D, W2, W3)

I may be wrong, but the material I found only covers data management and analysis of one variable at a time (for example, employment status over several waves). My dataset is much larger than the one above, so I can't really enter the states manually, as shown on page 48 of the tutorial. Has anyone dealt with this type of data using TraMineR (or a similar package)?

1) How would you get the data above into TraMineR?

2) How would you calculate substitution costs and then cluster the sequences?

Thank you very much

+4

r traminer

maycobra Jun 10 '13 at 8:47

2 answers

Biemann and Datta (2013) discuss multidimensional sequence analysis. This means creating multiple sequences for the same "individuals."

To do this, I used the following approach:

1) Define the sequences for the three dimensions

    comp.seq <- seqdef(comp, NULL, states=comp.scodes, labels=comp.labels,
                       alphabet=comp.alphabet, missing="Z")
    titles.seq <- seqdef(titles, NULL, states=titles.scodes, labels=titles.labels,
                         alphabet=titles.alphabet, missing="Z")
    member.seq <- seqdef(member, NULL, states=member.scodes, labels=member.labels,
                         alphabet=member.alphabet, missing="Z")

2) Calculate the multichannel (multidimensional) distances

    mcdist <- seqdistmc(channels=list(comp.seq, member.seq, titles.seq),
                        method="OM", sm=list("TRATE", "TRATE", "TRATE"),
                        with.missing=TRUE)

3) Cluster the distances using Ward's method:

    library(cluster)
    clusterward <- agnes(mcdist, diss=TRUE, method="ward")
    plot(clusterward, which.plots=2)

Ignore parameters such as "missing" or "left" if they do not apply to your data; I hope this short sample code helps.
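As a rough illustration, the same three steps could be applied to the example data from the question along the following lines. This is only a sketch: the names A.seq, B.seq and C.seq, the use of "CONSTANT" substitution costs (chosen here instead of "TRATE" because the example is tiny) and the two-cluster cut are assumptions made for illustration, not part of the approach described above.

```r
library(TraMineR)
library(cluster)

# Example data from the question: three waves, three binary variables.
L <- data.frame(
  ID = 1:4,
  A1 = c(1, 1, 1, 0), B1 = c(0, 1, 0, 1), C1 = c(0, 0, 0, 1),
  A2 = c(0, 1, 1, 0), B2 = c(1, 1, 0, 1), C2 = c(0, 1, 0, 1),
  A3 = c(0, 1, 1, 0), B3 = c(1, 1, 0, 1), C3 = c(0, 1, 0, 1)
)

# One state sequence object per variable (channel), each covering the three waves.
A.seq <- seqdef(L[, c("A1", "A2", "A3")])
B.seq <- seqdef(L[, c("B1", "B2", "B3")])
C.seq <- seqdef(L[, c("C1", "C2", "C3")])

# Multichannel optimal matching distances between the four individuals.
# "CONSTANT" costs are an assumption for this toy example.
mcdist <- seqdistmc(channels = list(A.seq, B.seq, C.seq),
                    method = "OM",
                    sm = list("CONSTANT", "CONSTANT", "CONSTANT"))

# Ward clustering on the distance matrix, cut into two groups (arbitrary choice).
clusterward <- agnes(mcdist, diss = TRUE, method = "ward")
groups <- cutree(clusterward, k = 2)
```

With a real dataset, the column selections would presumably be built programmatically (for example with grep on the column names) rather than typed out per variable.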

+1

Pedro Braz Mar 10 '15 at 13:56

Matthias Studer · Accepted Answer · 2013-06-10T09:55:07+0000

In sequence analysis, we are usually interested in the evolution of one variable (for example, the sequence of one variable across several waves). You have several options for analyzing several variables:

1) Create sequences for each variable, and then analyze the relationships between the clusters of sequences. In my opinion, this is the best way to go if your variables measure different concepts (like family and employment).
2) Create a new variable for each wave that is the interaction of the different variables of that wave, using the interaction function. For example, for wave one, use L$IntVar1 <- interaction(L$A1, L$B1, L$C1, drop=TRUE) (use drop=TRUE to remove unused combinations of answers). Then analyze the sequence of this newly created variable. In my opinion, this is the preferred way if your variables are different dimensions of the same concept. For example, marriage, children and union are all related to family life.
3) Create one sequence object for each variable, and then use seqdistmc to calculate the distances (multichannel sequence analysis). This is equivalent to the previous method, depending on how you set the substitution costs (see below).
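For the interaction-based option, a minimal sketch of building the combined variable for every wave in a loop might look like the following. It assumes the column naming from the question (A1, B1, C1, A2, ...); the data frame name combined and the column names IntVar1 to IntVar3 are illustrative choices.

```r
# Example data from the question: three waves, three binary variables.
L <- data.frame(
  ID = 1:4,
  A1 = c(1, 1, 1, 0), B1 = c(0, 1, 0, 1), C1 = c(0, 0, 0, 1),
  A2 = c(0, 1, 1, 0), B2 = c(1, 1, 0, 1), C2 = c(0, 1, 0, 1),
  A3 = c(0, 1, 1, 0), B3 = c(1, 1, 0, 1), C3 = c(0, 1, 0, 1)
)

# One combined state per wave, e.g. "1.0.0" for A=1, B=0, C=0.
# Converting to character keeps only the combinations actually observed.
combined <- data.frame(row.names = L$ID)
for (w in 1:3) {
  combined[[paste0("IntVar", w)]] <- as.character(
    interaction(L[[paste0("A", w)]],
                L[[paste0("B", w)]],
                L[[paste0("C", w)]])
  )
}

print(combined)
```

The resulting data frame has one row per individual and one column per wave, so it should be usable as input to seqdef, for example seqdef(combined), after which the usual single-sequence tools apply.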

If you use the second strategy, you can set the substitution costs as follows: count the number of original variables that differ between two states. For example, between the states "Married, Child" and "Not married, Child", set the substitution cost to 1, because only the "marriage" variable differs. Likewise, set the substitution cost between "Married, Child" and "Not married, No child" to 2, because both variables differ. Finally, set the indel cost to half the maximum substitution cost. This is the strategy used by seqdistmc.
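The cost rule described above can be sketched in base R as follows. The state labels are assumed to be of the form "A.B.C" (as produced by interaction on three binary variables); the particular set of states listed here is a hypothetical example.

```r
# Combined states coded as "A.B.C" (hypothetical example alphabet, sorted).
states <- c("0.1.0", "0.1.1", "1.0.0", "1.1.0", "1.1.1")

# Split each state back into its original binary variables (one row per state).
parts <- do.call(rbind, strsplit(states, ".", fixed = TRUE))

# Substitution cost = number of original variables that differ between two states.
n <- length(states)
sm <- matrix(0, n, n, dimnames = list(states, states))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sm[i, j] <- sum(parts[i, ] != parts[j, ])
  }
}

# Indel cost = half the maximum substitution cost.
indel <- max(sm) / 2

print(sm)
print(indel)
```

The matrix and indel cost could then be passed to seqdist with method="OM" through its sm and indel arguments. The rows and columns of sm need to follow the same order as the alphabet of the sequence object, so in practice it is probably safer to take states from alphabet() of that object than to type them by hand.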

Hope this helps.
