How can I structure and transcode messy categorical data into R?

Question


I'm struggling with how best to structure messy categorical data that comes from a dataset I will need to clean.

Coding scheme

I am analyzing data from a university course exam. We look at patterns in student responses, and we have developed a coding scheme to represent the kinds of things students do in their answers. The following is a subset of the coding scheme.

[Image: StackOverflowQuestion20100504.001.png, a subset of the coding scheme (http://picasaweb.google.com/lh/photo/0tut3kR-JFoB0cP_0uFBZg?feat=embedwebsite)]

Note that within each main code (1, 2, 3) there are nested subcodes (a, b, ...) that are not unique: the same letter can appear under different main codes.
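A minimal base-R sketch of one way to keep the nesting explicit: split each combined code such as "3a" into its main code and its subcode. The codes below are hypothetical, and the pattern assumes every code is one or more digits followed by lowercase letters.

```r
# Hypothetical codes as they might appear in a reason column
codes <- c("1b", "3a", "2a", "1a")

# Assumed format: digits (main code) followed by letters (subcode)
pattern   <- "^([0-9]+)([a-z]*)$"
main_code <- sub(pattern, "\\1", codes)
sub_code  <- sub(pattern, "\\2", codes)

parsed <- data.frame(code = codes, main = main_code, sub = sub_code,
                     stringsAsFactors = FALSE)
print(parsed)

# In this toy vector "a" occurs under main codes 1, 2 and 3, so the letter
# alone does not identify a pattern; keep the full code as the identifier.
table(parsed$sub)
```

Keeping the full code ("3a") as the key and deriving `main` only when you want to aggregate to the main-code level avoids confusing subcodes that share a letter.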

What raw data looks like

I've created an anonymized subset of my raw data, which you can view here. Part of my problem is that the people who coded the data noticed that some students showed multiple patterns. The coders' solution was to create as many columns (reason1, reason2, ...) as needed to hold each student's multiple patterns. This matters because the order (reason1, reason2) is arbitrary: two students (for example, students 41 and 42 in my dataset) who both correctly applied "dependency" should count the same in the analysis, regardless of whether 3a appears in the reason1 column or the reason2 column.
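One common way to make the column order irrelevant is to stack the reason1, reason2, ... columns into a single "long" column, one row per student and code. A minimal base-R sketch follows; the values are made up, and only the column names `Student` and `reason1`, `reason2` follow the question.

```r
# Hypothetical wide data in the coders' layout: one column per reason slot
wide <- data.frame(
  Student = c(41, 42, 43),
  reason1 = c("3a", "1b", "2a"),
  reason2 = c("1b", "3a", NA),
  stringsAsFactors = FALSE
)

reason_cols <- grep("^reason", names(wide), value = TRUE)

# Stack the reason columns into one column; the slot name is dropped on purpose
long <- do.call(rbind, lapply(reason_cols, function(col) {
  data.frame(Student = wide$Student, code = wide[[col]],
             stringsAsFactors = FALSE)
}))

# Empty slots (students with fewer patterns) carry no information
long <- long[!is.na(long$code) & long$code != "", ]
long <- long[order(long$Student, long$code), ]
rownames(long) <- NULL
print(long)

# Students 41 and 42 now look the same with respect to code 3a
with(long, Student[code == "3a"])
```

In this layout "does the student show pattern 3a?" is a single lookup on `code`, no matter which reason column the coder used.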

How can I best structure student data?

… raw data …:

[Image: StackOverflowQuestion20100504.002.png (http://picasaweb.google.com/lh/photo/sQgGKgseA07Z_lKxRe4fkQ?feat=embedwebsite)]

… student002 and student003 … "1b" …

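Questions

How should I structure the data so that the order of the reason1, reason2, ... columns does not matter?
How can I recode the reason columns in R?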
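For the recoding side, one common base-R approach is a named lookup vector: index it by the raw codes to get readable labels. Only "3a" = dependency comes from the question; every other label below is a hypothetical placeholder for the real coding scheme.

```r
# Hypothetical lookup table; replace the placeholders with the real scheme
code_labels <- c(
  "1a" = "placeholder label 1a",
  "1b" = "placeholder label 1b",
  "2a" = "placeholder label 2a",
  "3a" = "dependency (applied correctly)"
)

codes <- c("3a", "1b", "3a", "9z", "2a")

# Indexing a named vector by the codes returns the matching labels;
# a code missing from the scheme (here "9z") comes back as NA,
# which makes coding errors easy to spot.
recoded <- unname(code_labels[codes])

# Store as a factor whose levels follow the order of the coding scheme
recoded <- factor(recoded, levels = unname(code_labels))
print(recoded)
table(recoded, useNA = "ifany")
```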

… R … stackoverflow …


Tags: r, statistics, plyr

briandk, May 4 '10 23:43


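Try ddply from the plyr package: it splits a data frame by the columns you name, applies a function to each piece, and combines the results, so you don't have to split the data yourself.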

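library(plyr)
x <- ddply(data, c("split_column1", "split_column3"),  # placeholder grouping columns
           summarize,                                  # plyr's summarize() runs on each piece
           n = length(split_column1))                  # replace with the statistics you want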
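For readers without plyr, the same split-apply-combine idea can be sketched in base R with `split()`, `lapply()` and `do.call(rbind, ...)`. The data and the columns `instructor` and `code` are hypothetical; in plyr this would be roughly `ddply(long, "instructor", summarize, n_students = length(unique(Student)), n_3a = sum(code == "3a"))`.

```r
# Hypothetical long-format data: one row per (student, code)
long <- data.frame(
  Student    = c(41, 41, 42, 42, 43),
  instructor = c("A", "A", "B", "B", "B"),
  code       = c("1b", "3a", "1b", "3a", "2a"),
  stringsAsFactors = FALSE
)

# Split: one data frame per instructor
pieces <- split(long, long$instructor)

# Apply: summarise each piece
summaries <- lapply(pieces, function(d) {
  data.frame(instructor = d$instructor[1],
             n_students = length(unique(d$Student)),
             n_3a       = sum(d$code == "3a"),
             stringsAsFactors = FALSE)
})

# Combine: stack the per-group summaries into one data frame
result <- do.call(rbind, summaries)
rownames(result) <- NULL
print(result)
```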


Dan, May 5 '10 0:34

…


Stray, May 5 '10 14:32

Eduardo Leoni · Accepted Answer · 2010-05-05T07:04:30+0000

You can "melt" the data:

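library(reshape)
## read the anonymized sample (one row per student, columns reason1, reason2, ...)
dnow <- read.csv("~/Downloads/catsample20100504.csv")
## stack every non-id column into a single "value" column
dnow <- melt(dnow, id.vars=c("Student", "instructor"))
dnow$variable <- NULL ## drop the reason1/reason2 labels, since ordering does not matter
subset(dnow, Student%in%c(41,42)) ## see the results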

…
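One way to continue from melted data is a student-by-code table that records whether each student shows each pattern, independent of which reason column it came from. A minimal base-R sketch, assuming long data with `Student` and `value` columns (the name reshape's `melt()` gives the stacked column); the rows themselves are made up.

```r
# Hypothetical long data, as produced by melting reason1, reason2, ...
long <- data.frame(
  Student = c(41, 42, 41, 42, 43, 43),
  value   = c("3a", "1b", "1b", "3a", "2a", ""),
  stringsAsFactors = FALSE
)

# Drop empty slots left by students with fewer patterns
long <- long[long$value != "", ]

# Cross-tabulate: one row per student, one column per code
counts <- table(long$Student, long$value)

# Convert counts to TRUE/FALSE: "this student shows this pattern"
has_code <- unclass(counts) > 0
print(has_code)
```

Each row is now one student and each column one code, which is a convenient shape for counting how many students show a given pattern (`colSums(has_code)`).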
