I am struggling with how best to structure categorical data that is messy and comes from a dataset that I will need to clean.
Coding scheme
I am analyzing data from a university course exam. We look at patterns in student responses, and we have developed a coding scheme to represent the kinds of things students do in their answers. The following is a subset of the coding scheme.
<a href = "http://picasaweb.google.com/lh/photo/0tut3kR-JFoB0cP_0uFBZg?feat=embedwebsite" rel = "nofollow noreferrer">Image: subset of the coding scheme</a>
Note that inside each main code (1, 2, 3) there are nested non-specific subcodes (a, b, ...).
What the raw data looks like
I created an anonymized raw subset of my data that you can view here. Part of my problem is that the people who coded the data noticed that some students showed multiple patterns. The coders' decision was to create enough columns (reason1, reason2, ...) to accommodate students with multiple patterns. This matters because the order (reason1, reason2) is arbitrary: two students (for example, student 41 and student 42 in my dataset) who correctly applied "dependency" should both be counted in the analysis regardless of whether 3a appears in column reason1 or column reason2.
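To make the order problem concrete, here is a minimal sketch in plain Python. It is not my actual data: the rows, the student labels, the helper names `to_long` and `students_with`, and every code value other than 3a are invented for illustration. It reshapes the wide reason1/reason2 layout into one (student, code) pair per row, so that a lookup for 3a no longer depends on which column held it.

```python
# Hypothetical rows in the wide layout: one row per student, with the
# reason columns holding codes in arbitrary order. All values are invented.
wide_rows = [
    {"student": "student41", "reason1": "3a", "reason2": "1b"},
    {"student": "student42", "reason1": "2a", "reason2": "3a"},
    {"student": "student43", "reason1": "1a", "reason2": None},
]


def to_long(rows):
    """Reshape to one (student, code) pair per row, dropping empty cells."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith("reason") and value:
                long_rows.append((row["student"], value))
    return long_rows


def students_with(code, long_rows):
    """Return the students who received `code`, whichever column held it."""
    return sorted({student for student, c in long_rows if c == code})


long_rows = to_long(wide_rows)
print(students_with("3a", long_rows))  # ['student41', 'student42']
```

In this long form the column position is discarded on purpose, which is only appropriate if the order of reason1, reason2, ... really carries no meaning.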
How can I best structure student data?
<a href= "http://picasaweb.google.com/lh/photo/sQgGKgseA07Z_lKxRe4fkQ?feat=embedwebsite" rel= "nofollow noreferrer" >Image</a>
(See student002 and student003, code "1b".)
How should the reason1, reason2, ... columns be restructured?
How should reason be represented in R?