Bearing in mind that R works well on vectors, a first step is to think in terms of "Words" rather than "Word":
    ## constructor, accessors, subset (also need [[, [<-, [[<- methods)
    .Words <- setClass("Words",
        representation(words="character", parts="character"))
    words <- function(x) x@words
    parts <- function(x) x@parts
    setMethod("length", "Words", function(x) length(words(x)))
    setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
        initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
    })

    ## validity
    setValidity("Words", function(object) {
        if (length(words(object)) == length(parts(object)))
            NULL
        else
            "'words()' and 'parts()' are not the same length"
    })
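As a quick check of the class above, the following sketch repeats the definition so that it runs on its own (only the methods package is needed) and then exercises the constructor, `length()`, subsetting, and the validity check. The object names `w` and `bad` and the example data are made up for illustration.

```r
library(methods)

## Same definition as above, repeated so this block is self-contained
.Words <- setClass("Words",
    representation(words="character", parts="character"))
words <- function(x) x@words
parts <- function(x) x@parts
setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
    initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
})
setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})

## One object holds many words (illustrative data)
w <- .Words(words=c("the", "cat", "sat"), parts=c("DET", "NOUN", "VERB"))
length(w)        # number of words held by the object
words(w[2:3])    # subsetting returns another "Words" object

## Mismatched slot lengths are rejected by the validity method;
## the error message is captured as a string instead of stopping the script
bad <- tryCatch(
    .Words(words=c("a", "b"), parts="X"),
    error=function(e) conditionMessage(e))
bad
```

The point of the design is that a single S4 instance represents the whole vector of words, so the per-object overhead is paid once rather than once per word.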
@nicola's suggestion to have a list of Words has been formalized in the IRanges package (actually, S4Vectors in the "devel" / 3.0 branch of Bioconductor). There, a "SimpleList" takes the "naive" approach of requiring that all elements of the list have the same class, whereas a "CompressedList" has similar behavior but is actually implemented as a vector-like object (one with length(), [, and [[ methods) that is "partitioned" (either by end or by width) into groups.
    library(IRanges)
    .Sentences = setClass("Sentences",
        contains="CompressedList",
        prototype=c(elementType="Words"))
You would then write a more user-friendly constructor, but the basic functionality is already there:
    ## 0 Sentences
    .Sentences()
    ## 1 sentence of 0 words
    .Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
    ## 3 sentences of 2, 0, and 3 words
    s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]),
        partitioning=PartitioningByEnd(c(2, 2, 5)))
leading to
    > s3[[1]]
    An object of class "Words"
    Slot "words":
    [1] "a" "b"

    Slot "parts":
    [1] "A" "B"

    > s3[[2]]
    An object of class "Words"
    Slot "words":
    character(0)

    Slot "parts":
    character(0)

    > s3[[3]]
    An object of class "Words"
    Slot "words":
    [1] "c" "d" "e"

    Slot "parts":
    [1] "C" "D" "E"
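To see what the partitioning means without installing IRanges, the idea can be sketched in base R: the data live in one flat vector, and a vector of cumulative end positions marks where each group stops. This is only an illustration of the concept, not how IRanges implements it; `flat`, `ends`, `starts`, and `group()` are hypothetical names.

```r
## Flat data plus cumulative end positions,
## mirroring PartitioningByEnd(c(2, 2, 5)) from the example above
flat <- letters[1:5]
ends <- c(2L, 2L, 5L)
starts <- c(1L, head(ends, -1L) + 1L)   # each group starts after the previous end

## Hypothetical helper: extract the i-th group from the flat vector.
## seq_len() is used so that an empty group yields a zero-length index.
group <- function(i) {
    flat[seq_len(ends[i] - starts[i] + 1L) + starts[i] - 1L]
}

group(1)   # "a" "b"
group(2)   # character(0)
group(3)   # "c" "d" "e"
```

Because the groups are just index ranges into one vector, a sentence with no words costs nothing beyond a repeated end position.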
Note that some typical operations are fast because they can operate on the "unlisted" elements without creating or destroying S4 instances, for example, coercing all "words" to upper case:
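    ## use the slot name declared in the class definition ("words")
    setMethod("toupper", "Words", function(x) {
        x@words <- toupper(x@words)
        x
    })
    ## unlist once, transform the flat "Words" object, then restore the grouping
    setMethod("toupper", "Sentences", function(x) relist(toupper(unlist(x)), x))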
This is "fast" for large collections of sentences because unlist / relist really amount to a slot access and the creation of a single instance of "Words". Scalable Genomics with R and Bioconductor outlines this and other strategies.
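The same unlist / relist pattern can be tried on an ordinary base R list, which may make the idea easier to see. This sketch uses `relist()` from the utils package on a plain list of character vectors rather than the IRanges classes, so it shows the pattern only, not the S4 performance benefit; `sentences`, `per_element`, and `vectorized` are made-up names.

```r
## Illustrative data: two "sentences" as a plain list of character vectors
sentences <- list(c("a", "b"), c("c", "d", "e"))

## Per-element approach: one toupper() call for every sentence
per_element <- lapply(sentences, toupper)

## Vectorized approach: flatten, transform once, restore the grouping
vectorized <- relist(toupper(unlist(sentences)), sentences)

## Both routes should give the same grouped result
identical(per_element, vectorized)
```

With many short sentences, the vectorized route calls `toupper()` once on a long vector instead of once per sentence, which is the behavior the "Sentences" method above relies on.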
In an answer, @nicola says that "R is not ideal for the OO programming style", but it is probably more helpful to recognize that R's S4 object-oriented style differs from that of C++ and Java, just as R differs from C. In particular, it is really valuable to keep thinking in terms of vectors when working with S4: Words rather than Word, People rather than Person, and so on.