How to define a slot that holds several objects of the same type in R?

Let's say I want to define two classes, Sentence and Word. Each Word object has a character string and a part of speech (pos). Each Sentence contains a number of words and has an additional data slot.

The Word class is straightforward to define:

    wordSlots <- list(word = "character", pos = "character")
    wordProto <- list(word = "", pos = "")
    setClass("Word", slots = wordSlots, prototype = wordProto)
    # Defaults let Word() be called with no arguments, as in the prototypes below
    Word <- function(word = "", pos = "") new("Word", word = word, pos = pos)

Now I want to create a Sentence class that can contain several Word objects and some numeric data.

If I define the Sentence class as follows:

    sentenceSlots <- list(words = "Word", stats = "numeric")
    sentenceProto <- list(words = Word(), stats = 0)
    setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)

Then the sentence can contain only one word. I could define it with many slots, one for each word, but then it will be limited in length.

However, if I define the Sentence class as follows:

    sentenceSlots <- list(words = "list", stats = "numeric")
    sentenceProto <- list(words = list(Word()), stats = 0)
    setClass("Sentence", slots = sentenceSlots, prototype = sentenceProto)

it can contain as many words as I want, but the words slot can also contain objects that are not of class Word.

Is there a way to do this? It would be similar to C++, where you can have a vector of objects of the same type.

2 answers

Keeping in mind that R works well on vectors, a first step is to think in terms of "Words" rather than "Word":

    ## constructor, accessors, subset (also need [[, [<-, [[<- methods)
    .Words <- setClass("Words",
        representation(words="character", parts="character"))
    words <- function(x) x@words
    parts <- function(x) x@parts
    setMethod("length", "Words", function(x) length(words(x)))
    setMethod("[", c("Words", "ANY", "missing"), function(x, i, j, ...) {
        initialize(x, words=words(x)[i], parts=parts(x)[i], ...)
    })

    ## validity
    setValidity("Words", function(object) {
        if (length(words(object)) == length(parts(object)))
            NULL
        else
            "'words()' and 'parts()' are not the same length"
    })
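As a rough illustration of how such a vector-like class behaves, here is a minimal sketch that repeats the definitions above and then exercises them. The example values and the names `w` and `bad` are made up for illustration, and the `[` method spells out `drop = TRUE` in its formals, which is an assumption about style rather than something the class requires.

```r
library(methods)

## Same class as above: two parallel character vectors
.Words <- setClass("Words",
    representation(words = "character", parts = "character"))
words <- function(x) x@words
parts <- function(x) x@parts

setMethod("length", "Words", function(x) length(words(x)))
setMethod("[", c("Words", "ANY", "missing"),
    function(x, i, j, ..., drop = TRUE) {
        ## subset both vectors in parallel; initialize() re-validates
        initialize(x, words = words(x)[i], parts = parts(x)[i], ...)
    })

setValidity("Words", function(object) {
    if (length(words(object)) == length(parts(object)))
        NULL
    else
        "'words()' and 'parts()' are not the same length"
})

## Illustrative data (hypothetical values)
w <- .Words(words = c("the", "cat", "sat"),
            parts = c("DET", "NOUN", "VERB"))
length(w)          # number of words held by the single object
words(w[2:3])      # subsetting returns another Words object

## The validity method rejects vectors of unequal length
bad <- tryCatch(.Words(words = "the", parts = character()),
                error = function(e) conditionMessage(e))
```

One object holds all the words, so every element is necessarily of the same "type" without any per-element class check.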

@nicola's suggestion of having a list of Word objects is formalized in the IRanges package (actually, S4Vectors in the devel / 3.0 branch of Bioconductor), where a "SimpleList" takes the "naive" approach of requiring all elements of the list to have the same class, whereas a "CompressedList" has similar behavior but is actually implemented as a vector-like object (one with length(), [, and [[ methods) that is "partitioned" (either by end or by width) into groups.

    library(IRanges)
    .Sentences = setClass("Sentences",
        contains="CompressedList",
        prototype=c(elementType="Words"))

One would then write a more user-friendly constructor, but the basic functionality is

    ## 0 Sentences
    .Sentences()
    ## 1 sentence of 0 words
    .Sentences(unlistData=.Words(), partitioning=PartitioningByEnd(0))
    ## 3 sentences of 2, 0, and 3 words
    s3 <- .Sentences(unlistData=.Words(words=letters[1:5], parts=LETTERS[1:5]),
                     partitioning=PartitioningByEnd(c(2, 2, 5)))

leading to

    > s3[[1]]
    An object of class "Words"
    Slot "words":
    [1] "a" "b"

    Slot "parts":
    [1] "A" "B"

    > s3[[2]]
    An object of class "Words"
    Slot "words":
    character(0)

    Slot "parts":
    character(0)

    > s3[[3]]
    An object of class "Words"
    Slot "words":
    [1] "c" "d" "e"

    Slot "parts":
    [1] "C" "D" "E"

Note that some typical operations are fast because they can operate on the "unlisted" elements without creating or destroying S4 instances, e.g., coercing all "words" to upper case

    ## slot name matches the class definition (words, not word)
    setMethod("toupper", "Words", function(x) {
        x@words <- toupper(x@words)
        x
    })
    setMethod("toupper", "Sentences", function(x) relist(toupper(unlist(x)), x))

This is "fast" for large collections of sentences because unlist / relist really amounts to a slot access and the creation of a single instance of "Words". Scalable Genomics with R and Bioconductor describes this and other strategies.
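The partitioning idea can be sketched in base R, without IRanges. This is only an illustration of the concept, not how CompressedList is implemented; the names `flat`, `ends`, and `regroup` are hypothetical. All elements live in one flat vector, the groups are described by their end positions, and a vectorized operation touches the flat vector once before the grouping is re-applied.

```r
## One flat vector of elements plus the end position of each group,
## mirroring PartitioningByEnd(c(2, 2, 5)): groups of 2, 0 and 3 elements
flat <- c("a", "b", "c", "d", "e")
ends <- c(2L, 2L, 5L)

## Hypothetical helper: split a flat vector into groups given their ends
regroup <- function(flat, ends) {
    widths <- diff(c(0L, ends))                  # group sizes
    grp <- factor(rep(seq_along(ends), widths),  # group id per element
                  levels = seq_along(ends))      # keep empty groups
    unname(split(flat, grp))
}

groups <- regroup(flat, ends)

## The vectorized step runs once on the flat vector, not once per group
upper <- regroup(toupper(flat), ends)
```

The cost of `toupper()` here does not depend on the number of groups, which is the property the unlist / relist approach relies on.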

In an answer, @nicola says that "R is not ideal for the OO programming style", but it is probably more helpful to realize that R's S4 object-oriented style differs from C++ and Java, just as R differs from C. In particular, it is really valuable to keep thinking in terms of vectors when working with S4: Words rather than Word, People rather than Person ...


I offer only a workaround for this class of problems. Keep in mind that R is not ideal for the OO programming style, and any solution is unlikely to be as robust as in languages such as Java or C++. However, you can declare your Sentence class with the words slot as a list. Then you define your constructor like this:

    Sentence <- function(words, stats) {
        # check the class of each component of the words argument
        if (!is.list(words) ||
            !all(vapply(words, function(x) is(x, "Word"), logical(1))))
            stop("Not valid words argument")
        # create the object
        new("Sentence", words = words, stats = stats)
    }

An example of such a constructor can be found in the sp package for the Polygons class. You can see the body of this function.

If you don't want the user to set the words slot incorrectly, you can redefine the @<- operator, for example:

  "@<-.Sentence"<-function(sentence,...) invisible(sentence) 

I do not think the last step is necessary. No matter what you do, the user can always break things. For example, they could call the new function directly, bypassing your constructor. Or they could assign the Word class to an arbitrary object and then pass it to Sentence. As I said, R is not ideal for this programming style, so you often have to settle for a non-optimal solution.
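One way to narrow the first of these gaps is an S4 validity method, which new() runs even when the constructor is bypassed. The following is a minimal sketch under the assumption that Sentence keeps words as a list; the error message text and the names `good`, `msg`, and `recheck` are illustrative only. It does not stop a user from assigning to the slot directly, since that is only checked when validObject() is called.

```r
library(methods)

setClass("Word",
    slots = c(word = "character", pos = "character"),
    prototype = list(word = "", pos = ""))
setClass("Sentence",
    slots = c(words = "list", stats = "numeric"),
    prototype = list(words = list(), stats = 0))

## Validity: every element of the list must be a Word (or a subclass)
setValidity("Sentence", function(object) {
    ok <- vapply(object@words, function(x) is(x, "Word"), logical(1))
    if (all(ok)) TRUE else "all elements of 'words' must be Word objects"
})

## new() validates, so calling it directly does not bypass the check
good <- new("Sentence",
            words = list(new("Word", word = "cat", pos = "noun")),
            stats = 1)
msg <- tryCatch(new("Sentence", words = list("cat"), stats = 1),
                error = function(e) conditionMessage(e))

## Direct slot assignment only checks that the value is a list;
## the element check runs again only on an explicit validObject()
good@words <- list(1, 2)
recheck <- tryCatch(validObject(good),
                    error = function(e) conditionMessage(e))
```

With this in place the constructor no longer needs its own class check, although keeping it gives a friendlier error message.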


Source: https://habr.com/ru/post/1202610/

