Efficient way to define a class with multiple, optionally empty slots in S4 from R?

Question

I am building a package to handle data that arrives in 4 different types. Each of these types is a legitimate class in the form of a matrix, data.frame or tree. Depending on how the data was processed and on other experimental factors, some of these data components may be missing, but it is still extremely useful to be able to store this information as an instance of a special class and have methods that recognize the different data components.

Approach 1:

I experimented with an incremental inheritance structure that looks like a nested tree, where each combination of data types has its own explicitly defined class. This is difficult to extend for additional data types in the future, and it is also difficult for new developers to learn all the class names, however well-organized those names might be.

Approach 2:

The second approach is to create a single "master class" that includes a slot for all 4 data types. To allow a slot to be NULL when its data is missing, you must first define a virtual class union between the NULL class and the new data class, and then use that virtual class union as the expected class for the corresponding slot in the master class. Here is an example (assuming each data class is already defined):

    ################################################################################
    # Use setClassUnion to define the unholy NULL-data union as a virtual class.
    ################################################################################
    setClassUnion("dataClass1OrNULL", c("dataClass1", "NULL"))
    setClassUnion("dataClass2OrNULL", c("dataClass2", "NULL"))
    setClassUnion("dataClass3OrNULL", c("dataClass3", "NULL"))
    setClassUnion("dataClass4OrNULL", c("dataClass4", "NULL"))
    ################################################################################
    # Now define the master class with all 4 slots, and
    # also the possibility of empty (NULL) slots and an explicit prototype for
    # slots to be set to NULL if they are not provided at instantiation.
    ################################################################################
    setClass(Class="theMasterClass",
        representation=representation(
            slot1="dataClass1OrNULL",
            slot2="dataClass2OrNULL",
            slot3="dataClass3OrNULL",
            slot4="dataClass4OrNULL"),
        prototype=prototype(slot1=NULL, slot2=NULL, slot3=NULL, slot4=NULL)
    )
    ################################################################################
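To make the pattern concrete, here is a minimal sketch of how such a master class behaves at instantiation. It is cut down to two slots, and the definitions of `dataClass1` and `dataClass2` are hypothetical stand-ins (a matrix holder and a data.frame holder), since the real data classes are not shown in the question.

```r
library(methods)

# Hypothetical stand-ins for the real data classes (assumptions for illustration).
setClass("dataClass1", representation = representation(x = "matrix"))
setClass("dataClass2", representation = representation(x = "data.frame"))

# Virtual class unions that let each slot hold either its data class or NULL.
setClassUnion("dataClass1OrNULL", c("dataClass1", "NULL"))
setClassUnion("dataClass2OrNULL", c("dataClass2", "NULL"))

setClass("theMasterClass",
    representation = representation(
        slot1 = "dataClass1OrNULL",
        slot2 = "dataClass2OrNULL"),
    prototype = prototype(slot1 = NULL, slot2 = NULL)
)

# Only slot1 is supplied; slot2 falls back to the NULL prototype.
obj <- new("theMasterClass",
           slot1 = new("dataClass1", x = matrix(1:4, nrow = 2)))

# Every method that touches a slot has to make this kind of check first.
is.null(obj@slot2)
is(obj@slot1, "dataClass1")
```

The two checks at the end are the cost of this design: any code that reads a slot must first test whether it is NULL.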

Thus, the question can be rephrased as:

Are there more efficient and/or flexible alternatives to either of these approaches?

This example is modified from an answer to a SO question about setting the default value of a slot to NULL. This question differs in that I am interested in knowing the best options in R for creating classes with slots that can be empty if necessary, even though they require a specific complex class in all other, non-empty cases.

+4

oop r s4

Paul mcmurdie Nov 30 '11 at 23:56

1 answer

Martin morgan · Accepted Answer · 2011-12-01T01:40:42+0000

In my opinion...

Approach 2

It rather defeats the purpose to adopt a formal class system and then create a class that contains ill-defined slots ("A" or "NULL"). At a minimum, I would try to make DataClass1 have a "NULL"-like default. As a simple example, the default here is a zero-length numeric vector.

    setClass("DataClass1", representation=representation(x="numeric"))
    DataClass1 <- function(x=numeric(), ...) {
        new("DataClass1", x=x, ...)
    }

Then

    setClass("MasterClass1", representation=representation(dataClass1="DataClass1"))
    MasterClass1 <- function(dataClass1=DataClass1(), ...) {
        new("MasterClass1", dataClass1=dataClass1, ...)
    }

One advantage of this is that methods do not have to check whether the instance in the slot is NULL or a DataClass1:

    setMethod(length, "DataClass1", function(x) length(x@x))
    setMethod(length, "MasterClass1", function(x) length(x@dataClass1))

    > length(MasterClass1())
    [1] 0
    > length(MasterClass1(DataClass1(1:5)))
    [1] 5

In response to your comment about warning users when they access "empty" slots, and remembering that users usually want functions to do something rather than tell them they are doing something wrong, I would probably return the empty DataClass1() object, which accurately reflects the state of the object. Perhaps a show method could provide an overview that reinforces the status of the slot, for example "DataClass1: none". This seems particularly appropriate if MasterClass1 is a way of coordinating several different analyses, of which the user may perform only some.
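As a rough sketch of the show-method idea, assuming the DataClass1 and MasterClass1 definitions from above (repeated here so the block runs on its own): the helper `slotSummary` and the exact wording of the output are hypothetical choices, not part of the original answer.

```r
library(methods)

setClass("DataClass1", representation = representation(x = "numeric"))
DataClass1 <- function(x = numeric(), ...) new("DataClass1", x = x, ...)
setMethod("length", "DataClass1", function(x) length(x@x))

setClass("MasterClass1", representation = representation(dataClass1 = "DataClass1"))
MasterClass1 <- function(dataClass1 = DataClass1(), ...) {
    new("MasterClass1", dataClass1 = dataClass1, ...)
}

# Hypothetical helper: one summary line per component, "none" when it is empty.
slotSummary <- function(label, component) {
    n <- length(component)
    paste0(label, ": ", if (n == 0L) "none" else paste(n, "element(s)"))
}

# The show method reports the status of the slot instead of warning or failing.
setMethod("show", "MasterClass1", function(object) {
    cat("An object of class \"MasterClass1\"\n")
    cat(slotSummary("DataClass1", object@dataClass1), "\n", sep = "")
})

MasterClass1()                  # reports the empty component as "none"
MasterClass1(DataClass1(1:5))   # reports the number of elements
```

With more components, the show method would simply emit one `slotSummary` line per slot.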

The limitation of this approach (or of your Approach 2) is that you do not get method dispatch: you cannot write methods that apply only to instances whose DataClass1 component has non-zero length, so you are forced to do some kind of manual dispatch (for example, with if or switch). This may look like a limitation only for the developer, but it affects the user as well, because the user cannot tell which operations are specific to MasterClass1 instances that contain a non-empty DataClass1.
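The manual dispatch described above might look like the following sketch. The generic `describe` and its return strings are hypothetical, introduced only for illustration; the class definitions repeat those given earlier so the block is self-contained.

```r
library(methods)

setClass("DataClass1", representation = representation(x = "numeric"))
DataClass1 <- function(x = numeric(), ...) new("DataClass1", x = x, ...)
setMethod("length", "DataClass1", function(x) length(x@x))

setClass("MasterClass1", representation = representation(dataClass1 = "DataClass1"))
MasterClass1 <- function(dataClass1 = DataClass1(), ...) {
    new("MasterClass1", dataClass1 = dataClass1, ...)
}

# Hypothetical generic: S4 dispatch can select on the class MasterClass1,
# but not on whether the DataClass1 component is empty.
setGeneric("describe", function(object) standardGeneric("describe"))

setMethod("describe", "MasterClass1", function(object) {
    # Manual dispatch on the state of the slot.
    if (length(object@dataClass1) == 0L) {
        "no DataClass1 data available"
    } else {
        paste("mean of DataClass1 data:", mean(object@dataClass1@x))
    }
})

describe(MasterClass1())
describe(MasterClass1(DataClass1(1:5)))
```

Both calls reach the same method; the branching on content happens inside it, which is exactly what the class system cannot document for the user.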

Approach 1

When you say that the class names in the hierarchy will confuse your users, it sounds like this may point to a more fundamental problem: you are trying too hard to make the representation of data types comprehensive. A user will never be able to keep track of ClassWithMatrixDataFrameAndTree because it does not reflect the way they think about the data. Perhaps this is an opportunity to scale back your ambition and tackle only the most prominent parts of the domain you are investigating. Or perhaps it is an opportunity to rethink how users might think about and interact with the data they have collected, and to use the separation of interface (what the user sees) from implementation (how you choose to represent the data in classes) that class systems provide to encapsulate more effectively what the user is likely to do.

Putting the naming and the number of classes aside, when you say "it is difficult to extend for additional data types in the future", it makes me wonder whether some of the nuances of S4 classes are tripping you up. The short recipe is to not write your own initialize methods, and to rely on constructors to do the complicated work, along the lines of

    setClass("A", representation(x="numeric"))
    setClass("B", representation(y="numeric"), contains="A")
    A <- function(x = numeric(), ...) new("A", x=x, ...)
    B <- function(a = A(), y = numeric(), ...) new("B", a, y=y, ...)

and then

    > B(A(1:5), 10)
    An object of class "B"
    Slot "y":
    [1] 10

    Slot "x":
    [1] 1 2 3 4 5
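To illustrate how this constructor pattern carries over when another data type is added later, here is a small sketch. The class `C`, its slot `z`, and the generic `xvalues` are hypothetical names introduced for this example only.

```r
library(methods)

setClass("A", representation(x = "numeric"))
setClass("B", representation(y = "numeric"), contains = "A")
A <- function(x = numeric(), ...) new("A", x = x, ...)
B <- function(a = A(), y = numeric(), ...) new("B", a, y = y, ...)

# Hypothetical later extension: a new class adds one slot and one constructor.
# No initialize method is written; the default one copies the slots of the
# unnamed superclass instance `b` into the new object.
setClass("C", representation(z = "character"), contains = "B")
C <- function(b = B(), z = character(), ...) new("C", b, z = z, ...)

# Hypothetical generic with a method defined only for the base class "A".
setGeneric("xvalues", function(object) standardGeneric("xvalues"))
setMethod("xvalues", "A", function(object) object@x)

obj <- C(B(A(1:5), 10), "label")
xvalues(obj)   # the method written for "A" is inherited by "C"
```

Existing methods keep working on the new class through inheritance, so extending the hierarchy costs one `setClass` call and one constructor.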
