Possible error in R all.equal

I came across some odd behavior in R's all.equal function. Basically, I create two identical data.frames in different ways and then call all.equal on them (which also checks the data and the attributes).

The code for reproducing the behavior is as follows:

    var.a <- data.frame(cbind(as.integer(c(1,5,9)), as.integer(c(1,5,9))))
    colnames(var.a) <- c("C1", "C2")
    rownames(var.a) <- c("1","5","9")

    var.b <- data.frame(matrix(NA, nrow = 10, ncol = 2))
    var.b[, 1] <- 1:10
    var.b[, 2] <- 1:10
    colnames(var.b) <- c("C1", "C2")
    var.b <- var.b[seq(1, nrow(var.b), 4), ]

    all.equal(var.a, var.b)

Is this a bug, or am I just missing something? I spent a fair amount of time debugging all.equal, and it seems that the problem is with the row names of the data.frames (in one case they are a character vector, in the other a numeric vector). The response from all.equal:

[1] "Attributes: <Component 2: Modes: character, numeric>"
[2] "Attributes: <Component 2: target is character, current is numeric>"

but

    typeof(rownames(var.a)) == typeof(rownames(var.b))

returns TRUE, which puzzles me.

PS: The structure of the objects seems the same:

    > str(var.a)
    'data.frame': 3 obs. of 2 variables:
     $ C1: int 1 5 9
     $ C2: int 1 5 9
    > str(var.b)
    'data.frame': 3 obs. of 2 variables:
     $ C1: int 1 5 9
     $ C2: int 1 5 9

I would appreciate it if someone could shed light on this.

+4
3 answers

(I don't quite understand what bug you think you have found. The data frames were not created the same way.) There are two differences between the structures of var.a and var.b: the storage mode of the column elements, which is numeric (double) in 'var.a' and integer in 'var.b'; and the mode of the row names, which is character for 'var.a' and integer for 'var.b':

    > dput(var.b)
    structure(list(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L)), .Names = c("C1", "C2"),
        row.names = c(1L, 5L, 9L), class = "data.frame")
    > dput(var.a)
    structure(list(C1 = c(1, 5, 9), C2 = c(1, 5, 9)), .Names = c("C1", "C2"),
        row.names = c("1", "5", "9"), class = "data.frame")
    > mode(attr(var.b, "row.names"))
    [1] "numeric"
    > storage.mode(attr(var.b, "row.names"))
    [1] "integer"
    > mode(attr(var.a, "row.names"))
    [1] "character"

Note added: if you only want to check numerical equality, set the 'check.attributes' argument to FALSE:

    > all.equal(var.a, var.b, check.attributes=FALSE)
    [1] TRUE
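As a self-contained illustration of that argument, the sketch below rebuilds two small data frames that differ only in how their row names are stored. The names `a`, `b` and `b_fixed` are made up for this example, and the construction is an assumption rather than the asker's exact code. It also shows one possible way to make the attribute check pass, by re-assigning the row names as character.

```r
# Hypothetical stand-ins for var.a and var.b: same integer columns,
# but row names stored as character in `a` and as integer in `b`.
a <- data.frame(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L))
rownames(a) <- c("1", "5", "9")

b <- data.frame(C1 = 1:10, C2 = 1:10)
b <- b[seq(1, nrow(b), 4), ]   # row subsetting keeps the integer row names 1, 5, 9

# With attributes checked, all.equal() reports the row-name mismatch
# as a character vector instead of returning TRUE.
with_attrs <- all.equal(a, b)
print(isTRUE(with_attrs))

# Ignoring attributes compares only the column values.
without_attrs <- all.equal(a, b, check.attributes = FALSE)
print(isTRUE(without_attrs))

# One possible fix: store b's row names as character as well.
b_fixed <- b
rownames(b_fixed) <- rownames(b_fixed)   # rownames() returns a character vector
after_fix <- all.equal(a, b_fixed)
print(isTRUE(after_fix))
```

Wrapping the call in `isTRUE()` is the usual way to use `all.equal()` in a condition, since a mismatch yields a description rather than FALSE.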

If you look at var.b with dput, you will see that its row names are numeric:

    > dput(var.b)
    structure(list(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L)), .Names = c("C1", "C2"),
        row.names = c(1L, 5L, 9L), class = "data.frame")
+11

but

    typeof(rownames(var.a)) == typeof(rownames(var.b))

returns TRUE, which bothers me.

In addition to the most voted answer, note that the row.names attribute is stored as "character" for var.a and as "numeric" for var.b:

    > attr(var.a, "row.names")
    [1] "1" "5" "9"
    > attr(var.b, "row.names")
    [1] 1 5 9

The rownames() function, on the other hand, always coerces its return value to "character":

    > rownames(var.a)
    [1] "1" "5" "9"
    > rownames(var.b)
    [1] "1" "5" "9"

That is why you get TRUE from the command above. From ?rownames:

For a data frame, value for rownames should be a character vector of non-duplicated and non-missing names (this is enforced), and for colnames a character vector of (preferably) unique syntactically-valid names. In both cases, value will be coerced by as.character, and setting colnames will convert the row names to character.

A more suitable check:

    > typeof(attr(var.a, "row.names")) == typeof(attr(var.b, "row.names"))
    [1] FALSE
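The difference between the two accessors can be reproduced on any data frame. The sketch below uses a made-up helper, `row_names_storage()`, which is not part of R; it only wraps the `attr()` call from the check above. The data frames `a` and `b` are illustrative stand-ins, not the asker's objects.

```r
# Hypothetical helper (not part of base R): report how row names are stored.
row_names_storage <- function(df) typeof(attr(df, "row.names"))

a <- data.frame(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L))
rownames(a) <- c("1", "5", "9")        # explicitly assigned character row names

b <- data.frame(C1 = 1:10, C2 = 1:10)
b <- b[c(1, 5, 9), ]                   # integer row names left over from subsetting

print(row_names_storage(a))
print(row_names_storage(b))

# rownames() coerces to character, so comparing its results hides the difference.
same_via_rownames <- typeof(rownames(a)) == typeof(rownames(b))
print(same_via_rownames)
```

Inspecting the attribute directly is what reveals the mismatch that all.equal() complains about; `rownames()` is convenient for display but not for this kind of diagnosis.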

This suggests that all.equal() messages are cryptic at best ...

+1

One of them has numeric columns, and the other has integer columns. You can see it with str:

    > str(var.a); str(var.b)
    'data.frame': 3 obs. of 2 variables:
     $ C1: num 1 5 9
     $ C2: num 1 5 9
    'data.frame': 3 obs. of 2 variables:
     $ C1: int 1 5 9
     $ C2: int 1 5 9
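A programmatic version of the same inspection, as a rough sketch: `sapply()` with `typeof()` lists the storage type of each column. The data frames `x` and `y` are invented for this example. It also illustrates that `all.equal()` treats integer and double values as numerically equal, whereas `identical()` does not.

```r
x <- data.frame(C1 = c(1, 5, 9), C2 = c(1, 5, 9))         # double columns
y <- data.frame(C1 = c(1L, 5L, 9L), C2 = c(1L, 5L, 9L))   # integer columns

# Storage type of every column, as a named character vector.
x_types <- sapply(x, typeof)
y_types <- sapply(y, typeof)
print(x_types)
print(y_types)

# The numeric comparison ignores the integer/double distinction ...
print(isTRUE(all.equal(x, y)))
# ... while identical() requires the same storage type.
print(identical(x, y))
```

So a column type difference alone does not make `all.equal()` fail; in the question it is the row names that trigger the message.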
0

Source: https://habr.com/ru/post/1434529/