The difference between data[ , "col"] and data$col

From other answers on this site to similar questions and from pages such as http://www.r-tutor.com/r-introduction/data-frame/data-frame-column-vector, it seems that when I extract a variable from a data.frame, data[ , "col"] and data$col should give the same result. But now I have some data in Excel:

    LU                                       Urban_LU     LU_Index  Urban_LU_index
    Residential                              Residential         2               0
    Rural residential                        Residential         3               0
    Commercial                               Commercial          4               1
    Public institutions including education  Industrial          5               1
    Industry                                 Industrial          7               2


and I read it with read_excel from the readxl package:

    library(readxl)
    data <- read_excel("data.xlsx", "Sheet 1")

Now I extract one variable from the data frame using [ or $:

    data[ , "LU"]
    # Source: local data frame [5 x 1]
    #
    #                                        LU
    #                                     (chr)
    # 1                             Residential
    # 2                       Rural residential
    # 3                              Commercial
    # 4 Public institutions including education
    # 5                                Industry

    data$LU
    # [1] "Residential"                             "Rural residential"
    # [3] "Commercial"                              "Public institutions including education"
    # [5] "Industry"

    length(data[ , "LU"])
    # [1] 1

    length(data$LU)
    # [1] 5

In addition, the classes of the object returned by read_excel and of the results of the two extraction methods look suspicious:

    class(data)
    # [1] "tbl_df"     "tbl"        "data.frame"

    class(data[ , "LU"])
    # [1] "tbl_df"     "data.frame"

    class(data$LU)
    # [1] "character"

So what is the difference between [ , "col"] and $col? Am I missing something from the manual, or is this a special case? Also, what about the class identifiers tbl_df and tbl? I suspect they are the cause of my confusion. What do they mean?

1 answer

More of an extended comment:

The fact that readxl::read_excel returns output of class tbl_df is poorly documented in ?read_excel. This behavior was mentioned in the readxl announcement on the RStudio blog, though:

"[read_excel] produces output with class c("tbl_df", "tbl", "data.frame")"

To learn more about tbl_df, we need to consult the dplyr help pages. In the Methods section of ?dplyr::tbl_df, we find that "tbl_df implements two important base methods: [ Never simplifies (drops), so always returns data.frame". This is why data[ , "LU"] stays a one-column tbl_df of length 1, while data$LU returns the column itself as a character vector of length 5.

For more information, read about the drop argument in ?`[.data.frame`.
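The effect of the drop argument can be seen on a plain data.frame. The following is a minimal sketch, not taken from the original post: the small data frame df is made up for illustration, and the last part assumes the tibble package (where the tbl_df class lives in current versions) is installed, so it is guarded.

```r
# A small, made-up data frame (illustration only, not the asker's Excel data).
df <- data.frame(
  LU = c("Residential", "Rural residential", "Commercial"),
  LU_Index = c(2, 3, 4),
  stringsAsFactors = FALSE
)

# Base data.frame: `[` simplifies a single column to a vector (drop = TRUE).
vec_default <- df[, "LU"]                # character vector
kept_frame  <- df[, "LU", drop = FALSE]  # one-column data.frame

# `$` and `[[` always return the column itself.
vec_dollar <- df$LU
vec_double <- df[["LU"]]

# A tbl_df behaves as if drop = FALSE were the default for `[`.
# Guarded, because the tibble package may not be installed.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  tb_col <- tb[, "LU"]   # still a one-column tbl_df
  tb_vec <- tb[["LU"]]   # character vector, same as tb$LU
}
```

So when a vector is needed from a tbl_df, $ or [[ is the safer choice, while [ is the one to use when the result should stay a data frame.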

Related Q & A: Extract the dplyr tbl column as a vector and Best practice for getting a dropped column in dplyr tbl_df .

See also the "original" issue on github and discussion in it.


Source: https://habr.com/ru/post/1235752/

