I have a data frame with ~300 observations, each of which is associated with a numerical code that I want to break down into its component digits. The code is either a 3- or 4-digit integer, and the digits should be aligned on the last digit, so my desired result would look something like this:
code d4 d3 d2 d1
403 <NA> 4 0 3
5123 5 1 2 3
105 <NA> 1 0 5
While I can see many ways to split the code using strsplit (base R) or stringr::str_split, I can hardly apply any of these operations to a data frame.
library(stringr)
as.integer(unlist(str_split(5123, ""))[1])
as.integer(rev(unlist(str_split(5123, "")))[1])
But this seemingly plausible (to me) operation
library(dplyr)
df <- data.frame(code = c(403, 5123, 105))
df <- df %>%
  mutate(
    last = as.integer(rev(unlist(str_split(df$code, "")))[4])
  )
returns
> df
code last
1 403 3
2 5123 3
3 105 3
Clearly, my understanding of how lists and atomic vectors are handled within data frames is lacking ...
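What seems to happen in the mutate() call: df$code is the whole column, str_split() returns one character vector per row, and unlist() flattens them into a single vector of ten digits. Indexing that with [4] yields one value, which mutate() then recycles to every row. A minimal sketch that avoids strings altogether, assuming every code is a non-negative integer below 10000, uses vectorised integer arithmetic (the column names d1 to d4 follow the desired output above):

```r
df <- data.frame(code = c(403, 5123, 105))

# %/% (integer division) and %% (modulo) work element-wise,
# so every row gets its own digit.
df$d1 <- df$code %% 10             # units
df$d2 <- (df$code %/% 10) %% 10    # tens
df$d3 <- (df$code %/% 100) %% 10   # hundreds
df$d4 <- df$code %/% 1000          # thousands; 0 for 3-digit codes

# Assumption: a 3-digit code has no fourth digit, so report NA instead of 0.
df$d4[df$code < 1000] <- NA
```

The same expressions could presumably be placed inside mutate() using the bare column name code rather than df$code.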
I also tried separate() and extract() from tidyr. With tidyr::separate(), this works when the digits are delimited by spaces:
library(tidyr)
dfsep <- data.frame(code = c(" 4 0 3", "5 1 2 3", " 1 0 5"))
dfsep <- dfsep %>%
separate(
code, c("d4", "d3", "d2", "d1"), fill = "right", remove = FALSE
)
dfsep
     code d4 d3 d2 d1
1   4 0 3     4  0  3
2 5 1 2 3  5  1  2  3
3   1 0 5     1  0  5
However, tidyr::separate() does not work on the original integer codes:
df <- data.frame(code = c(403, 5123, 105))
df <- df %>%
separate(
code, c("d4", "d3", "d2", "d1"), fill = "right", remove = FALSE
)
df
code d4 d3 d2 d1
1 403 403 <NA> <NA> <NA>
2 5123 5123 <NA> <NA> <NA>
3 105 105 <NA> <NA> <NA>
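This result is consistent with the documented default of separate(), which splits on runs of non-alphanumeric characters: a bare number such as 403 contains none, so the whole value lands in d4. One possible workaround, sketched here in base R under the assumption that every code has 3 or 4 digits, is to left-pad the codes to four characters and then split them into single characters. The names codes, padded, and digits are illustrative only.

```r
codes <- c(403, 5123, 105)

# Left-pad every code to width 4 with spaces: " 403", "5123", " 105".
padded <- sprintf("%4d", as.integer(codes))

# Split each padded string into single characters; one row per code.
digits <- do.call(rbind, strsplit(padded, ""))
digits[digits == " "] <- NA   # the padding character marks a missing digit

# Convert the character matrix to integers and name the columns.
digits <- matrix(as.integer(digits), nrow = length(codes),
                 dimnames = list(NULL, c("d4", "d3", "d2", "d1")))

df <- cbind(data.frame(code = codes), digits)
```

Because the padding sits on the left, the digits stay aligned on the last digit, which is the layout asked for at the top of the post.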
tidyr::extract(), in turn, works for the 4-digit codes but not for the 3-digit ones:
dfext <- data.frame(code = c(403, 5123, 105))
dfext <- dfext %>%
extract(
code, c("d4", "d3", "d2", "d1"), "(.)(.)(.)(.)", remove = FALSE
)
dfext
code d4 d3 d2 d1
1 403 <NA> <NA> <NA> <NA>
2 5123 5 1 2 3
3 105 <NA> <NA> <NA> <NA>
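The NA rows are what one would expect from the pattern "(.)(.)(.)(.)", which needs four characters to match at all. A hedged sketch of one fix is to make the first group optional. It is shown with base R's regexec() and regmatches() so that it runs without extra packages. The same pattern can probably be passed to tidyr::extract() as well, but that is untested here. The names pattern, parts, and digits are illustrative.

```r
codes <- as.character(c(403, 5123, 105))

# "(.?)" may match nothing, so 3-digit codes still satisfy the pattern.
# The anchors force the last three groups onto the last three digits.
pattern <- "^(.?)(.)(.)(.)$"
m <- regmatches(codes, regexec(pattern, codes))

# Each element of m holds the full match followed by the four groups.
parts <- do.call(rbind, m)[, -1, drop = FALSE]
parts[parts == ""] <- NA   # empty first group: no fourth digit

digits <- matrix(as.integer(parts), nrow = length(codes),
                 dimnames = list(NULL, c("d4", "d3", "d2", "d1")))
```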