Is there a more "R" way to extract two significant characters from a longer string in a data.table column?
I have a data.table with a column of "degrees": each row holds an abbreviated code for the degree someone received, followed by the year they finished.
> srcDT<- data.table(
alum=c("Paul Lennon","Stevadora Nicks","Fred Murcury"),
degree=c("W72","WG95","W88")
)
> srcDT
alum degree
1: Paul Lennon W72
2: Stevadora Nicks WG95
3: Fred Murcury W88
I need to extract the two year digits from each degree and put them in a new column called "degree_year".
No problem:
> srcDT[,degree_year:=substr(degree,nchar(degree)-1,nchar(degree))]
> srcDT
alum degree degree_year
1: Paul Lennon W72 72
2: Stevadora Nicks WG95 95
3: Fred Murcury W88 88
If only it were always that simple. The problem is that the degree strings are sometimes more complex. Most often they look like this:
srcDT<- data.table(
alum=c("Ringo Harrison","Brian Wilson","Mike Jackson"),
degree=c("W72 C73","WG95 L95","W88 WG90")
)
I am only interested in the 2 digits next to the codes I care about: W and WG (and if both W and WG are present, I only care about WG).
Here's how I solved it:
x <- srcDT$degree
z <- character()
# Patterns in priority order: WG wins over W
degree.grep.pattern <- c("WG[0-9][0-9]", "W[0-9][0-9]")
for (i in seq_along(x)) {
  matched <- FALSE
  for (j in seq_along(degree.grep.pattern)) {
    if (length(grep(degree.grep.pattern[j], x[i])) > 0) {
      # Keep the first match of the highest-priority pattern
      m <- regexpr(degree.grep.pattern[j], x[i])
      y <- regmatches(x[i], m)
      matched <- TRUE
      break
    }
  }
  if (matched) {
    yr <- substr(y, nchar(y) - 1, nchar(y))
  } else {
    # No W/WG code found: fall back to the last two characters
    yr <- substr(x[i], nchar(as.character(x[i])) - 1, nchar(as.character(x[i])))
  }
  z <- c(z, yr)
}
srcDT$degree_year <- z
> srcDT
alum degree degree_year
1: Ringo Harrison W72 C73 72
2: Brian Wilson WG95 L95 95
3: Mike Jackson W88 WG90 90
. 100% . , .
, . 10k 100k , .
, ? "C". "R."
?
. . 30 , - 540 .
, .grep.pattern . , , 7 8 .
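For comparison, here is a minimal vectorized sketch of the same rule (WG first, then W, otherwise the last two characters of the whole string). It is only one possible approach, not necessarily the fastest; `first_match` is a hypothetical helper name introduced for illustration, built on base R's `regexpr` and `regmatches`.

```r
library(data.table)

srcDT <- data.table(
  alum   = c("Ringo Harrison", "Brian Wilson", "Mike Jackson"),
  degree = c("W72 C73", "WG95 L95", "W88 WG90")
)

# Hypothetical helper: first match of `pattern` in each element of `x`,
# NA where the pattern does not occur.
first_match <- function(pattern, x) {
  out <- rep(NA_character_, length(x))
  m <- regexpr(pattern, x)          # -1 where there is no match
  out[m > 0] <- regmatches(x, m)    # regmatches returns matched elements only
  out
}

srcDT[, degree_year := {
  wg <- first_match("WG[0-9][0-9]", degree)
  w  <- first_match("W[0-9][0-9]", degree)
  # Priority: WG code, then W code, then the raw string as a fallback
  code <- ifelse(!is.na(wg), wg, ifelse(!is.na(w), w, degree))
  substr(code, nchar(code) - 1L, nchar(code))
}]

print(srcDT$degree_year)
# [1] "72" "95" "90"
```

Because each pattern is applied to the whole column at once, there is no per-row loop and no vector grown with `c(z, yr)`.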