Consider the following hypothetical data:
x <- "There is a horror movie running in the iNox theater. : If row names are supplied of length one and the data
frame has a single row, the row.names is taken to specify the row names and not a column (by name or number).
If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify
the row names and not a column (by name or number) Can we go : Please"
y <- "There is a horror movie running in the iNox theater. If row names are supplied of length one and the data
frame has a single row, the row.names is taken. To specify the row names and not a column. By name or number. :
If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify
the row names and not a column (by name or number) Can we go : Please"
z <- "There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify the row names and not a column (by name or number).
If row names are supplied of length one. : And the data frame has a single row, the row.names is taken to specify
the row names and not a column (by name or number) Can we go : Please"
df <- data.frame(Text = c(x, y, z), row.names = NULL, stringsAsFactors = FALSE)
Notice that the ":" appears in a different place in each text. For instance:
- In 'x', the ":" comes after the first sentence.
- In 'y', it comes after the fourth sentence.
- In 'z', it comes after the sixth sentence.
- In addition, each text has one more ":" before its last sentence.
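To make the positions of the ":" concrete, here is a minimal sketch that locates the first and last ":" in a string using base R's `gregexpr()`. The string `txt` is a short hypothetical example, not one of the texts above.

```r
# Hypothetical short text with two ":" separators (not the data above).
txt <- "First sentence. : Second sentence. Third one : Please"

# gregexpr() returns the start position of every match; fixed = TRUE
# treats ":" as a literal string rather than a regular expression.
colon_pos <- gregexpr(":", txt, fixed = TRUE)[[1]]

# The first and last entries give the first and last ":" in the text.
first_colon <- colon_pos[1]
last_colon  <- colon_pos[length(colon_pos)]
```

Only `first_colon` matters for the split described here; `last_colon` is shown just to illustrate that the trailing ":" can be told apart from it.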
I want to split the text into two columns according to these rules:
- Only the first ":" is used as the split point, NOT the last one.
- If the first ":" falls within the first three sentences, split the text into two columns at that ":"; otherwise put the whole text in the second column and NA in the first column.
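One possible way to express these two rules in base R is sketched below. The helper `split_at_first_colon()` and its `max_sentences` argument are hypothetical names introduced for illustration. The sketch assumes that a sentence ends with ".", "!" or "?" followed by whitespace, and that a ":" placed directly after the third sentence still counts as being within the first three sentences; both assumptions may need adjusting for real data.

```r
# Hypothetical helper (not from the original post): split `text` at its first
# ":" if at most `max_sentences` sentences precede it, else return NA + text.
split_at_first_colon <- function(text, max_sentences = 3) {
  unsplit <- data.frame(Col1 = NA_character_, Col2 = text,
                        stringsAsFactors = FALSE)
  pos <- regexpr(":", text, fixed = TRUE)   # position of the first ":" or -1
  if (pos == -1) return(unsplit)

  before <- substr(text, 1, pos - 1)
  # Assumption: a sentence ends with ".", "!" or "?" followed by whitespace
  # or by the end of the text before the colon ("row.names" is not counted).
  ends <- gregexpr("[.!?](\\s|$)", before, perl = TRUE)[[1]]
  n_sentences <- if (ends[1] == -1) 0 else length(ends)

  if (n_sentences > max_sentences) return(unsplit)
  data.frame(Col1 = trimws(before),
             Col2 = trimws(substr(text, pos + 1, nchar(text))),
             stringsAsFactors = FALSE)
}

# Usage on two short hypothetical texts (not the data above):
texts <- c("One. : Two. Three. Go : Please",
           "One. Two. Three. Four. : Five. Go : Please")
res <- do.call(rbind, lapply(texts, split_at_first_colon))
```

Because the helper returns a one-row data frame per text, `lapply()` plus `do.call(rbind, ...)` builds the combined result without writing a separate data frame per row by hand.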
Required output for 'x':
Col1 | Col2
There is a horror movie running in the iNox theater. | If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify the row names and not a column (by name or number). If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please
Required output for 'y' (the first ":" comes after the first three sentences, so the text is not split):
Col1 | Col2
NA | There is a horror movie running in the iNox theater. If row names are supplied of length one and the data frame has a single row, the row.names is taken. To specify the row names and not a column. By name or number. : If row names are supplied of length one and the data frame has a single row, the row.names is taken to specify the row names and not a column (by name or number) Can we go : Please
As with 'y', the wanted output for 'z':
Col1 | Col2
NA | all of the text from 'z'
This is what I have tried so far:
# Col1: intended to keep the text before the first " :"
# Col2: intended to keep the text after the first ": "
resX <- data.frame(Col1 = gsub("\\s\\:.*$","\\1", df$Text[[1]]),
                   Col2 = gsub("^[^:]+(?:).\\s","\\1", df$Text[[1]]))
resY <- data.frame(Col1 = gsub("\\s\\:.*$","\\1", df$Text[[2]]),
                   Col2 = gsub("^[^:]+(?:).\\s","\\1", df$Text[[2]]))
resZ <- data.frame(Col1 = gsub("\\s\\:.*$","\\1", df$Text[[3]]),
                   Col2 = gsub("^[^:]+(?:).\\s","\\1", df$Text[[3]]))
I then combine the three results into "resDF" using rbind.
: