Add a column of matched keywords (strings) based on a column of text

If I have a data frame with the following column:

df$text <- c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example")

and the keyword strings are:

keywords <- c("not that long", "This string", "example", "helps")

I'm trying to add a column to my data frame listing the keywords that occur in the text of each row:

df$keywords:

1 c("This string","not that long")    
2 c("This string","not that long")    
3 c("helps","example")

However, I'm not sure how to 1) extract the matching keywords from a text column, and 2) list the matches for each row in a new column.

2 answers

We can extract the matches using str_extract_all from stringr:

library(stringr)
df$keywords <- str_extract_all(df$text, paste(keywords, collapse = "|"))
df
#                                                text                   keywords
#1                        This string is not that long This string, not that long
#2 This string is a bit longer but still not that long This string, not that long
#3                This one just helps with the example             helps, example
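For comparison, a minimal base R sketch of the same alternation idea, assuming stringr is not available: gregexpr() finds every match of the combined "kw1|kw2|..." pattern in each string, and regmatches() pulls the matched text out, producing the same kind of list column.

```r
df <- data.frame(
  text = c("This string is not that long",
           "This string is a bit longer but still not that long",
           "This one just helps with the example"),
  stringsAsFactors = FALSE
)
keywords <- c("not that long", "This string", "example", "helps")

# Join the keywords into one regex: "not that long|This string|example|helps"
pattern <- paste(keywords, collapse = "|")

# gregexpr() locates all matches per row; regmatches() extracts them as a list
df$keywords <- regmatches(df$text, gregexpr(pattern, df$text))
df
```

Like the str_extract_all() version, this treats each keyword as a regular expression (keywords containing "." or "(" would need escaping) and returns matches in the order they appear in the text, once per occurrence.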

Or in a dplyr chain:

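library(dplyr)
library(stringr)
# .env$keywords refers to the keywords vector, not a df column with the same name
df %>%
   mutate(keywords = str_extract_all(text, paste(.env$keywords, collapse = "|")))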

Maybe like this, in base R:

df = data.frame(text=c("This string is not that long", "This string is a bit longer but still not that long", "This one just helps with the example"))
keywords <- c("not that long", "This string", "example", "helps")

df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords,grepl,x)]})

Output:

                                                 text                   keywords
1                        This string is not that long not that long, This string
2 This string is a bit longer but still not that long not that long, This string
3                This one just helps with the example             example, helps

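Here lapply iterates over df$text, and for each element sapply iterates over keywords, testing each keyword against that text with grepl. The same thing with an explicit anonymous function: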

df$keywords = lapply(df$text, function(x) {keywords[sapply(keywords, function(y){grepl(y,x)})]})
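A minimal base R variation on this answer, offered as a sketch rather than the author's code. vapply() replaces sapply() so the subscript is always a logical vector (sapply() returns an empty list when keywords is empty, which cannot be used as a subscript), and fixed = TRUE matches each keyword as a literal string, so characters such as "." or "+" are not read as regex syntax. The last step adds a hypothetical keywords_str column that flattens the list column into one comma-separated string per row, which is handy for write.csv(), since it does not handle list columns.

```r
df <- data.frame(
  text = c("This string is not that long",
           "This string is a bit longer but still not that long",
           "This one just helps with the example"),
  stringsAsFactors = FALSE
)
keywords <- c("not that long", "This string", "example", "helps")

# For each text, keep the keywords that occur in it as literal substrings.
# Matches come back in the order of `keywords`, not their order in the text.
df$keywords <- lapply(df$text, function(x) {
  keywords[vapply(keywords, function(k) grepl(k, x, fixed = TRUE), logical(1))]
})

# Hypothetical extra column: one comma-separated string per row.
# Rows with no matches get "" because paste() of character(0) collapses to "".
df$keywords_str <- vapply(df$keywords, paste, character(1), collapse = ", ")
df
```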


Source: https://habr.com/ru/post/1692877/