R: combining frequency lists of different lengths by labels?

I am new to R, but I really like it and want to keep improving. After working on this problem for some time, I have to ask for help.

This is the case at hand:

1) I have two sentences (sentence.1 and sentence.2; all words are already lowercase) and create sorted frequency lists of their words:

    sentence.1 <- "bob buys this car, although his old car is still fine." # saves the sentence into sentence.1
    sentence.2 <- "a car can cost you very much per month."

    # (I have the following commands thanks to Stefan Gries)
    sentence.1.list <- strsplit(sentence.1, "\\W+", perl=TRUE) # we split the sentence at non-word characters
    sentence.2.list <- strsplit(sentence.2, "\\W+", perl=TRUE)

    sentence.1.vector <- unlist(sentence.1.list) # then we create a vector of the list
    sentence.2.vector <- unlist(sentence.2.list) # vectorizes the list

    sentence.1.freq <- table(sentence.1.vector) # and finally create the frequency lists
    sentence.2.freq <- table(sentence.2.vector)

Here are the results:

    sentence.1.freq:
    although      bob     buys      car     fine      his       is      old    still     this
           1        1        1        2        1        1        1        1        1        1

    sentence.2.freq:
           a      can      car     cost    month     much      per     very      you
           1        1        1        1        1        1        1        1        1

How can I combine these two frequency lists so that I get the following?

            a although      bob     buys      can      car     cost     fine      his       is    month     much      old      per    still     this     very      you
           NA        1        1        1       NA        2       NA        1        1        1       NA       NA        1       NA        1        1       NA       NA
            1       NA       NA       NA        1        1        1       NA       NA       NA        1        1       NA        1       NA       NA        1        1

This "table" should be "flexible": when a new sentence is entered that contains a new word, for example "and", the table should gain a column labeled "and" between "a" and "although".

I thought about simply adding each new sentence as a new row, appending all words that are not yet in the list as new columns (here, "and" would end up to the right of "you"), and then sorting the list again. However, I could not get this to work, because matching the new sentence's word frequencies to the existing labels already failed. When a word such as "car" occurs again, its frequency should be written into the new sentence's row under the existing "car" column; when a word such as "you" occurs for the first time, its frequency should be written into the new sentence's row under a new column labeled "you".

1 answer

This is not exactly what you describe, but what you are aiming for makes more sense to me organized by row rather than by column (and R handles data organized this way a little more easily).

    #Convert tables to data frames
    a1 <- as.data.frame(sentence.1.freq)
    a2 <- as.data.frame(sentence.2.freq)

    #There are other options here, see note below
    colnames(a1) <- colnames(a2) <- c('word','freq')

    #Then merge
    merge(a1,a2,by = "word",all = TRUE)

           word freq.x freq.y
    1  although      1     NA
    2       bob      1     NA
    3      buys      1     NA
    4       car      2      1
    5      fine      1     NA
    6       his      1     NA
    7        is      1     NA
    8       old      1     NA
    9     still      1     NA
    10     this      1     NA
    11        a     NA      1
    12      can     NA      1
    13     cost     NA      1
    14    month     NA      1
    15     much     NA      1
    16      per     NA      1
    17     very     NA      1
    18      you     NA      1

You can then call merge again to add further sentences. I simply renamed the columns so that they match, but there are other options: using the by.x and by.y arguments instead of by tells merge which columns to join on when the names differ between the two data frames. In addition, the suffixes argument of merge controls how the count columns are given unique names. The default is to append .x and .y, but you can change this.
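As a rough sketch of how repeated merging could look for more than two sentences: the code below builds one data frame per sentence and folds them together with Reduce. The helper word_freq, the column names s1, s2, s3 and the third sentence are made up for illustration and are not part of the original question or answer. Naming each count column after its sentence up front is one way to avoid depending on the .x/.y suffixes. The last step transposes the result into the one-row-per-sentence layout asked for in the question, in case that is really needed.

```r
# Hypothetical helper (not from the original answer): word counts for one
# sentence as a data frame with columns "word" and <id>.
word_freq <- function(sentence, id) {
  words <- unlist(strsplit(sentence, "\\W+", perl = TRUE))
  freq <- as.data.frame(table(words), stringsAsFactors = FALSE)
  colnames(freq) <- c("word", id)
  freq
}

sentences <- c(
  s1 = "bob buys this car, although his old car is still fine.",
  s2 = "a car can cost you very much per month.",
  s3 = "bob sold his old car."  # made-up third sentence for illustration
)

# One frequency data frame per sentence
freqs <- Map(word_freq, sentences, names(sentences))

# Fold the list with a full outer merge on "word"; words missing from a
# sentence get NA in that sentence's column
merged <- Reduce(function(x, y) merge(x, y, by = "word", all = TRUE), freqs)

# Sort the rows by word explicitly rather than relying on merge's ordering
merged <- merged[order(merged$word), ]

# Optional: one row per sentence, one column per word (layout from the question)
wide <- t(as.matrix(merged[, -1]))
colnames(wide) <- merged$word

print(merged)
print(wide)
```

A sentence that introduces a new word (here "sold") simply adds a row to merged, or a column to wide, with NA for the sentences that do not contain it.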


Source: https://habr.com/ru/post/1390600/

