I have the following data frame:
    library(tidyverse)

    dat <- structure(
      list(
        fasta_header = c(">seq1", ">seq2"),
        sequence = c("MPSRGTRPE", "VSSKYTFWNF")
      ),
      .Names = c("fasta_header", "sequence"),
      row.names = c(NA, -2L),
      class = c("tbl_df", "tbl", "data.frame")
    )

    dat
    #> # A tibble: 2 x 2
    #>   fasta_header sequence
    #>   <chr>        <chr>
    #> 1 >seq1        MPSRGTRPE
    #> 2 >seq2        VSSKYTFWNF
What I want to do is count how often each amino acid occurs in each row's sequence. The desired result (worked out by hand) is:
    fasta_header sequence   M P S R G T E V K Y F W N
    >seq1        MPSRGTRPE  1 2 1 2 1 1 1 0 0 0 0 0 0
    >seq2        VSSKYTFWNF 0 0 2 0 0 1 0 1 1 1 2 1 1
How can I do this using the dplyr piping method?
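One possible approach, offered as a minimal sketch rather than a definitive answer: split each sequence into single residues, tally them per row, and spread the tallies back into columns. It assumes a reasonably recent tidyverse (tidyr 1.1.0 or later, for `pivot_wider()` with a scalar `values_fill`). The names `residue` and `aa_counts` are made up for illustration.

```r
library(tidyverse)

# Same data as in the question, rebuilt with tibble() for brevity.
dat <- tibble(
  fasta_header = c(">seq1", ">seq2"),
  sequence     = c("MPSRGTRPE", "VSSKYTFWNF")
)

aa_counts <- dat %>%
  # List column holding one character vector of residues per row.
  mutate(residue = str_split(sequence, "")) %>%
  # One row per residue, keeping the header and sequence alongside.
  unnest(residue) %>%
  # Tally each residue within each sequence.
  count(fasta_header, sequence, residue) %>%
  # One column per residue; residues absent from a sequence become 0.
  pivot_wider(names_from = residue, values_from = n, values_fill = 0)

aa_counts
```

The residue columns come out in order of first appearance after `count()` sorts them, not in the order shown in the hand-made table. If a specific order matters, a `select()` at the end of the pipe can rearrange them.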