I have a data.frame, annot, which is defined as:
annot <- structure(list(Name = c("dd_1", "dd_2", "dd_3","dd_4", "dd_5", "dd_6","dd_7"), GOs =
c("C:extracellular space; C:cell body; P:cell migration process; P:NF/ß pathway",
"C:Signal transduction; C:nucleus; F:positive regulation; P:single organism; P:positive(+) regulation",
"C:cardiomyceltes; C:intracellular pace; F:putative; F:magnesium ion binding; F:calcium ion binding; P:visual perception; P:blood coagulation",
"F:poly(A) RNA binding; P:DNA-templated transcription, initiation",
"C:ULK1-ATG13-FIP200 complex; F:histone-arginine N-methyltransferase activity; P:single-organism cellular process",
"F:3'-5' DNA helicase activity; P:acetate-CoA ligase activity",
"F:UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity; P:oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor"
)), .Names = c("Name", "GOs"), class = "data.frame", row.names = c(NA,
-7L))
The data.frame format is as follows:
Name GOs
dd_1 C:extracellular space; C:cell body; P:cell migration process; P:NF/ß pathway
dd_2 C:Signal transduction; C:nucleus; F:positive regulation; P:single organism; P:positive(+) regulation
dd_3 C:cardiomyceltes; C:intracellular pace; F:putative; F:magnesium ion binding; F:calcium ion binding; P:visual perception; P:blood coagulation
dd_4 F:poly(A) RNA binding; P:DNA-templated transcription, initiation
dd_5 C:ULK1-ATG13-FIP200 complex; F:histone-arginine N-methyltransferase activity; P:single-organism cellular process
dd_6 F:3'-5' DNA helicase activity; P:acetate-CoA ligase activity
dd_7 F:UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity; P:oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor
Each entry in GOs contains words, special characters, and alphanumeric characters, and every term is prefixed with C:, F:, or P:. I would like to group the terms by prefix (C:xxx; F:yyy; P:zzz) into separate columns, each holding its corresponding values, such as:
Name Component Function P
dd_1 C:extracellular space;C:cell body F:transport carrier P:cell migration process;P:NF/ß pathway
dd_2 C:Signal transduction;C:nucleus F:positive regulation P:single organism;P:positive regulation
dd_3 C:cardiomyceltes;C:intracellular pace F:magnesium ion binding;F:calcium ion binding P:visual perception;P:blood coagulation
dd_4 F:poly(A) RNA binding; P:DNA-templated transcription, initiation
dd_5 C:ULK1-ATG13-FIP200 complex F:histone-arginine N-methyltransferase activity P:single-organism cellular process
dd_6 F:3'-5' DNA helicase activity; P:acetate-CoA ligase activity
dd_7 F:UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity P:oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor
I tried the following command in R using tidyr:
separate(annot, GOs, into = c("P", "F", "C"), sep = "[a-z]+=")
but it returned the following error:
Error: Values not split into 3 pieces at 1, 2, 3,4
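For context, the pattern "[a-z]+=" requires an "=" character, which never occurs in GOs, so no entry is split at all. In addition, separate() cuts each value into a fixed number of pieces by position, while the number of C:, F:, and P: terms differs from row to row. A minimal base R sketch of one possible way to get the layout above is to split each entry on ";" and regroup the terms by prefix. It assumes every term starts with exactly "C:", "F:", or "P:" and that ";" never appears inside a term. The helper collect_prefix is a hypothetical name introduced only for this illustration, and the data is a three-row subset (dd_2, dd_4, dd_6) of annot:

```r
# Three rows taken from the annot data.frame above (dd_2, dd_4, dd_6).
annot <- data.frame(
  Name = c("dd_2", "dd_4", "dd_6"),
  GOs = c(
    "C:Signal transduction; C:nucleus; F:positive regulation; P:single organism; P:positive(+) regulation",
    "F:poly(A) RNA binding; P:DNA-templated transcription, initiation",
    "F:3'-5' DNA helicase activity; P:acetate-CoA ligase activity"
  ),
  stringsAsFactors = FALSE
)

# Split every entry on ";" followed by optional whitespace.
# The result is a list with one character vector of terms per row.
terms <- strsplit(annot$GOs, ";\\s*")

# Hypothetical helper (not part of any package): keep the terms that start
# with the given prefix and paste them back together with ";".
# Rows without such a term get NA.
collect_prefix <- function(term_list, prefix) {
  vapply(term_list, function(x) {
    hit <- x[startsWith(x, prefix)]
    if (length(hit) == 0) NA_character_ else paste(hit, collapse = ";")
  }, character(1))
}

result <- data.frame(
  Name      = annot$Name,
  Component = collect_prefix(terms, "C:"),
  Function  = collect_prefix(terms, "F:"),
  P         = collect_prefix(terms, "P:"),
  stringsAsFactors = FALSE
)

print(result)
```

Because the split is on ";" only, terms that contain commas, parentheses, or hyphens (for example "P:DNA-templated transcription, initiation") stay intact. Rows with no term for a prefix hold NA in that column rather than an empty string.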