Problems with read.table(): newlines create unwanted empty fields

I am just starting with R and trying to understand some of the built-in functions. I am trying to organize a main FASTA text file that looks like this:

>ID1
AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
>ID2
TCCAATTAAGTCCCTATCCAGGCGCTCCG
>ID3
GAACCGGAGAACGCTTCAGACCAGCCCGGAC

into a table that looks something like this:

ID   Sequence
ID1  AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
ID2  TCCAATTAAGTCCCTATCCAGGCGCTCCG
ID3  GAACCGGAGAACGCTTCAGACCAGCCCGGAC

Or at least something organized in a similar way. Unfortunately, whenever I try to use read.table, I am forced to set fill = TRUE in order to avoid the following error:

> read.table("ReadingText.txt", header=F, fill=F, sep=">")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 2 did not have 2 elements

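Setting fill = TRUE avoids the error, but the result contains unwanted empty fields, because R splits every line on ">" and expects each line to have two fields.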
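To make the effect of fill = TRUE concrete, here is a minimal sketch that reads the example data with sep = ">" and then drops the padding fields. The temporary file path and the helper name non_empty are illustrative assumptions, and the approach only works when every ID line is followed by exactly one sequence line.

```r
# Recreate the example file in a temporary location (path is illustrative)
path <- tempfile(fileext = ".txt")
writeLines(c(">ID1", "AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC",
             ">ID2", "TCCAATTAAGTCCCTATCCAGGCGCTCCG",
             ">ID3", "GAACCGGAGAACGCTTCAGACCAGCCCGGAC"), path)

# With sep = ">", a header line such as ">ID1" splits into an empty field
# plus "ID1", while a sequence line has a single field and is padded by fill = TRUE
raw <- read.table(path, header = FALSE, sep = ">", fill = TRUE,
                  colClasses = "character")

# Helper (hypothetical name): drop the empty or missing padding fields
non_empty <- function(v) v[!is.na(v) & nzchar(v)]

# IDs end up in the second column, sequences in the first
fasta_df <- data.frame(ID = non_empty(raw$V2),
                       Sequence = non_empty(raw$V1),
                       stringsAsFactors = FALSE)
fasta_df
```

This keeps read.table in the pipeline, but the cleanup step is doing the real work, which suggests that a line-oriented reader is a more natural fit for this format.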

Is there a way to get this result with read.table, or is there a better-suited function? Thanks in advance, I am new to R.

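Rather than read.table() or readLines(), consider a purpose-built reader. The seqinr package provides read.fasta(), which parses this file format directly.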

library(seqinr)
(fasta <- read.fasta("so.fasta", set.attributes = FALSE, as.string = TRUE, forceDNAtolower = FALSE))
# $ID1
# [1] "AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC"
#
# $ID2
# [1] "TCCAATTAAGTCCCTATCCAGGCGCTCCG"
#
# $ID3
# [1] "GAACCGGAGAACGCTTCAGACCAGCCCGGAC"

setNames(rev(stack(fasta)), c("ID", "Sequence"))
#    ID                         Sequence
# 1 ID1 AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
# 2 ID2    TCCAATTAAGTCCCTATCCAGGCGCTCCG
# 3 ID3  GAACCGGAGAACGCTTCAGACCAGCCCGGAC

Here so.fasta is an example file created with:

writeLines(">ID1
AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
>ID2
TCCAATTAAGTCCCTATCCAGGCGCTCCG
>ID3
GAACCGGAGAACGCTTCAGACCAGCCCGGAC", "so.fasta")

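Alternatively, here is a way to do it with readLines(). It assumes every ID line is followed by exactly one sequence line.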

x <- readLines("so.fasta")
ids <- grepl("^>", x)
data.frame(ID = sub(">", "", x[ids]), Sequence = x[!ids])
#    ID                         Sequence
# 1 ID1 AGAATAGCCAGAACCGTTTCTCTGAGGCTTCC
# 2 ID2    TCCAATTAAGTCCCTATCCAGGCGCTCCG
# 3 ID3  GAACCGGAGAACGCTTCAGACCAGCCCGGAC
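FASTA files often wrap a sequence over several lines, in which case pairing x[ids] with x[!ids] no longer lines up. The following is a sketch of one way to handle that, not part of the original answer: it groups lines by record with cumsum() and pastes each group together. The input vector is made-up data, and the code assumes every header is followed by at least one sequence line.

```r
# Example input with one sequence wrapped over two lines (made-up data)
x <- c(">ID1", "AGAATAGCCAGAACCG", "TTTCTCTGAGGCTTCC",
       ">ID2", "TCCAATTAAGTCCCTATCCAGGCGCTCCG")

ids <- grepl("^>", x)
# cumsum() numbers the records: every header line starts a new group
record <- cumsum(ids)

# Concatenate the sequence lines that belong to each record
sequences <- tapply(x[!ids], record[!ids], paste, collapse = "")

wrapped_df <- data.frame(ID = sub("^>", "", x[ids]),
                         Sequence = unname(as.vector(sequences)),
                         stringsAsFactors = FALSE)
wrapped_df
```

For a file on disk, x would come from readLines() as in the code above.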

Source: https://habr.com/ru/post/1622856/