fread - multiple separators per line

I am trying to read a table using fread. The text file looks like this:

"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"

The R code I use is dataset0 <- fread("data/test.txt", stringsAsFactors = F), with the development version of the data.table package.

I expect to get a dataset with three columns. Instead I get:

Error in fread(input = "data/stackoverflow.txt", stringsAsFactors = FALSE) : 
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>. 
Consider setting 'comment.char=' if there is a trailing comment to be ignored.

How can I solve this?

+4
2 answers

The development version of data.table handles files in which embedded quotes have not been escaped. See point 10 on the wiki page.

I just tested it on your input and it works.

$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"

> DT = fread("unescaped.txt")
> DT
   No                                                                  Comment Type
1:  0                                                     he said:"wonderful."    A
2:  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
> ncol(DT)
[1] 3
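
For readers who cannot use the development version, here is a rough base R sketch of what "escaping" the embedded quotes would look like: each inner quote is doubled (the usual CSV convention) before parsing. The helper name escape_inner_quotes is made up for illustration and is not part of data.table or base R. It assumes every field is quoted and that the sequence "," never occurs inside a field, so treat it as a heuristic rather than a general fix.

```r
# Hypothetical helper (not from data.table or base R): doubles every quote
# that is not a field boundary. Assumes all fields are quoted and that the
# three-character sequence "," only ever appears between fields.
escape_inner_quotes <- function(lines) {
  boundary <- "\001"                                # placeholder unlikely to be in the data
  x <- gsub('","', boundary, lines, fixed = TRUE)   # protect field boundaries
  x <- sub('^"', "", x)                             # drop the opening quote of the line
  x <- sub('"$', "", x)                             # drop the closing quote of the line
  x <- gsub('"', '""', x, fixed = TRUE)             # double the remaining (embedded) quotes
  x <- gsub(boundary, '","', x, fixed = TRUE)       # restore field boundaries
  paste0('"', x, '"')                               # restore the outer quotes
}

# Sample lines copied from the question
raw <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)

fixed <- escape_inner_quotes(raw)
df <- read.csv(text = fixed, stringsAsFactors = FALSE)
ncol(df)
```

In practice raw would come from readLines() on the file; the escaped lines could also be written back out with writeLines() and read with any CSV reader.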
+6

Using readLines and read.table:

# read with no sep
x <- readLines("test.txt")

# introduce new sep - "|"
x <- gsub("\",\"", "\"|\"", x)

# read with new sep
read.table(text = x, sep = "|", header = TRUE)

#   No                                                                  Comment Type
# 1  0                                                     he said:"wonderful."    A
# 2  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
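
One caveat with this approach: the replacement separator must not already occur in the data, and the sample in the question does contain a "|". A possible variation, sketched below under the assumption that every field is quoted, picks the first candidate separator that is absent from the lines, strips the outer quotes by hand, and turns quote handling off in read.table. The names candidates and new_sep are only illustrative.

```r
# Sample lines copied from the question (normally: lines <- readLines("test.txt"))
lines <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)

# Choose the first candidate separator that does not appear anywhere in the data
candidates <- c("|", "\t", ";", "~")
in_use <- vapply(candidates,
                 function(s) any(grepl(s, lines, fixed = TRUE)),
                 logical(1))
new_sep <- candidates[!in_use][1]
stopifnot(!is.na(new_sep))   # fail loudly if every candidate is taken

# Replace the field boundary "," with the new separator, then drop the
# quote that opens and the quote that closes each line
x <- gsub('","', new_sep, lines, fixed = TRUE)
x <- sub('^"', "", x)
x <- sub('"$', "", x)

# quote = "" keeps the embedded quotes as ordinary characters
df <- read.table(text = x, sep = new_sep, quote = "", header = TRUE,
                 comment.char = "", stringsAsFactors = FALSE)
ncol(df)
```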
+2

Source: https://habr.com/ru/post/1672875/
