Reading a CSV file in R where each line has a parenthesis at the beginning and end

I have a text file that looks like this

(abc,123) (def,456) (ghi,789) ... 

In R, I would like to read this file as a CSV. To do that, I need to get rid of the opening and closing parentheses at the beginning and end of each line. Do you have an idea how to achieve this?

Reading a file, removing parentheses, and writing to a temporary file should be avoided if possible.

+6
6 answers

I would go the readLines route, since you need to manipulate the file contents first. Then you can use the text argument of read.csv/read.table:

    > writeLines(c("(abc,123)", "(def,456)", "(ghi,789)"), "yourfile.txt")  ## put your data in a file
    > txt <- gsub("[()]", "", readLines("yourfile.txt"))
    > read.csv(text = txt, header = FALSE)
    #    V1  V2
    # 1 abc 123
    # 2 def 456
    # 3 ghi 789

or

    > read.table(text = txt, sep = ",")
    #    V1  V2
    # 1 abc 123
    # 2 def 456
    # 3 ghi 789
+3

Ok, this seems to work (on my Mac):

    read.table(pipe("tr -d '()' < ~/Desktop/paren.txt"), header = FALSE, sep = ",")
    #    V1  V2
    # 1 abc 123
    # 2 def 456
    # 3 ghi 789
+6

Crazy idea time, but you can create your own colClasses definitions and use them in read.table, for example:

    setClass("strippedL")
    setClass("strippedR")
    setAs("character", "strippedL", function(from)
      as.character(gsub("(", "", from, fixed = TRUE)))
    setAs("character", "strippedR", function(from)
      as.numeric(gsub(")", "", from, fixed = TRUE)))

Here is how it can be used. Replace the text argument with the file argument to read from a file instead.

    read.table(text = "(abc,123)\n(def,456)\n(ghi,789)",
               sep = ",", header = FALSE,
               colClasses = c("strippedL", "strippedR"))
    #    V1  V2
    # 1 abc 123
    # 2 def 456
    # 3 ghi 789
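Reading directly from a file would look something like this (a sketch assuming the same class definitions as above and a file "yourfile.txt" holding the three sample lines):

    read.table("yourfile.txt", sep = ",", header = FALSE,
               colClasses = c("strippedL", "strippedR"))
    # should give the same result as the text= version above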

A less crazy (but slower) idea: try read.pattern from the development version of "gsubfn":

    library(gsubfn)
    source("http://gsubfn.googlecode.com/svn/trunk/R/read.pattern.R")
    pat <- "^\\((.*),(.*)\\)$"
    read.pattern("~/path/to/file.txt", pattern = pat, header = FALSE)
+4

Honestly, the best way to deal with this situation is to edit the source file before reading it into R. I can't imagine any reason to avoid that which would justify writing fancy R code to remove the parentheses after reading in the data.

Open your text editor of choice and tell it to remove all the parentheses. Save the file (as a new file if necessary), then open the new file with read.csv.
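If you would rather script that one-time cleanup than do it by hand in an editor, a minimal R sketch (the file names are placeholders):

    # strip the parentheses once and save a cleaned copy of the file
    raw <- readLines("yourfile.txt")
    writeLines(gsub("[()]", "", raw), "yourfile_clean.txt")
    foo <- read.csv("yourfile_clean.txt", header = FALSE)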

But if you really need to do it without editing the file:

    foo <- read.csv(your_file, header = FALSE, stringsAsFactors = FALSE)
    foo[, 1] <- gsub("(", "", foo[, 1], fixed = TRUE)
    foo[, 2] <- gsub(")", "", foo[, 2], fixed = TRUE)
    foo[, 2] <- as.numeric(foo[, 2])

EDIT: I ran a quick speed test:

    paren1 <- function(file) {
      foo <- read.csv(file, header = FALSE, stringsAsFactors = FALSE)
      foo[, 1] <- gsub("[()]", "", foo[, 1])
      foo[, 2] <- as.numeric(gsub("[()]", "", foo[, 2]))
      return(invisible(foo))
    }

    setClass("strippedL")
    setClass("strippedR")
    setAs("character", "strippedL", function(from)
      as.character(gsub("(", "", from, fixed = TRUE)))
    setAs("character", "strippedR", function(from)
      as.numeric(gsub(")", "", from, fixed = TRUE)))

    paren2 <- function(file) {
      foo <- read.table(file, sep = ",", header = FALSE,
                        colClasses = c("strippedL", "strippedR"))
      return(invisible(foo))
    }

    library(microbenchmark)
    # my "paren.txt" has 860 lines in it
    microbenchmark(paren1('paren.txt'), paren2('paren.txt'))

    Unit: milliseconds
                    expr      min       lq   median       uq      max neval
     paren1("paren.txt") 3.341024 3.461614 3.486416 3.514639 4.060715   100
     paren2("paren.txt") 2.164631 2.251439 2.285007 2.322211 5.681836   100

So Ananda's solution is noticeably faster. Okay :-)

+1

You can try:

    str1 <- c("(abc,123)", "(def,456)", "(ghi,789)")
    library(qdap)
    read.table(text = unlist(bracketXtract(str1, "round")), sep = ",")
    #    V1  V2
    # 1 abc 123
    # 2 def 456
    # 3 ghi 789
+1

Here's an option using gsub on the first and second columns of the data.frame:

    tmp <- read.table("tmp.csv", sep = ",", stringsAsFactors = FALSE)
    # to reproduce tmp without the file:
    # tmp <- structure(list(V1 = c("(abc", "(def", "(ghi"),
    #                       V2 = c("123)", "456)", "789)")),
    #                  .Names = c("V1", "V2"), class = "data.frame",
    #                  row.names = c(NA, -3L))
    tmp
    tmp[, 1] <- gsub("(", "", tmp[, 1], fixed = TRUE)
    tmp[, 2] <- gsub(")", "", tmp[, 2], fixed = TRUE)
    tmp
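Note that V2 is still a character column after the substitution; if you need numbers, a small follow-up (sketch):

    # convert the stripped strings in the second column to numeric
    tmp[, 2] <- as.numeric(tmp[, 2])
    str(tmp)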
0

Source: https://habr.com/ru/post/971869/

