Convert long string to data.frame

This is a question about noob, but I'm going crazy about it. I have a character vector called bars.list that I download from FTP Sever. The vector is as follows:

"\"\",\"times\",\"open\",\"high\",\"low\",\"close\",\"numEvents\",\"volume\"\r\n\"1\",2015-05-18 06:50:00,23.98,23.98,23.5,23.77,421,0\r\n\"2\",2015-05-18 07:50:00,23.77,23.9,23.34,23.6,720,0\r\n\"3\",2015-05-18 08:50:00,23.6,23.6,23.32,23.42,720,0\r\n\"4\",2015-05-18 09:50:00,23.44,23.91,23.43,23.66,720,0\r\n\"5\",2015-05-18 10:50:00,23.67,24.06,23.59,24.02,720,0\r\n\"6\",2015-05-18 11:50:00,24.02,24.04,23.32,23.33,720,0\r\n\"7\",2015-05-18 12:50:00,23.33,23.42,22.74,22.81,720,0\r\n\"8\",2015-05-18 13:50:00,22.79,22.92,22.49,22.69,720,0\r\n\"9\",2015-05-18 14:50:00,22.69,22.7,22.14,22.14,481,0\r\n\"10\",2015-05-19 06:50:00,21.09,21.49,20.82,21.47,421,0\r\n\"11\",2015-05-19 07:50:00,21.48,21.68,21.46,21.51,720,0\r\n\"12\",2015-05-19 08:50:00,21.51,21.93,21.45,21.92,720,0\r\n\"13\",2015-05-19 09:50:00,21.92,21.92,21.55,21.55,720,0\r\n\" 

I need to convert this vector to a convenient format, but

 > read.table(bars.list, header = TRUE, sep = ",", quote = "", dec = ".") Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file '"","times","open","high","low","close","numEvents","volume" "1",2015-05-18 06:50:00,23.98,23.98,23.5,23.77,421,0 "2",2015-05-18 07:50:00,23.77,23.9,23.34,23.6,720,0 "3",2015-05-18 08:50:00,23.6,23.6,23.32,23.42,720,0 "4",2015-05-18 09:50:00,23.44,23.91,23.43,23.66,720,0 

It is not clear to me why R is telling me that some Connection cannot be opened, since the object is already inserted as an argument to the function. Exit R Shows me with a warning sign already pretty close to what I need ...

+5
source share
1 answer

Here are two options. The first one offers a fix for your current code, and the second one is a simpler and more efficient option.

Option 1: The first argument to read.table() is file . You are reading from a vector not from a file, so you need to use the text argument, with text = bars.list .

In addition, we can first get rid of all quotes with gsub() , and then use read.csv() instead of read.table() , since header = TRUE and sep = "," are the default values.

 read.csv(text = gsub("\"", "", bars.list), row.names = 1) # times open high low close numEvents volume # 1 2015-05-18 06:50:00 23.98 23.98 23.50 23.77 421 0 # 2 2015-05-18 07:50:00 23.77 23.90 23.34 23.60 720 0 # 3 2015-05-18 08:50:00 23.60 23.60 23.32 23.42 720 0 # 4 2015-05-18 09:50:00 23.44 23.91 23.43 23.66 720 0 # 5 2015-05-18 10:50:00 23.67 24.06 23.59 24.02 720 0 # 6 2015-05-18 11:50:00 24.02 24.04 23.32 23.33 720 0 # 7 2015-05-18 12:50:00 23.33 23.42 22.74 22.81 720 0 # 8 2015-05-18 13:50:00 22.79 22.92 22.49 22.69 720 0 # 9 2015-05-18 14:50:00 22.69 22.70 22.14 22.14 481 0 # 10 2015-05-19 06:50:00 21.09 21.49 20.82 21.47 421 0 # 11 2015-05-19 07:50:00 21.48 21.68 21.46 21.51 720 0 # 12 2015-05-19 08:50:00 21.51 21.93 21.45 21.92 720 0 # 13 2015-05-19 09:50:00 21.92 21.92 21.55 21.55 720 0 

For me, this worked better than using the quote argument in read.csv() .

Option 2: fread() from the data.table package works well too. It is faster and the code is cleaner. No need to use gsub() with it. We can directly put bars.list and discard the first column.

 data.table::fread(bars.list, drop = 1) 

Now you will get a warning using this method because of the last quote. \" You can live with it or get the result without warning by removing this last quotation mark.

 data.table::fread(sub("\"$", "", bars.list), drop = 1) 

Data:

 bars.list <- "\"\",\"times\",\"open\",\"high\",\"low\",\"close\",\"numEvents\",\"volume\"\r\n\"1\",2015-05-18 06:50:00,23.98,23.98,23.5,23.77,421,0\r\n\"2\",2015-05-18 07:50:00,23.77,23.9,23.34,23.6,720,0\r\n\"3\",2015-05-18 08:50:00,23.6,23.6,23.32,23.42,720,0\r\n\"4\",2015-05-18 09:50:00,23.44,23.91,23.43,23.66,720,0\r\n\"5\",2015-05-18 10:50:00,23.67,24.06,23.59,24.02,720,0\r\n\"6\",2015-05-18 11:50:00,24.02,24.04,23.32,23.33,720,0\r\n\"7\",2015-05-18 12:50:00,23.33,23.42,22.74,22.81,720,0\r\n\"8\",2015-05-18 13:50:00,22.79,22.92,22.49,22.69,720,0\r\n\"9\",2015-05-18 14:50:00,22.69,22.7,22.14,22.14,481,0\r\n\"10\",2015-05-19 06:50:00,21.09,21.49,20.82,21.47,421,0\r\n\"11\",2015-05-19 07:50:00,21.48,21.68,21.46,21.51,720,0\r\n\"12\",2015-05-19 08:50:00,21.51,21.93,21.45,21.92,720,0\r\n\"13\",2015-05-19 09:50:00,21.92,21.92,21.55,21.55,720,0\r\n\"" 
+4
source

Source: https://habr.com/ru/post/1237062/


All Articles