Creating a zoo object from a csv file (with several inconsistencies) with R

I am trying to create a zoo object in R from the following CSV file: http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv

The problem is that the rows from 02/27/2006 to 3/20/2006 contain minor inconsistencies (some additional commas and an "x") that cause problems when reading the file.

I am looking for a method that reads the complete CSV file into R automatically. A new data point is added every working day, so with manual preprocessing I would have to edit the file by hand every day.

I'm not sure that these are the only problems with this file, but I'm running out of ideas for creating a zoo object from this time series. I think that with some knowledge of R this should be possible.

2 answers

Use colClasses to tell read.zoo that there are 4 fields, and use fill so that it pads any line that has fewer. Ignore the warning:

    library(zoo)

    URL <- "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv"

    z <- read.zoo(URL, sep = ",", header = TRUE, format = "%m/%d/%Y",
                  skip = 1, fill = TRUE, colClasses = rep(NA, 4))
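If you would rather not depend on a call that emits a warning, one alternative is to name four columns explicitly via col.names, which read.table documents as a way to set the column count when it is larger than what the first five lines suggest. The following is only a minimal sketch on made-up sample lines (the values are invented and not taken from the CBOE file); the column names extra1 and extra2 are hypothetical placeholders for the stray fields.

```r
library(zoo)

# Hypothetical sample: a title line, a header, five regular rows, then rows
# with trailing ",," or ",,x" as described in the question. Values are invented.
sample_lines <- c(
  "Title line that should be skipped",
  "Date,SKEW",
  "02/17/2006,118.1",
  "02/21/2006,118.4",
  "02/22/2006,117.6",
  "02/23/2006,118.0",
  "02/24/2006,118.5",
  "02/27/2006,117.9,,",
  "02/28/2006,119.2,,x",
  "03/21/2006,116.4"
)
tmp <- tempfile(fileext = ".csv")
writeLines(sample_lines, tmp)

# Skip the title and header lines, declare four columns by name, and let
# fill = TRUE pad the rows that only have two fields.
raw <- read.table(tmp, sep = ",", skip = 2, header = FALSE, fill = TRUE,
                  col.names = c("Date", "SKEW", "extra1", "extra2"),
                  stringsAsFactors = FALSE)

# Keep only the date and value columns when building the zoo series.
z <- zoo(raw$SKEW, as.Date(raw$Date, format = "%m/%d/%Y"))
```

Because the extra columns are simply ignored when the series is built, this does not depend on exactly what the stray fields contain, only on there being at most two of them.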

It is a good idea to separate the cleaning and analysis steps. Since you mention that your dataset changes frequently, the cleanup should be automatic. Here is a solution that cleans the file automatically.

    library(zoo)

    # Read in the data without parsing it
    lines <- readLines("Skewdailyprices.csv")

    # The bad lines have more than two fields
    n_fields <- count.fields("Skewdailyprices.csv", sep = ",", skip = 1)

    # View the dubious lines; n_fields starts at the second line because of
    # skip = 1, so index lines[-1] (assumes the file has no blank lines)
    lines[-1][n_fields != 2]

    # Fix them
    library(stringr)  # can use gsub from base R if you prefer
    lines <- str_replace(lines, ",,x?$", "")

    # Write back out to file, dropping the first line
    writeLines(lines[-1], "Skewdailyprices_cleaned.csv")

    # Read in the clean version
    sdp <- read.zoo(
      "Skewdailyprices_cleaned.csv",
      format = "%m/%d/%Y",
      header = TRUE,
      sep    = ","
    )
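For those who prefer to stay in base R and avoid the intermediate file, the same cleanup can be sketched with sub and the text argument of read.zoo. This is an illustration on invented sample lines, not the actual CBOE data, and it assumes the only irregularities are a trailing ",," or ",,x".

```r
library(zoo)

# Hypothetical sample lines (header plus four rows); values are invented.
lines <- c(
  "Date,SKEW",
  "02/24/2006,118.5",
  "02/27/2006,117.9,,",
  "02/28/2006,119.2,,x",
  "03/21/2006,116.4"
)

# Strip a trailing ",," or ",,x" from each line; other lines are unchanged.
cleaned <- sub(",,x?$", "", lines)

# Parse the cleaned lines directly, without writing them to disk.
sdp <- read.zoo(text = cleaned, sep = ",", header = TRUE, format = "%m/%d/%Y")
```

With a real file, lines would come from readLines, and the leading title line would still need to be dropped (for example with lines[-1]) before parsing.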

Source: https://habr.com/ru/post/1399832/

