How to fix the fread EOF error when reading a txt file?

I am trying to read climate station information from ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt . However, because the first row is incomplete (its last two columns are missing) and the fifth column contains spaces, reading fails with:

fread('ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt',sep=) 

It returns an error message:

  Expected sep (' ') but new line, EOF (or other non printing character) ends field 5 when detecting types from point 0: AGE00135039 35.7297 0.6500 50.0 ORAN-HOPITAL MILITAIRE 

How to apply fread when reading this txt file? Thanks!

1 answer

Why don't you just try the read.fwf function from the utils package? The column widths are specified in the readme.txt file (see Section IV).

 IV. FORMAT OF "ghcnd-stations.txt"

 ------------------------------
 Variable       Columns   Type
 ------------------------------
 ID                1-11   Character
 LATITUDE         13-20   Real
 LONGITUDE        22-30   Real
 ELEVATION        32-37   Real
 STATE            39-40   Character
 NAME             42-71   Character
 GSN FLAG         73-75   Character
 HCN/CRN FLAG     77-79   Character
 WMO ID           81-85   Character
 ------------------------------
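The widths passed to read.fwf below can be derived from the end column of each field in Section IV. This is a minimal sketch of that arithmetic; the vector names (GSN_FLAG and so on) are labels chosen here for illustration, not identifiers from the readme.

```r
# End column of each field, copied from Section IV of readme.txt.
# The names are illustrative labels, not part of the data file.
ends <- c(ID = 11, LATITUDE = 20, LONGITUDE = 30, ELEVATION = 37,
          STATE = 40, NAME = 71, GSN_FLAG = 75, HCN_CRN_FLAG = 79,
          WMO_ID = 85)

# Each width spans from just after the previous field's end column to this
# field's end column, so the blank separator column is absorbed into the
# field that follows it.
widths <- diff(c(0, ends))
print(widths)
```

Folding the separator column into the following field means every character of the 85-column record is assigned to some field, so no negative (skip) widths are needed.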

However, the following attempt returns an error:

 data <- read.fwf("ghcnd-stations.txt", widths = c(11,9,10,7,3,31,4,4,6))
 Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
   line 25383 did not have 7 elements

Checking line 25383 shows the cause of the error.

 > x <- readLines("ghcnd-stations.txt", 25383)
 > tail(x, 1)
 [1] "CA002100627 60.8167 -137.7333 846.0 YT HAINES APPS #4 "

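The # in the station name is treated as the start of a comment, so the rest of that line is dropped. Work around this by including the comment.char argument and changing the default (#) to something else, for example the empty string, which turns comment handling off.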

 data <- read.fwf("ghcnd-stations.txt", widths = c(11,9,10,7,3,31,4,4,6), comment.char="") 
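To see the effect in isolation, here is a small sketch using two made-up fixed-width records written to a temporary file. The records, the widths c(4, 16, 1), and the variable names are assumptions for illustration only; they are not rows from ghcnd-stations.txt.

```r
# Two illustrative records: a 4-character ID field, a 16-character name
# field, and a 1-character flag. The second name contains a '#'.
lines <- c("AAA ORAN-HOPITAL    X",
           "BBB HAINES APPS #4  Y")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)
widths <- c(4, 16, 1)

# With the default comment.char = "#", everything after '#' on the second
# record is discarded, so that record ends up with too few fields.
# tryCatch captures the resulting error instead of stopping the script.
default_result <- tryCatch(read.fwf(tmp, widths = widths),
                           error = function(e) e)

# With comment handling switched off, '#' is kept as ordinary text.
fixed <- read.fwf(tmp, widths = widths, comment.char = "")
print(fixed)

unlink(tmp)
```

If other characters in the real file turn out to cause trouble, the same approach of inspecting the reported line with readLines should help narrow it down.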

It takes only about 20 seconds, so there is no real need for fread.


Source: https://habr.com/ru/post/1271698/

