R: possible truncation of >= 4GB file

I have a 370 MB zip file whose content is a 4.2 GB CSV file.

I did:

unzip("year2015.zip", exdir = "csv_folder")

And I got this message:

1: In unzip("year2015.zip", exdir = "csv_folder") :
  possible truncation of >= 4GB file

Have you experienced this before? How did you solve it?

+5
2 answers

I agree with @Sixiang.Hu's answer: R's unzip() will not work reliably with files larger than 4 GB.

As for "How did you solve it?": I tried several different tricks, and in my experience anything that uses R's built-ins (almost) invariably misidentifies the end-of-file (EOF) marker, placing it before the actual end of the file.
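
One way to spot this kind of premature EOF is to compare the size of each extracted file with the uncompressed size recorded in the archive listing. The sketch below is only an illustration: find_truncated is a hypothetical helper name, and it assumes that the listing returned by unzip(zipfile, list = TRUE) (columns Name and Length) reports the true uncompressed size, which I have not verified for archives holding >= 4 GB files.

```r
# Sketch: flag extracted files whose on-disk size differs from the
# uncompressed size recorded in the archive listing.
# `listing` is assumed to have the columns `Name` and `Length`,
# as returned by unzip(zipfile, list = TRUE).
find_truncated <- function(listing, exdir) {
  # Directory entries end in "/" and carry no data, so skip them
  listing <- listing[!grepl("/$", listing$Name), , drop = FALSE]

  actual <- file.size(file.path(exdir, listing$Name))  # NA if a file is missing
  bad <- is.na(actual) | actual != listing$Length

  data.frame(
    Name = listing$Name,
    expected = listing$Length,
    actual = actual,
    stringsAsFactors = FALSE
  )[bad, , drop = FALSE]
}

# Hypothetical usage with the archive from the question:
# listing <- unzip("year2015.zip", list = TRUE)
# find_truncated(listing, "csv_folder")  # zero rows means all sizes match
```

A non-empty result lists the files that are missing or whose size does not match, which is a cheap check to run before parsing a multi-gigabyte CSV.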

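To deal with this consistently, I wrote the function below as a wrapper around UNIX unzip. It does essentially what calling unzip through system() would do, but it gives you a bit more flexibility and lets you check for errors more systematically.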

decompress_file <- function(directory, file, .file_cache = FALSE) {

    if (isTRUE(.file_cache)) {
       print("decompression skipped")
    } else {

      # Set working directory for decompression;
      # this simplifies unzip's output-directory behavior
      wd <- getwd()
      setwd(directory)
      # Restore the working directory even if unzip fails
      on.exit(setwd(wd), add = TRUE)

      # Run decompression and capture stdout as a character vector
      decompression <-
        system2("unzip",
                args = c("-o", # overwrite existing files without prompting
                         shQuote(file)), # quote names that contain spaces
                stdout = TRUE)

      # Uncomment to delete the archive once decompressed
      # file.remove(file)

      # Test for success criteria;
      # change the search pattern depending on
      # your implementation
      # (any() guards against empty output from unzip)
      if (any(grepl("Warning message", tail(decompression, 1)))) {
        print(decompression)
      }
    }
}

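Notes:

A few remarks on the function above:

  • I use system2 rather than system, because the documentation says: "system2 is a more portable and flexible interface than system"
  • directory and file are separate arguments, and the working directory is moved to directory; depending on your system, unzip (or whichever decompression tool you choose) can be finicky about extracting archives outside the working directory
    • it is not pure, but resetting the working directory afterwards keeps the side effects down
    • you can do it without this, but I find it easier than building file paths and remembering CLI flags
  • it passes the -o flag so that existing files are overwritten on a rerun, but you could supply other flags as well
  • it includes a .file_cache argument, which lets you skip decompression
    • this is handy when you are testing a process that runs on the decompressed file, since 4 GB+ files take a while to decompress
  • it is commented out here, but if you know you will not need the archive after decompression, you can delete it inline
  • system2 redirects stdout into decompression, a character vector
    • an if + grepl check at the end searches that stdout and prints it if the pattern is found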
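
If changing the working directory is not desirable, a variant along the following lines may work instead. It is a sketch, not a replacement for the function above: build_unzip_args and unzip_external are hypothetical names, it assumes an Info-ZIP style unzip on the PATH that accepts -d for the output directory, and it checks the "status" attribute that system2 attaches to captured output when the command exits with a non-zero code.

```r
# Build the argument vector for the external unzip program.
# system2() does not quote arguments itself, so paths go through shQuote().
build_unzip_args <- function(zipfile, exdir) {
  c("-o", shQuote(zipfile), "-d", shQuote(exdir))
}

# Hypothetical wrapper: returns TRUE (invisibly) when unzip exits with status 0.
unzip_external <- function(zipfile, exdir = ".") {
  out <- suppressWarnings(
    system2("unzip",
            args = build_unzip_args(zipfile, exdir),
            stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")  # NULL when the exit status was 0
  ok <- is.null(status) || status == 0
  if (!ok) {
    # Show what unzip printed so the failure can be diagnosed
    message(paste(out, collapse = "\n"))
  }
  invisible(ok)
}

# Hypothetical usage with the archive from the question:
# unzip_external("year2015.zip", exdir = "csv_folder")
```

Checking the exit status avoids depending on the exact wording of unzip's output, at the cost of treating every non-zero code (including plain warnings) as a failure.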
+6

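Checking ?unzip, I found the following in the Note section:

It does have some support for bzip2 compression and > 2GB zip files (but not >= 4GB files pre-compression contained in a zip file: like many builds of unzip it may truncate these, in R's case with a warning if possible).

You can try to unzip it outside of R (for example, with 7-Zip).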
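
If you still want to drive the extraction from an R script, you can call the 7-Zip command line through system2. This is only a sketch under assumptions: the executable name "7z" depends on your installation (it may be 7za, 7zz, or a full path on Windows), and build_7z_args and extract_with_7z are hypothetical helper names.

```r
# Arguments for `7z x <archive> -o<dir> -y`:
#   x   extract with full paths
#   -o  output directory (no space between the switch and the path)
#   -y  assume "yes" on all prompts, e.g. when overwriting
build_7z_args <- function(zipfile, exdir) {
  c("x", shQuote(zipfile), paste0("-o", shQuote(exdir)), "-y")
}

# Hypothetical wrapper; `exe` is an assumption about the local install.
extract_with_7z <- function(zipfile, exdir, exe = "7z") {
  # With the default stdout = "", system2() returns the exit status
  status <- system2(exe, args = build_7z_args(zipfile, exdir))
  invisible(status == 0)
}

# Hypothetical usage with the archive from the question:
# extract_with_7z("year2015.zip", "csv_folder")
```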

+5

Source: https://habr.com/ru/post/1672070/

