The need for this came from the fact that `read.csv` can read compressed files directly, but `data.table::fread` cannot take connections as input because it requires random file seeks. A simple workaround is `data.table::fread(paste0("zcat < ", PATH_TO_FILE))`, but that depends on the command-line tool gzip, which is not always available on Windows. See here for more details.
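As a small illustration of the difference, the sketch below writes a throwaway gzip-compressed CSV (the file and its columns are made up for the example) and reads it back with base R. The `fread` call is left as a comment because, as described above, it does not accept a connection.

```r
# Create a small gzip-compressed CSV in the temp directory (illustrative data).
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), con, row.names = FALSE)
close(con)

# Base R opens the path through a connection and decompresses transparently.
df <- read.csv(path)

# The connection itself can also be passed in.
df_con <- read.csv(gzfile(path))

# By contrast, fread expects a file name or shell command, not a connection:
# data.table::fread(gzfile(path))  # expected to fail
```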
The code is based on `R.utils::decompressFile`, with many modifications:
- The input file is no longer removed. I lost several data files and spent too much time puzzled because of this, even though I had read the documentation and knew about this behavior from the beginning.
- According to `?connections`, `gzfile` can handle gz, bzip2, and xz, so there is no need to specify the format or use a different function to decompress each one.
- The only exception is that `gzfile` cannot handle zip. We can use `unzip` to decompress the file directly, without needing connections. `unzip` does not support Unicode filenames as introduced in zip 3.0; see `?unzip` for its limitations. If you really need Unicode filenames, it might be easier to just install the command-line tool gzip (if it is not already available, as on Windows) and use `data.table::fread(paste0("zcat < ", PATH_TO_FILE))` directly.
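The points above can be put together roughly as follows. This is only a minimal sketch under stated assumptions, not the actual `temp_unzip` code: the names `temp_unzip_sketch` and `decompress_to` are hypothetical, the zip check relies on the file extension, and when a zip holds several files it naively uses the first one.

```r
# Hypothetical helper: copy a gz/bzip2/xz (or plain) file to `dest`, decompressed.
decompress_to <- function(src, dest) {
  in_con <- gzfile(src, open = "rb")   # gzfile() detects gz, bzip2, and xz on read
  on.exit(close(in_con), add = TRUE)
  out_con <- file(dest, open = "wb")
  on.exit(close(out_con), add = TRUE)
  repeat {
    chunk <- readBin(in_con, what = raw(), n = 1e6)
    if (length(chunk) == 0) break
    writeBin(chunk, out_con)
  }
  invisible(dest)
}

# Hypothetical sketch: decompress into a temp directory, then call `fun` on the result.
temp_unzip_sketch <- function(filename, fun, ...) {
  temp_dir <- tempfile("temp_unzip_")
  dir.create(temp_dir)
  # Only the temporary copy is cleaned up; the input file is never removed.
  on.exit(unlink(temp_dir, recursive = TRUE), add = TRUE)

  if (grepl("\\.zip$", filename, ignore.case = TRUE)) {
    # gzfile() cannot read zip archives, so extract with unzip() instead.
    extracted <- utils::unzip(filename, exdir = temp_dir)
    if (length(extracted) > 1) {
      warning("zip contains multiple files; using ", extracted[1])
    }
    temp_path <- extracted[1]
  } else {
    temp_path <- decompress_to(filename, file.path(temp_dir, "decompressed"))
  }
  fun(temp_path, ...)
}
```

With this shape, a call such as `temp_unzip_sketch("data.csv.gz", data.table::fread)` would hand `fread` an ordinary file path, which is what it needs for random seeks.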
`read.csv` can read regular files and compressed files with the same syntax. `temp_unzip` can actually take a regular file as input too; it just writes the file to the temp directory again. Obviously this is not optimal, so we want to test whether the file is compressed. To use `fread` with the same syntax for regular and compressed files, we can use something like this:
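library(data.table)

fread_all <- function(object, ...) {
  # Try reading a few rows directly to test whether it is a regular file.
  data <- try(fread(object, nrows = 5), silent = TRUE)
  # fread returns an object of class c("data.table", "data.frame"),
  # so test with inherits() rather than comparing class() to one string.
  if (inherits(data, "data.frame")) {
    # Plain file: read it in full with the caller's arguments.
    return(fread(object, ...))
  } else {
    # The direct read failed, so decompress to a temp file first.
    return(temp_unzip(object, fread, ...))
  }
}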
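
An alternative to the trial read is to look at the first few bytes of the file, since gzip, bzip2, xz, and zip files each start with a well-known signature. The helper below is a hypothetical sketch (`is_compressed` is not part of the original code) and only checks those four formats.

```r
# Hypothetical helper: guess compression from the file's leading magic bytes.
is_compressed <- function(path) {
  header <- readBin(path, what = raw(), n = 6)
  starts_with <- function(magic) {
    length(header) >= length(magic) &&
      identical(header[seq_along(magic)], as.raw(magic))
  }
  starts_with(c(0x1f, 0x8b)) ||                            # gzip
    starts_with(c(0x42, 0x5a, 0x68)) ||                    # bzip2 ("BZh")
    starts_with(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00)) ||  # xz
    starts_with(c(0x50, 0x4b, 0x03, 0x04))                 # zip ("PK" header)
}
```

A wrapper could then call `fread` directly when `is_compressed(object)` is `FALSE` and fall back to `temp_unzip` otherwise, which avoids reading the start of a regular file twice.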
2017-03-06: Added a warning for zip files that contain multiple files. Mac OS adds a hidden folder even to a single-file zip. Our function still supports this case, but it also gives a warning and reports which file will be extracted.