What is the fastest way to unzip text files in Matlab from within a function?

I would like to scan the contents of text files in Matlab with the textscan function. Before I can open a text file with fid = fopen('C:\path'), I need to unzip the files first. The files have the extension *.gz.

There are thousands of files to analyze, so high performance is important.

I have two ideas: (1) call an external program from the command line within Matlab, or (2) use a Matlab zip toolbox. I have heard of gunzip but don't know about its performance.

Does anyone know a way to unzip these files as quickly as possible from within Matlab?

Thanks!

3 answers

You can always try MATLAB's unzip() function:

unzip

Extract zip file contents

Syntax

unzip(zipfilename)
unzip(zipfilename, outputdir)
unzip(url, ...)
filenames = unzip(...)

Description

unzip(zipfilename) extracts the archived contents of zipfilename into the current folder and sets the files' attributes, preserving the time stamps. It overwrites any existing files with the same names as those in the archive if the attributes and ownership of the existing files permit it. For example, files from rerunning unzip on the same zip file name do not overwrite any of those files that have a read-only attribute; instead, unzip issues a warning for such files.

Internally, this uses the Java zip library org.apache.tools.zip. If each of your zip archives contains many text files, it might be faster to drop down into Java and extract them entry by entry, without explicitly creating unzipped files. Take a look at the source of unzip.m for some ideas, as well as the Java documentation.
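The same "read it through Java without creating unzipped files" idea can be sketched for the *.gz files from the question. This is only a minimal sketch under some assumptions: MATLAB is running with its JVM enabled, the files are single-byte (ASCII) text, and gzPath is an absolute path (Java does not follow MATLAB's current folder). readGzText is a hypothetical name, and InterruptibleStreamCopier is an undocumented helper class shipped with MATLAB (the built-in gunzip relies on it), so it may not be available in every release.

```matlab
function txt = readGzText(gzPath)
% readGzText  Decompress a .gz file in memory and return its text.
%   Hypothetical helper, not part of MATLAB. Assumes gzPath is absolute
%   and the file holds single-byte (ASCII) text.
    fileStream = java.io.FileInputStream(java.io.File(gzPath));
    gzStream   = java.util.zip.GZIPInputStream(fileStream);
    cleanup    = onCleanup(@() gzStream.close());  % closes both streams on exit

    buffer = java.io.ByteArrayOutputStream();

    % Undocumented MathWorks class (assumption: present in your release).
    % It copies the whole decompressed stream into the buffer in one call.
    copier = com.mathworks.mlwidgets.io.InterruptibleStreamCopier.getInterruptibleStreamCopier();
    copier.copyStream(gzStream, buffer);

    % Java bytes arrive as an int8 column; reinterpret as uint8, return a row.
    bytes = typecast(buffer.toByteArray(), 'uint8');
    txt   = char(bytes(:)');
end
```

Since textscan also accepts a character vector instead of a file identifier, the result could be parsed directly, for example C = textscan(readGzText(gzPath), '%f %f'), where the format string is just a placeholder for whatever the files actually contain. Whether this beats extracting to disk would need to be measured on the real data.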


I found 7zip-commandline (Windows) / p7zip (Unix) to be somewhat faster for this.

[edit] From some quick testing, it appears that a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.

Just write a new function that mimics the basic functionality of MATLAB's gunzip:

function [] = sunzip(fullfilename, output_dir)
if ~exist('output_dir', 'var'), output_dir = fileparts(fullfilename); end

app_path = '/usr/bin/7za';
switches = ' e'; % extract files ignoring directory structure
options = [' -o' output_dir];

% NOTE: the '_' below does not separate the file name from the options;
% the follow-up answer further down gives a corrected call.
system([app_path switches options '_' fullfilename]);

Then use it as you would use gunzip:

sunzip('/data/time_1000.out.gz', tmp_dir);

Using MATLAB's toc timer, I get the following extraction times for 6 ASCII files (114 MB uncompressed):

gunzip: 10.15s
sunzip: 7.84s
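A comparison like the one above can be reproduced with a small timing harness. This is a rough sketch, not the code used for the numbers quoted here: timeExtraction is a hypothetical helper, and it assumes each extraction function accepts (gzFile, outDir) the way gunzip and sunzip do.

```matlab
function elapsed = timeExtraction(extractFcn, gzFiles, outDir)
% timeExtraction  Time one extraction method over a list of .gz files.
%   Hypothetical helper. extractFcn is a function handle called as
%   extractFcn(gzFile, outDir), e.g. @gunzip or @sunzip.
%   gzFiles is a cell array of file names; outDir is the target folder.
    t0 = tic;
    for k = 1:numel(gzFiles)
        extractFcn(gzFiles{k}, outDir);
    end
    elapsed = toc(t0);  % wall-clock seconds for the whole batch
end
```

For example, fprintf('gunzip: %.2fs\n', timeExtraction(@gunzip, files, tmp_dir)) followed by the same call with @sunzip. Disk caching can favor whichever method runs second, so it is worth repeating the runs in both orders before drawing conclusions.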


Worked fine; it just took a minor change to Max's syntax for calling the executable:

system([app_path switches ' ' fullfilename options]);
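If paths may contain spaces, or a failed extraction should not pass silently, the call can be made a little more defensive. The following is a sketch only: sunzipChecked and build7zaCommand are hypothetical names, the default location of 7za is taken from the answer above, and the quoting assumes a Unix-like shell. The -y switch tells 7za to answer "yes" to prompts such as overwrite confirmations.

```matlab
function sunzipChecked(fullfilename, output_dir, app_path)
% sunzipChecked  Extract an archive with 7za and raise an error on failure.
%   Hypothetical variant of sunzip; app_path defaults to /usr/bin/7za.
    if nargin < 2 || isempty(output_dir)
        output_dir = fileparts(fullfilename);
    end
    if nargin < 3
        app_path = '/usr/bin/7za';
    end
    cmd = build7zaCommand(app_path, fullfilename, output_dir);
    [status, output] = system(cmd);  % capture output instead of printing it
    if status ~= 0
        error('sunzipChecked:extractFailed', ...
              '7za failed (exit code %d): %s', status, output);
    end
end

function cmd = build7zaCommand(app_path, fullfilename, output_dir)
% Build the command line; every path is double-quoted so spaces survive.
%   e  : extract files ignoring directory structure
%   -o : output directory (no space between the switch and the path)
%   -y : assume "yes" on all queries, e.g. overwrite prompts
    cmd = sprintf('"%s" e "%s" -o"%s" -y', app_path, fullfilename, output_dir);
end
```

Keeping the command construction in its own function makes it easy to print and inspect the exact command line before running it on thousands of files.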

Source: https://habr.com/ru/post/1301804/

