C: theory on how to extract files from an archive file

Question

In C, I wrote a program that archives several files into a single archive file from the command line. For example:

 $ echo 'file1' > file1.txt
 $ echo 'file2' > file2.txt
 $ ./archive file1.txt file2.txt archivedfile
 $ cat archivedfile
 file1
 file2

How can I structure the archived file so that it looks like this:

 header
 file1
 end
 header
 file2
 end

That is, all of the files are stored in the archive one after another. I know that each file probably needs a header (containing the file name, the length of the name, and where the file's data begins and ends) so that the files can be extracted back to their original form, but how would I do it?

I'm stuck on where and how to start.

Please help me with the logic for extracting files from an archive file.

donok Dec 22 '10 at 15:13

4 answers

Dave jarvis · Answer 1 · 2010-12-22T15:49:56+0000

As mentioned earlier, start with the algorithm. You already have most of the details.

There are several approaches you can take:

Random Access Archive.
Sequential Access Archive.

Random Access Archive

For this to work, the header must act as an index (like a library's card catalogue), indicating: (a) where to find the beginning of each file; and (b) the length of each file. The algorithm for writing the archive file might look like this:

Get a list of all files from the command line.
Create a structure to store metadata about each file: name (255 chars), size (64-bit int), date and time, and permissions.
For each file, get its statistics.
Store each file's statistics in an array of structures.
Open the archive for writing.
Write the header structure.
For each file, append its contents to the archive file.
Close the archive file.

(Perhaps the header should also contain the number of files.)
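The writing steps above could be sketched in C roughly as follows. The layout is an assumption made for illustration: a 32-bit file count, then an array of fixed-size index entries (the hypothetical `struct entry`), then the contents of each file in order. `write_archive`, `copy_bytes` and the 64-file cap are likewise hypothetical, and the structures are written in the host's native byte order and padding, so the archive is only portable between machines with the same layout.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical fixed-size index entry, one per archived file. */
struct entry {
    char     name[256];  /* NUL-terminated file name, at most 255 chars */
    uint64_t size;       /* length of the file's contents in bytes */
    int64_t  mtime;      /* last modification time, seconds since the epoch */
    uint32_t mode;       /* permission bits taken from st_mode */
};

#define MAX_FILES 64     /* arbitrary limit to keep the sketch simple */

/* Copies exactly n bytes from in to out; returns 0 on success. */
static int copy_bytes(FILE *in, FILE *out, uint64_t n)
{
    char buf[4096];
    while (n > 0) {
        size_t want = n < sizeof buf ? (size_t)n : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0 || fwrite(buf, 1, got, out) != got)
            return -1;
        n -= got;
    }
    return 0;
}

/* Layout: [uint32_t count][count x struct entry][file 0][file 1]...
   Returns 0 on success, -1 on any error. */
int write_archive(const char *archive, const char *const *paths, uint32_t count)
{
    struct entry index[MAX_FILES];
    FILE *out;
    int rc = 0;

    if (count > MAX_FILES)
        return -1;
    /* Get each file's statistics and store them in the array. */
    for (uint32_t i = 0; i < count; i++) {
        struct stat st;
        if (stat(paths[i], &st) != 0 || strlen(paths[i]) > 255)
            return -1;
        memset(&index[i], 0, sizeof index[i]);   /* also zeroes padding */
        strcpy(index[i].name, paths[i]);
        index[i].size  = (uint64_t)st.st_size;
        index[i].mtime = (int64_t)st.st_mtime;
        index[i].mode  = (uint32_t)(st.st_mode & 0777);
    }
    /* Open the archive, write the header (count + index), then the contents. */
    if ((out = fopen(archive, "wb")) == NULL)
        return -1;
    if (fwrite(&count, sizeof count, 1, out) != 1 ||
        fwrite(index, sizeof index[0], count, out) != count)
        rc = -1;
    for (uint32_t i = 0; rc == 0 && i < count; i++) {
        FILE *in = fopen(paths[i], "rb");
        if (in == NULL) {
            rc = -1;
            break;
        }
        rc = copy_bytes(in, out, index[i].size);
        fclose(in);
    }
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}
```

Because every entry has the same size, a reader can compute where the contents start from the count alone, which is what makes random access possible.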

Next, the algorithm for extracting files:

Get the archive file from the command line.
Get the file name to extract, also from the command line.
Allocate memory for the structures that will hold each file's metadata.
Read all of the metadata from the archive file.
Search the metadata list for the name of the file to extract.
Calculate the offset in the archive file where the matching file's contents begin.
Seek to that offset.
Read the file's contents and write them to a new file.
Close the new file.
Close the archive.
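A matching sketch of the extraction steps, assuming the same hypothetical layout as a writer that stores a 32-bit count, then `count` fixed-size `struct entry` records, then the file contents back to back in native byte order. `extract_file` is an illustrative name; it returns 1 when the requested name is not in the index.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical index entry; must match the layout used when writing. */
struct entry {
    char     name[256];
    uint64_t size;
    int64_t  mtime;
    uint32_t mode;
};

/* Extracts member `name` from `archive` into a new file `dest`.
   Returns 0 on success, 1 if the name is not in the index, -1 on error. */
int extract_file(const char *archive, const char *name, const char *dest)
{
    FILE *in = fopen(archive, "rb"), *out = NULL;
    struct entry *index = NULL;
    uint32_t count = 0, i;
    uint64_t offset, left;
    char buf[4096];
    int rc = -1;

    if (in == NULL)
        return -1;
    /* Read the file count, allocate memory, then read all the metadata. */
    if (fread(&count, sizeof count, 1, in) != 1)
        goto done;
    index = malloc(count ? count * sizeof *index : 1);
    if (index == NULL || fread(index, sizeof *index, count, in) != count)
        goto done;

    /* Search the index; the data offset is the header size plus the
       sizes of every file stored before the match. */
    offset = sizeof count + (uint64_t)count * sizeof *index;
    for (i = 0; i < count; i++) {
        index[i].name[sizeof index[i].name - 1] = '\0';
        if (strcmp(index[i].name, name) == 0)
            break;
        offset += index[i].size;
    }
    if (i == count) {
        rc = 1;
        goto done;
    }

    /* Seek to the offset (fseek takes a long, so the archive is assumed
       to be smaller than LONG_MAX bytes) and copy the contents out. */
    if (fseek(in, (long)offset, SEEK_SET) != 0)
        goto done;
    if ((out = fopen(dest, "wb")) == NULL)
        goto done;
    for (left = index[i].size; left > 0; ) {
        size_t want = left < sizeof buf ? (size_t)left : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0 || fwrite(buf, 1, got, out) != got)
            goto done;
        left -= got;
    }
    rc = 0;
done:
    if (out != NULL && fclose(out) != 0)
        rc = -1;
    free(index);
    fclose(in);
    return rc;
}
```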

Sequential access

This is simpler. Try to work it out yourself: think through the steps.

About programming

It helps to understand how something should work before you write it. I suggest that you take a step back (something your teacher should discuss in class) and think about the problem at a level above the code, because:

the algorithm you create is independent of the language;
fixing errors in the algorithm before writing the code is trivial;
you will have a better understanding of what you need to do before coding;
it takes less time to implement the solution;
you can identify parts that can be implemented in parallel;
you will see potential obstacles ahead of time; and
you will be on your way to a management position all the sooner. ;-)

Bob jarvis · Answer 2 · 2010-12-22T15:27:34+0000

I would think the header needs whatever information is required to identify the file and its size in the archive: for example, the file name, source directory, and size in lines or bytes, whichever is more useful in your context. Then you need routines to create a header; add a file to the archive (create a header and append the file data); extract a file from the archive (follow the headers until the correct entry is found, then copy its data from the archive to a separate file); and delete a file (read through the archive, copy the data for every entry except the ones you want to delete to a new file, then delete the old archive and rename the new one to the old name).
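The add and extract routines described above might look like this minimal sketch. It assumes a hypothetical fixed-size binary header (`struct member_hdr`) placed directly before each file's data, so extraction simply follows the headers, skipping each entry's data with `fseek` until the name matches. `add_member` and `extract_member` are illustrative names, the create-header step is folded into `add_member`, and the delete routine is left out.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-entry header; each one is immediately followed by
   `size` bytes of file data. */
struct member_hdr {
    char     name[256];   /* NUL-terminated file name */
    uint64_t size;        /* number of data bytes after this header */
};

/* Copies exactly n bytes from in to out; returns 0 on success. */
static int copy_n(FILE *in, FILE *out, uint64_t n)
{
    char buf[4096];
    while (n > 0) {
        size_t want = n < sizeof buf ? (size_t)n : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0 || fwrite(buf, 1, got, out) != got)
            return -1;
        n -= got;
    }
    return 0;
}

/* Add a file: append a header, then the file's data, to the end of the archive. */
int add_member(FILE *archive, const char *name, FILE *data, uint64_t size)
{
    struct member_hdr h;

    if (strlen(name) >= sizeof h.name)
        return -1;
    memset(&h, 0, sizeof h);
    strcpy(h.name, name);
    h.size = size;
    if (fseek(archive, 0, SEEK_END) != 0 || fwrite(&h, sizeof h, 1, archive) != 1)
        return -1;
    return copy_n(data, archive, size);
}

/* Extract a file: walk the headers from the start until `name` matches,
   then copy that entry's data to dest. Returns 1 if the name is absent. */
int extract_member(FILE *archive, const char *name, FILE *dest)
{
    struct member_hdr h;

    rewind(archive);
    while (fread(&h, sizeof h, 1, archive) == 1) {
        h.name[sizeof h.name - 1] = '\0';
        if (strcmp(h.name, name) == 0)
            return copy_n(archive, dest, h.size);
        /* Skip this entry's data; fseek takes a long, so entries are
           assumed to be smaller than LONG_MAX bytes. */
        if (fseek(archive, (long)h.size, SEEK_CUR) != 0)
            return -1;
    }
    return 1;
}
```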

Share and enjoy.

Anon · Answer 3 · 2010-12-22T15:27:34+0000

One approach is to emulate the ZIP format: http://en.wikipedia.org/wiki/ZIP_file_format

It uses a directory structure at the end of the file that contains pointers to the file offsets within the archive. The big advantage of this structure is that you can find a particular file without reading the entire archive, provided you know where the directory begins and can access the file randomly.
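A much-simplified sketch of the same idea (not the actual ZIP format): the archive ends with a fixed-size trailer recording where a directory of `{name, offset, size}` records begins, so a lookup only touches the trailer and the directory, never the file data. The layout, the `MYAR` magic value, `struct dir_entry`, `struct trailer` and `find_member` are all assumptions made for illustration, and the structs are read in native byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layout: [file data ...][dir_entry x count][trailer] */
struct dir_entry {
    char     name[256];  /* NUL-terminated file name */
    uint64_t offset;     /* where this file's data starts in the archive */
    uint64_t size;       /* length of the data in bytes */
};

struct trailer {
    uint64_t dir_offset; /* where the directory starts */
    uint32_t count;      /* number of dir_entry records */
    char     magic[4];   /* "MYAR", identifies the format */
};

/* Looks up `name` by reading the trailer from the end of the archive and
   then scanning only the directory. Returns 0 and fills *off and *size if
   found, 1 if absent, -1 on error or a magic-number mismatch. */
int find_member(FILE *arc, const char *name, uint64_t *off, uint64_t *size)
{
    struct trailer t;
    struct dir_entry e;

    if (fseek(arc, -(long)sizeof t, SEEK_END) != 0 ||
        fread(&t, sizeof t, 1, arc) != 1 ||
        memcmp(t.magic, "MYAR", 4) != 0)
        return -1;
    if (fseek(arc, (long)t.dir_offset, SEEK_SET) != 0)
        return -1;
    for (uint32_t i = 0; i < t.count; i++) {
        if (fread(&e, sizeof e, 1, arc) != 1)
            return -1;
        e.name[sizeof e.name - 1] = '\0';
        if (strcmp(e.name, name) == 0) {
            *off = e.offset;
            *size = e.size;
            return 0;
        }
    }
    return 1;
}
```

Once `find_member` succeeds, the caller can `fseek` to `*off` and copy `*size` bytes out.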

An alternative is the TAR file format: http://en.wikipedia.org/wiki/Tar_file_format

This format was designed for streaming media (the "tape archive"), so each entry carries its own metadata. You have to scan the entire file to find a given entry, but the usual use case is packing or unpacking whole directory trees, so this is not much of a drawback.

Glenn mcallister · Answer 4 · 2010-12-22T15:40:55+0000

Doing this in streaming mode, like tar, is probably the easiest to implement. First write a magic number so that you can recognize the file as your archive format. Then I suggest using stat(2) (man-page notation for the stat entry in section 2 of the manual) to get the size of each file being archived. In fact, look carefully at the stat fields available to you; there may be other interesting information you want to keep.
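A minimal sketch of those first steps, assuming a POSIX system. The four-byte magic value `MYAR`, the fields kept in `struct file_info`, and the function names are hypothetical choices; `stat`, `S_ISREG`, `st_size`, `st_mode` and `st_mtime` are the standard POSIX interface.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical magic number that marks the start of "our" archive format. */
static const char ARCHIVE_MAGIC[4] = { 'M', 'Y', 'A', 'R' };

/* The stat(2) fields this sketch chooses to keep for each file. */
struct file_info {
    long long size;   /* st_size: length in bytes */
    unsigned  perms;  /* permission bits from st_mode, e.g. 0644 */
    long long mtime;  /* st_mtime: seconds since the epoch */
};

/* Writes the magic number at the very start of a new archive. */
int write_magic(FILE *arc)
{
    return fwrite(ARCHIVE_MAGIC, 1, sizeof ARCHIVE_MAGIC, arc) == sizeof ARCHIVE_MAGIC ? 0 : -1;
}

/* Returns 1 if the stream starts with the magic number, 0 otherwise. */
int has_magic(FILE *arc)
{
    char buf[sizeof ARCHIVE_MAGIC];
    return fread(buf, 1, sizeof buf, arc) == sizeof buf &&
           memcmp(buf, ARCHIVE_MAGIC, sizeof buf) == 0;
}

/* Fills *info using stat(2); returns -1 if the path cannot be stat'ed
   or is not a regular file. */
int get_file_info(const char *path, struct file_info *info)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;
    info->size  = (long long)st.st_size;
    info->perms = (unsigned)(st.st_mode & 0777);
    info->mtime = (long long)st.st_mtime;
    return 0;
}
```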

Write the necessary information as key=value pairs, one per line. For instance:

 FileName=file1.txt
 FileSize=10
 FileDir=./blah/blah
 FilePerms=0700

End the header with two newlines (that is, a blank line) so you know when to start writing FileSize bytes to disk. You do not need a marker after the file data: because the file size was written out, you know exactly when to start parsing the next header.
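Writing one entry in this text-header style could look like the following sketch. `struct member_info` and `write_member` are hypothetical names; the keys follow the FileName/FileSize/FileDir/FilePerms example, the permissions are printed in octal, and values containing a newline are rejected because they would break the line-based header.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical metadata for one member; keys match the example header. */
struct member_info {
    const char   *name;   /* FileName */
    unsigned long size;   /* FileSize, in bytes */
    const char   *dir;    /* FileDir */
    unsigned      perms;  /* FilePerms, written in octal */
};

/* Appends one member: key=value lines, a blank line, then exactly
   m->size bytes copied from `data`. Nothing follows the data, because a
   reader knows the next header starts right after FileSize bytes.
   Returns 0 on success, -1 on error. */
int write_member(FILE *arc, const struct member_info *m, FILE *data)
{
    char buf[4096];
    unsigned long left = m->size;

    /* A newline inside a value would corrupt the line-based header. */
    if (strchr(m->name, '\n') || strchr(m->dir, '\n'))
        return -1;
    if (fprintf(arc, "FileName=%s\nFileSize=%lu\nFileDir=%s\nFilePerms=%04o\n\n",
                m->name, m->size, m->dir, m->perms) < 0)
        return -1;
    while (left > 0) {
        size_t want = left < sizeof buf ? (size_t)left : sizeof buf;
        size_t got = fread(buf, 1, want, data);
        if (got == 0 || fwrite(buf, 1, got, arc) != got)
            return -1;
        left -= got;
    }
    return 0;
}
```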

I suggest using a text format for your header information, because then you do not need to worry about byte ordering (endianness) and similar issues, which you would have to handle if you wrote a raw binary structure to disk.
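If you do choose a binary header instead, one common way to avoid byte-ordering problems is to encode each integer byte by byte in a fixed order rather than writing a struct with `fwrite`. A sketch, with little-endian chosen arbitrarily and hypothetical function names:

```c
#include <stdint.h>
#include <stdio.h>

/* Writes v as 8 bytes, least significant first, regardless of the host's
   native byte order. Returns 0 on success. */
int put_u64_le(FILE *f, uint64_t v)
{
    unsigned char b[8];
    for (int i = 0; i < 8; i++)
        b[i] = (unsigned char)(v >> (8 * i));   /* keep the low byte */
    return fwrite(b, 1, sizeof b, f) == sizeof b ? 0 : -1;
}

/* Reads 8 little-endian bytes back into a uint64_t. Returns 0 on success. */
int get_u64_le(FILE *f, uint64_t *v)
{
    unsigned char b[8];
    if (fread(b, 1, sizeof b, f) != sizeof b)
        return -1;
    *v = 0;
    for (int i = 0; i < 8; i++)
        *v |= (uint64_t)b[i] << (8 * i);
    return 0;
}
```

The same value then reads back identically on big- and little-endian machines, which is the guarantee a text header gives you for free.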

When reading your archive, parse the header lines one by one and fill in a local structure with this information. Then write the file to disk and set any file properties that need updating, based on the header information you extracted.

Hope this helps. Good luck.
