Can git treat zip files as directories and files inside zip as blobs?

Scenario

Imagine that I have to work with some of my files, which are always stored in .zip files. Some of the files inside zip are small text files and often change, while others are larger, but, fortunately, are pretty static (like images).

If I place these zip files in a git repository, each zip is treated as a blob, so every time I commit, the repository grows by the size of the whole zip file ... even if only one small text file inside it changed!

Why this is realistic

MS Word 2007/2010 .docx and Excel .xlsx files are ZIP files ...

What I want

Is there any way to tell git not to treat zips as files, but rather as directories and treat their contents as files?

Benefits

  • significantly smaller repo size, i.e. faster transfer / backup
  • showing changes with git diff will automatically work for zip files

But it cannot work, you say?

I understand that without additional metadata this will lead to some ambiguity: on git checkout Git will need to decide whether to create foo.zip/bar.txt as a file in a regular directory or in a zip file. However, this could be solved using the configuration options, I would have thought.

Two ideas on how this can be done (if it does not already exist)

  • using a library like minizip or IO::Compress::Zip inside git
  • somehow adding a filesystem layer, so that git actually sees zip files as directories to start with
+45
git msysgit zip
6 answers

This does not exist, but it could easily exist within the current framework. Just as git treats binary and ASCII files differently when displaying a diff, it could be told, through the configuration interface, to give special treatment to certain file types.

If you don't want to change the code base (although it is a pretty cool idea you have), you could also script this yourself, using pre-commit and post-checkout hooks to unzip and store the files and then return them to their .zip state on checkout. You would have to restrict the actions to only those files (blobs / index entries) that are specified by git add.

Either way it is a bit of work. The real question is whether the other git commands are aware of what is going on and play nicely.
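The unzip and re-zip steps that such hooks would call can be sketched roughly as follows. This is only an illustration, not an existing git feature: the function names `explode` and `implode` are hypothetical, and the sketch assumes the hook keeps the returned member list somewhere (for example in a small manifest file) so the archive can be rebuilt in the same order.

```python
import zipfile
from pathlib import Path


def explode(zip_path: Path, out_dir: Path) -> list[str]:
    """Unpack zip_path into out_dir and return member names in archive order."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def implode(src_dir: Path, zip_path: Path, names: list[str]) -> None:
    """Rebuild zip_path from src_dir, writing members in the given order."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            if name.endswith("/"):
                # Directory entries carry no data; skip them.
                continue
            zf.write(src_dir / name, arcname=name)
```

A pre-commit hook would call something like `explode` on each staged zip and stage the resulting directory instead; a post-checkout hook would call `implode` to recreate the zip. Note that the rebuilt archive will generally not be byte-identical to the original.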

+15

Use bup (presented in detail in GitMinutes #24).

It is the only git-like system designed to deal with large (even very large) files, which means that each new version of a zip file only grows the repo by its delta (instead of a full additional copy).

The result is an actual git repo that regular git commands can read.

I explain in detail how bup differs from git in "git with large files".

Any other workaround (for example git-annex) is not entirely satisfactory, as described in "git-annex with large files".

+9
Nov 21 '13 at 18:57

Not sure if anyone is still interested in this question. I ran into the same problem, and here is my solution, which uses a git file filter.

Edit: First, I may not have made this clear, but this is an answer to the OP's question! Please read the whole proposal before commenting. Also, thanks to @Toon Krijthe for the advice to explain the solution in place.

My solution is to use a filter to "flatten" the zip file into one monolithic, expanded (possibly huge) text file. During git add / commit, the zip file is automatically expanded into this text format so that normal text diffing works, and during checkout it is automatically zipped up again.

The text file consists of records, each of which represents one file in the zip. So you can think of this text file as a text-based image of the original zip. If a file in the zip really is text, it is copied into the text file as-is; otherwise it is base64-encoded before being copied in. This way the text file always remains a text file.

Although this filter does not turn every file in the zip into a blob, text files are mapped line by line, and the line is the unit of a diff, while changes to binary files show up as updates to their corresponding base64 records. I think this is equivalent to what the OP has in mind.
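A minimal sketch of such a flatten / unflatten pair is shown below. It is not Zippey's actual record format: the `kind|length|name` header and the function names `flatten` and `unflatten` are assumptions made for illustration, and "text" is assumed to mean "decodes as UTF-8".

```python
import base64
import io
import zipfile


def flatten(zip_bytes: bytes) -> str:
    """Turn a zip archive into one text document, one record per member."""
    out = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            data = zf.read(info)
            try:
                body, kind = data.decode("utf-8"), "text"
            except UnicodeDecodeError:
                body, kind = base64.b64encode(data).decode("ascii"), "base64"
            # Header line: encoding, body length in characters, member name.
            out.append(f"{kind}|{len(body)}|{info.filename}\n{body}\n")
    return "".join(out)


def unflatten(text: str) -> bytes:
    """Rebuild a zip archive from the text produced by flatten()."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        pos = 0
        while pos < len(text):
            end = text.index("\n", pos)
            kind, length, name = text[pos:end].split("|", 2)
            body = text[end + 1 : end + 1 + int(length)]
            data = body.encode("utf-8") if kind == "text" else base64.b64decode(body)
            zf.writestr(name, data)
            # Skip the body and the newline that terminates the record.
            pos = end + 1 + int(length) + 1
    return buf.getvalue()
```

In git terms, `flatten` would play the role of the clean filter (run on add) and `unflatten` the smudge filter (run on checkout). The rebuilt zip holds the same member contents but is not guaranteed to be byte-identical to the original.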

For more information and prototype code, you can read the following link:

Zippey git file filter

Also, credit to the place that inspired this solution: Description of the file filter

+9
Apr 18 '14 at 9:19

http://tante.cc/2010/06/23/managing-zip-based-file-formats-in-git/

(Note: per a comment from Ruben, this only gives you a proper diff; it does not store the unpacked files in the repository.)

Open the ~/.gitconfig file (create it if it does not already exist) and add the following stanza:

[diff "zip"]
    textconv = unzip -c -a

What this does is use "unzip -c -a FILENAME" to convert your zip file to ASCII text (unzip -c unpacks to STDOUT). Next, create or modify the REPOSITORY/.gitattributes file and add the following:

*.pptx diff=zip

which tells git to use the "zip" diff definition from the config for files matching the given pattern (in this case, everything that ends with .pptx). Now git diff automatically unzips the files and diffs the ASCII output, which is somewhat better than just "Binary files differ". Given the convoluted mess that is the XML inside pptx files, it does not help much there, but for ZIP files that contain text (for example, source code archives) it is actually quite handy.
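If unzip is not available, or binary members clutter the output, a small script can serve as the textconv program instead. The following is a rough sketch, assuming git invokes the program with the file path as its single argument and diffs whatever it prints; the function names and the output layout are made up for this example.

```python
import sys
import zipfile


def zip_to_text(path: str) -> str:
    """Render a zip archive as text: one header per member, then its content."""
    parts = []
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            parts.append(f"=== {info.filename} ({info.file_size} bytes)")
            data = zf.read(info)
            try:
                parts.append(data.decode("utf-8"))
            except UnicodeDecodeError:
                # Binary member: show a checksum so a change is still visible.
                parts.append(f"[binary, crc32={info.CRC:08x}]")
    return "\n".join(parts) + "\n"


def main(argv: list[str]) -> int:
    """Entry point for use as a textconv program: argv[1] is the zip path."""
    sys.stdout.write(zip_to_text(argv[1]))
    return 0
```

Saved as an executable script that ends with `sys.exit(main(sys.argv))`, it could be named in the textconv setting in place of unzip -c -a.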

+5
Nov 28 '13 at 11:08

I think you will need to mount the zip file as a file system. I have not used it myself, but have a look at FUSE:

http://code.google.com/p/fuse-zip/

There is also ZFS for Windows and Linux:

http://users.telenet.be/tfautre/softdev/zfs/

+2
Nov 03 '11 at 20:50

There are often problems with files that an application has zipped itself, because the application expects the zip compression method and file order to be the ones it chose. I believe OpenOffice .odf files have this problem.

However, if you are just using any-old-zip as a way to keep data together, you should be able to create a few simple aliases that unzip and re-zip when needed. The very latest Msysgit (aka Git for Windows) now has both zip and unzip available in its shell, so you can use them in aliases.

The project I am working on now uses zips as its main local version control / archive mechanism, so I am also trying to put together a workable set of aliases for sucking these hundreds of zips into git (and getting them out again ;-) so that the co-workers are happy.
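For the caveat about compression method and file order, one possible approach is to record the original archive's layout and reuse it when re-zipping. The sketch below assumes that the order reported by Python's zipfile module is the order the application cares about; `layout` and `rezip_like` are hypothetical helper names.

```python
import io
import zipfile


def layout(zip_bytes: bytes) -> list[tuple[str, int]]:
    """Return (member name, compression method) pairs in archive order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(i.filename, i.compress_type) for i in zf.infolist()]


def rezip_like(original: bytes, new_contents: dict[str, bytes]) -> bytes:
    """Rebuild an archive, keeping the original member order and methods.

    Members named in new_contents get the new data; all others are copied.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(original)) as src, \
            zipfile.ZipFile(buf, "w") as dst:
        for info in src.infolist():
            data = new_contents.get(info.filename, src.read(info))
            dst.writestr(info.filename, data, compress_type=info.compress_type)
    return buf.getvalue()
```

Comparing `layout` before and after a round trip is a quick way to check whether a set of aliases preserves what the application expects.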

+2


