Rsync --sparse transfers all data

Question

I have several VM images that need to be synced every day. The VM files are sparse.

To save network traffic, I only want to transfer the real image data. I am trying to use the --sparse option in rsync, but the network traffic shows that the entire file size is transmitted, not just the data that is actually in use.

If I use rsync -zv --sparse, then only the real size is transmitted over the network, and everything is fine. But I do not want to compress the file because of the CPU usage.

Shouldn't the --sparse option transfer only the real data, with the "null data" created locally to save network traffic?

Is there a way to do this without compression?

Thanks!

+6

sparse-matrix rsync

user2933212 Nov 06 '13 at 19:35

3 answers

You can try lowering the compression level to its minimum (use the option --compress-level=1). The lowest compression level seems to be enough to reduce the traffic for sparse files, but I do not know how much it affects CPU usage.
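To get a rough feel for why even the lowest level helps, here is a small sketch that compresses a run of zeros with Python's zlib module at several levels. It is only an approximation of what rsync does on the wire: the 1 MiB buffer and the helper name compressed_size are assumptions made for illustration, and it measures output size only, not CPU time.

```python
import zlib


def compressed_size(data: bytes, level: int) -> int:
    """Return the size in bytes of `data` after zlib compression at `level`."""
    return len(zlib.compress(data, level))


# 1 MiB of zeros stands in for an unallocated region of a VM image
# (the size is an arbitrary choice for this illustration).
zero_run = bytes(1024 * 1024)

for level in (1, 6, 9):
    # Print the level and the resulting size; a long zero run shrinks
    # to a tiny fraction of its original size even at level 1.
    print(level, compressed_size(zero_run, level))
```

Timing the same calls on a sample of a real image (for example with time.perf_counter) would be one way to estimate the CPU cost the question is worried about.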

+1

Bastian Nov 05 '15 at 15:03

Recent versions of rsync can handle --sparse and --inplace together! I found the following GitHub commit from 2016: https://github.com/tuna/rsync/commit/f3873b3d88b61167b106e7b9227a20147f8f6197

0

syntron May 18 '19 at 17:20

Rafa · Accepted Answer · 2013-11-07T03:34:27+0000

See this discussion, in particular this answer.

The solution seems to be to run rsync --sparse followed by rsync --inplace.

For the first call (--sparse), also use --ignore-existing to avoid overwriting sparse files that have already been transferred, and -z to save network resources.

The second call (--inplace) should ~~update only the changed fragments~~. Here, compression is optional.

Also see this post.
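As a sketch of the two calls described above, the following builds the two rsync command lines and optionally runs them. The function names and any paths passed in are hypothetical; only the flags (-z, --sparse, --ignore-existing, --inplace) come from the text, and a directory source would need further options such as -r that are not shown here.

```python
import subprocess
from typing import List


def two_pass_commands(src: str, dest: str) -> List[List[str]]:
    """Build the argument lists for the two rsync passes (hypothetical helper)."""
    # Pass 1: create files that are missing on the target as sparse files,
    # compressed in transit, without touching files that already exist.
    first = ["rsync", "-z", "--sparse", "--ignore-existing", src, dest]
    # Pass 2: update files that already exist by writing into them directly.
    second = ["rsync", "--inplace", src, dest]
    return [first, second]


def run_two_pass(src: str, dest: str) -> None:
    """Run both passes in order, stopping if either one fails."""
    for argv in two_pass_commands(src, dest):
        subprocess.run(argv, check=True)
```

Printing the result of two_pass_commands before calling run_two_pass makes it easy to check the exact commands first.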

Update

I believe that the above suggestions will not solve your problem. I also think rsync is not suitable for this task. You should look for other tools that give you a good balance between network and disk I/O performance.

rsync was designed to minimize the use of a single resource, the network. It assumes that reading from and writing to the network is far more expensive than reading and writing the source and target files.

"We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link." (The rsync algorithm, abstract.)

The algorithm can be summarized in four steps.

The receiving side β sends checksums of the blocks of size S of the target file B.
The sending side α identifies the blocks that match in the source file A, at any offset.
α sends β a list of instructions made up of either verbatim, non-matching data or references to matching blocks.
β reconstructs the whole file from these instructions.
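The four steps can be sketched in a few lines of Python. This is a deliberately simplified illustration and not rsync's actual implementation: real rsync pairs a cheap rolling checksum with a strong hash, whereas this sketch recomputes a SHA-256 at every offset. The function names and the ("ref", index) / ("lit", bytes) instruction format are assumptions made for the example.

```python
import hashlib


def block_checksums(b: bytes, s: int) -> dict:
    """Step 1 (β): map the checksum of each full S-byte block of B to its index."""
    sums = {}
    for i in range(0, len(b) - s + 1, s):
        # Keep the first block index seen for any given checksum.
        sums.setdefault(hashlib.sha256(b[i:i + s]).digest(), i // s)
    return sums


def make_delta(a: bytes, sums: dict, s: int) -> list:
    """Steps 2 and 3 (α): scan A at every offset and build the instruction list."""
    delta, literal, i = [], bytearray(), 0
    while i < len(a):
        idx = None
        if i + s <= len(a):
            idx = sums.get(hashlib.sha256(a[i:i + s]).digest())
        if idx is not None:
            if literal:
                # Flush pending non-matching bytes before the block reference.
                delta.append(("lit", bytes(literal)))
                literal.clear()
            delta.append(("ref", idx))  # block idx of B matches here
            i += s
        else:
            literal.append(a[i])  # no match: this byte travels verbatim
            i += 1
    if literal:
        delta.append(("lit", bytes(literal)))
    return delta


def apply_delta(b: bytes, delta: list, s: int) -> bytes:
    """Step 4 (β): rebuild the whole file from the instructions."""
    parts = []
    for kind, value in delta:
        parts.append(b[value * s:(value + 1) * s] if kind == "ref" else value)
    return b"".join(parts)
```

In this sketch, an all-zero A checked against an empty B produces a single literal instruction carrying every zero byte, while a B that contains just one zero block turns the same A into nothing but references to that block. That is at least consistent with the full-size transfer seen in the question when the target does not exist yet.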

Note that rsync usually reconstructs file B as a temporary file T, then replaces B with T. In this case, it has to write the entire file.

--inplace does not relieve rsync of writing the blocks matched by α, as one might imagine. They may match at different offsets. Scanning B a second time to obtain checksums of the new data would be prohibitive in terms of performance. A block that matches at the same offset at which it was read in step one could be skipped, but rsync does not do that. In the case of a sparse file, a zero block of B would match every zero block of A, and would have to be rewritten.
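The temporary-file pattern, combined with the idea behind --sparse of seeking over zero blocks instead of writing them, might look like the sketch below. It is an illustration under assumptions, not rsync's code: the function name, the 4096-byte block size and the all-zero test are choices made for the example, and whether the skipped ranges really become holes depends on the filesystem.

```python
import os
import tempfile


def write_sparse_then_replace(data: bytes, target: str, block: int = 4096) -> None:
    """Write `data` to a temporary file T next to `target`, then replace target with T."""
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            for i in range(0, len(data), block):
                chunk = data[i:i + block]
                if chunk.count(0) == len(chunk):
                    # All-zero block: move the file position forward instead of
                    # writing, which lets the filesystem leave a hole if it can.
                    f.seek(len(chunk), os.SEEK_CUR)
                else:
                    f.write(chunk)
            # Set the final length explicitly in case the data ends in a hole.
            f.truncate(len(data))
        # Swap the finished temporary file into place of the target.
        os.replace(tmp, target)
    except BaseException:
        # On failure the temporary file is still present; clean it up.
        os.unlink(tmp)
        raise
```

Every non-zero block is still written in full here, which mirrors the point above: rebuilding through a temporary file costs a complete write of the real data on the receiving side.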

--inplace just makes rsync write directly to B instead of T. It will overwrite the entire file.
