The documentation shows that frame-based threading gives better throughput than slice-based threading. It also notes that the latter does not scale well, because parts of the encoder are inherently serial.
Speedup versus the number of encoding threads for the veryfast preset (not real-time):
x264 --preset veryfast --tune psnr --crf 30

threads      speedup            psnr
           slice   frame     slice    frame
  1:       1.00x   1.00x    +0.000   +0.000
  2:       1.41x   2.29x    -0.005   -0.002
  3:       1.70x   3.65x    -0.035   +0.000
  4:       1.96x   3.97x    -0.029   -0.001
  5:       2.10x   3.98x    -0.047   -0.002
  6:       2.29x   3.97x    -0.060   +0.001
  7:       2.36x   3.98x    -0.057   -0.001
  8:       2.43x   3.98x    -0.067   -0.001
  9:               3.96x             +0.000
 10:               3.99x             +0.000
 11:               4.00x             +0.001
 12:               4.00x             +0.001
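A table like this can be reproduced with a simple loop over thread counts. The sketch below assumes an x264 binary on the PATH and a test clip named input.y4m (a placeholder name); each run is timed, and the slice-based pass adds --sliced-threads:

    # Frame-based (default) threading
    for n in 1 2 4 8 12; do
        echo "== frame-based, threads=$n =="
        time x264 --preset veryfast --tune psnr --crf 30 --threads $n -o /dev/null input.y4m
    done

    # Slice-based threading
    for n in 1 2 4 8; do
        echo "== slice-based, threads=$n =="
        time x264 --preset veryfast --tune psnr --crf 30 --threads $n --sliced-threads -o /dev/null input.y4m
    done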
The main difference is that frame-based threading adds latency, because different threads work on different frames, whereas with slice-based threading all threads work on the same frame. This matters in real time: unlike offline encoding, where the whole input is already available, a real-time encoder has to wait for new frames to arrive in order to fill the pipeline. For example, at 30 fps each extra frame of pipeline delay costs roughly 33 ms.
Normal threads, also known as frame-based threading, use a clever scheme that staggers work across several frames in parallel. But this comes at a cost: as mentioned earlier, each additional thread requires one more frame of latency. Slice-based threading has no such problem: each frame is split into slices, each slice is encoded on one core, and the results are then stitched back together into the final frame. Its maximum efficiency is much lower for a variety of reasons, but it allows at least some parallelism without increasing latency.
From: Diary of an x264 Developer
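On the x264 command line the two modes can be selected roughly as follows (a sketch; input.y4m and out.264 are placeholder names):

    # Frame-based (sliceless) threading: the default whenever more than one thread is used
    x264 --preset veryfast --crf 23 --threads 4 -o out.264 input.y4m

    # Slice-based threading: same thread count, but each frame is split into slices
    x264 --preset veryfast --crf 23 --threads 4 --sliced-threads -o out.264 input.y4m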
Sliceless threading: an example with two threads. Start encoding frame #0. When it is half done, start encoding frame #1. Thread #1 currently has access only to the top half of its reference frame, since the rest has not been encoded yet, so it must restrict its motion search range. But that is probably fine (unless you use lots of threads on a small frame), since vertical motion vectors that long are fairly rare. Each time both threads finish another row of macroblocks, thread #1 again gets to use a motion range of +/- 1/2 of the frame height. Later, thread #0 finishes frame #0 and moves on to frame #2. Now thread #0 gets the motion restriction, and thread #1 is unrestricted.
From: http://web.archive.org/web/20150307123140/http://akuvian.org/src/x264/sliceless_threads.txt
Therefore, it makes sense to enable sliced-threads with --tune zerolatency, since the goal is to send each frame as soon as possible rather than to encode frames as efficiently as possible (in terms of performance and quality).
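In a real-time pipeline this typically ends up looking something like the following (a sketch using ffmpeg's libx264 wrapper; the capture device and destination address are placeholders):

    # Low-latency capture and streaming; --tune zerolatency enables sliced threads
    # (and also disables B-frames and lookahead) inside libx264
    ffmpeg -f v4l2 -i /dev/video0 \
           -c:v libx264 -preset veryfast -tune zerolatency -b:v 1M \
           -f mpegts udp://203.0.113.1:1234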
Conversely, using too many threads can hurt performance, because the overhead of managing them can outweigh the potential gains.