Usually, if you have a write buffer, the evicted data is flushed through the write buffer as an entire cache line, and the write buffer then completes the write to RAM at some later point. I have not heard of a cache that tracks, for every element in a line, whether that part is dirty; you have one dirty bit for the cache line. So in the cases I have heard of, the whole line goes out.

Another point is that the slower memory behind the cache, DDR for example, is often accessed at some fixed width: 32 bits, 64 bits, or 128 bits at a time, possibly made up of several narrower parts. For memories like these you want to write the full width to avoid a read-modify-write. Cache lines are, of course, a multiple of that width, and there may be no way to write only part of it. Also, if the memory has ECC, you need to write a whole ECC-protected word at once, because writing only part of it means reading the word, merging in the new data, and recomputing the ECC.
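A minimal sketch of why partial writes to fixed-width memory cost a read-modify-write. The 64-bit access width, the hypothetical WideMemory class, and the XOR "check byte" (a stand-in for ECC, not a real SECDED code) are all assumptions for illustration; the point is only that a full-width write is a single write, while a narrower write forces a read first so the check value over the whole word can be recomputed.

```python
# Minimal sketch (assumptions: 64-bit access width, XOR "check byte" standing in for ECC).
WORD_BYTES = 8  # assumed fixed memory access width: 64 bits


def check_byte(data: bytes) -> int:
    """Stand-in for ECC: XOR of all bytes. NOT a real SECDED code."""
    c = 0
    for b in data:
        c ^= b
    return c


class WideMemory:
    """Hypothetical memory that can only be read or written in whole words,
    each stored together with a check byte computed over the full word."""

    def __init__(self, n_words: int):
        self.words = [bytes(WORD_BYTES)] * n_words
        self.checks = [check_byte(w) for w in self.words]
        self.reads = 0   # word reads issued to memory
        self.writes = 0  # word writes issued to memory

    def _read_word(self, i: int) -> bytes:
        self.reads += 1
        return self.words[i]

    def _write_word(self, i: int, data: bytes) -> None:
        assert len(data) == WORD_BYTES
        self.writes += 1
        self.words[i] = data
        self.checks[i] = check_byte(data)  # check value covers the whole word

    def write(self, addr: int, data: bytes) -> None:
        """Write bytes at a byte address; partial words cost a read-modify-write."""
        pos = 0
        while pos < len(data):
            i, off = divmod(addr + pos, WORD_BYTES)
            n = min(WORD_BYTES - off, len(data) - pos)
            chunk = data[pos:pos + n]
            if n == WORD_BYTES:
                self._write_word(i, chunk)             # full width: write only
            else:
                word = bytearray(self._read_word(i))   # read...
                word[off:off + n] = chunk              # ...modify...
                self._write_word(i, bytes(word))       # ...write back, new check byte
            pos += n
```

With this model, writing an aligned 32-byte line with m.write(0, bytes(32)) issues 4 word writes and no reads, while writing 3 bytes at address 5 issues 1 read and 1 write.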
To track this you would need a dirty bit for each writable item in the cache line, multiplying the dirty-bit storage by some factor, which may or may not have a real impact on size, cost, etc. There may also be overhead on the bus side of the transaction: it may be cheaper to complete one transaction with a few extra words than to issue two separate transactions, so this scheme could hurt performance rather than improve it. The same problem shows up inside the write buffer: instead of one transaction with a starting address and a length, you now have several transactions.
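A rough sketch of the trade-off, assuming a 64-byte line and one dirty bit per 4-byte word (both sizes, and the function names, are hypothetical). It compares the dirty-bit storage per line and the list of (start address, length) transactions each scheme would send when the line is flushed.

```python
# Minimal sketch (assumptions: 64-byte lines, 4-byte dirty-tracking granularity).
LINE_BYTES = 64  # assumed cache line size
WORD_BYTES = 4   # assumed size of one "writable item" with its own dirty bit


def dirty_bits_per_line(granularity: int) -> int:
    """Dirty-bit storage needed per line when tracking at `granularity` bytes."""
    return LINE_BYTES // granularity


def flush_whole_line(line_addr: int, dirty_mask: int) -> list[tuple[int, int]]:
    """One dirty bit per line: if anything is dirty, send the whole line once."""
    return [(line_addr, LINE_BYTES)] if dirty_mask else []


def flush_dirty_runs(line_addr: int, dirty_mask: int) -> list[tuple[int, int]]:
    """One dirty bit per word: send each contiguous run of dirty words as its
    own (start address, length) transaction."""
    words = LINE_BYTES // WORD_BYTES
    txns = []
    i = 0
    while i < words:
        if (dirty_mask >> i) & 1:
            start = i
            while i < words and (dirty_mask >> i) & 1:
                i += 1
            txns.append((line_addr + start * WORD_BYTES, (i - start) * WORD_BYTES))
        else:
            i += 1
    return txns
```

If words 0 and 2 of a line at 0x1000 are dirty (mask 0b101), the per-line scheme sends one 64-byte transaction, while the per-word scheme sends two 4-byte transactions and needs 16 dirty bits per line instead of 1. Whether the smaller transfers win depends on the per-transaction overhead of the bus, which is exactly the uncertainty described above.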
It just seems like a lot of work for something that may or may not result in a gain. If you find a design that really does this, post it here.