How does a direct-mapped cache work?

I'm taking a course in systems architecture, and I'm having a hard time understanding how a direct-mapped cache works.

I've looked in several places, and they all explain it differently, which confuses me even more.

What I can't understand is the tag and the index, and how they are selected.

Explanation from my lecture: "The address is divided into two parts: an index (for example, 15 bits) used to directly address the RAM (32K). The rest of the address, the tag, is stored and compared with the incoming tag."

Where does this tag come from? It can't be the full address of the memory location in RAM, since that would make the cache useless (compared to a fully associative cache).

Thank you very much.

+33
caching cpu-architecture system
Apr 10 '13 at 21:44
4 answers

Okay, so let's first understand how the CPU interacts with the cache.

There are three levels of memory (broadly speaking): cache (usually made of SRAM chips), main memory (usually made of DRAM chips), and storage (usually magnetic disks). Whenever the CPU needs data from some location, it first looks in the cache to see whether the data is there. The cache sits closest to the CPU in the memory hierarchy, so its access time is the smallest (and its cost the highest). If the data the CPU is looking for is found there, that is a "hit", and the data is served to the CPU from the cache. If not, the data must be brought from main memory into the cache before the CPU can use it (the CPU generally interacts only with the cache), which carries a time penalty.

So, various schemes are used to decide where data goes in the cache and to find out whether it is there. One of them is direct mapping. For simplicity, suppose a memory system with 10 cache locations (numbered 0 through 9) and 40 main memory locations (numbered 0 through 39). This picture sums it up:

(figure: the 40 main memory locations mapped onto the 10 cache locations)

There are 40 main memory locations, but the cache can hold only 10 of them at a time. So the incoming request from the CPU must somehow be redirected to a cache location. This raises two problems:

  • How do we redirect? In particular, how do we do it in a predictable way that won't change over time?

  • If the cache location is already occupied, how does the incoming CPU request determine whether the address it wants is the same as the address whose data is currently stored in that location?

In our simple example we can redirect with very simple logic. Given that we need to map 40 main memory locations, numbered 0 through 39, onto 10 cache locations, numbered 0 through 9, the cache location for memory location n can be n % 10. So 21 maps to 1, 37 maps to 7, and so on. This becomes the index.

But 37, 17 and 7 all map to 7. So, to tell them apart, a tag is introduced. Just as the index is n % 10, the tag is n / 10 (integer division). Now 37, 17 and 7 all share index 7 but carry different tags: 3, 1 and 0. That is, the mapping is completely defined by two pieces of data: the tag and the index.
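In code, the split is just division and remainder. A minimal sketch for this toy example (the numbers 10 and 40 come from the picture above, nothing more):

    #include <stdio.h>

    int main(void) {
        int addresses[] = {7, 17, 29, 37};
        for (int i = 0; i < 4; i++) {
            int n = addresses[i];
            /* the index picks the cache location, the tag tells addresses apart */
            printf("address %2d -> index %d, tag %d\n", n, n % 10, n / 10);
        }
        return 0;
    }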

Now suppose a request comes in for address 29, which translates to tag 2 and index 9. The index corresponds to the cache location number, so cache location 9 is checked: does it hold any data, and if so, is the stored tag 2? If so, it is a hit, and the data is handed to the CPU from that location immediately. If the location is empty, or its tag is not 2, then it holds data for some other memory address, not 29 (though one with the same index, i.e. an address such as 9, 19 or 39). That is a miss, and the data for location 29 must be loaded from main memory into cache location 9 (setting its tag to 2 and evicting whatever was there before), after which it is delivered to the CPU.
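Here is a sketch of that lookup for the same toy cache. The structure and names (line, cache_read) are mine, just to illustrate the hit/miss decision described above:

    #include <stdbool.h>

    #define NUM_LINES 10

    struct line { bool valid; int tag; int data; };

    struct line cache[NUM_LINES];
    int ram[40];                          /* stand-in for main memory */

    int cache_read(int n) {
        int index = n % NUM_LINES;
        int tag   = n / NUM_LINES;
        if (cache[index].valid && cache[index].tag == tag)
            return cache[index].data;     /* hit: serve from the cache */
        /* miss: evict whatever was in this line and fetch from memory */
        cache[index].valid = true;
        cache[index].tag   = tag;
        cache[index].data  = ram[n];
        return cache[index].data;
    }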

+53
Dec 01 '15 at 18:33

Let's use an example: a 64-kilobyte cache with 16-byte cache lines has 4096 different cache lines.

You need to break the address into three different parts.

  • The least significant bits select a byte within the cache line once it has been fetched; this part is not used in the cache lookup itself. (bits 0-3 in this example)
  • The next bits are used to index the cache. If you think of the cache as a tall column of cache lines, the index bits tell you which row to look in for your data. (bits 4-15 in this example)
  • All the remaining bits are tag bits. They are stored in the tag store alongside the data you put in the cache, and we compare the corresponding bits of an incoming request with what we stored to find out whether the cached data is the requested data. (bits 16 and up in this example)

The number of bits you use for the index is log2(number_of_cache_lines). [Strictly, that is the number of sets, but a direct-mapped cache has the same number of lines and sets.]
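In hardware the three parts are just bit fields, so extracting them is a shift and a mask. A sketch for this exact geometry (64 KB cache, 16-byte lines; 32-bit addresses assumed, and the helper names are mine):

    #include <stdint.h>

    /* 16-byte lines -> 4 offset bits; 4096 lines -> 12 index bits */
    uint32_t get_offset(uint32_t addr) { return addr & 0xF;          }  /* bits 0-3  */
    uint32_t get_index (uint32_t addr) { return (addr >> 4) & 0xFFF; }  /* bits 4-15 */
    uint32_t get_tag   (uint32_t addr) { return addr >> 16;          }  /* bits 16+  */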

+11
Apr 11 '13 at 16:30

A direct-mapped cache is like a table whose rows are the cache lines and which has at least two columns: one for the data and one for the tag.

Here's how it works: a read access to the cache takes the middle part of the address, called the index, and uses it as the row number. The data and the tag are looked up at the same time. The tag is then compared against the upper part of the address to determine whether the line holds valid data from the requested address range. At the same time, the lower part of the address is used to select the requested data within the cache line (I'm assuming a cache line can hold data for several words).

I stressed that the data access and the tag access + comparison happen simultaneously, because that is the key to low latency (the whole purpose of a cache). The tag-RAM access and the data-path access do not have to be two sequential steps.
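As a rough sketch of that table view (field names are mine; real hardware reads the tag and data columns in parallel, which sequential C can only hint at):

    #define WORDS_PER_LINE 4

    struct row {
        int      valid;
        unsigned tag;
        int      data[WORDS_PER_LINE];    /* one line holds several words */
    };

    /* the index selects the row; the tag compare and the word select
       both work on that single row, with no second lookup */
    int read_line(const struct row *table, unsigned index, unsigned tag,
                  unsigned offset, int *hit) {
        const struct row *r = &table[index];
        *hit = r->valid && r->tag == tag;  /* tag comparison...             */
        return r->data[offset];            /* ...while the word is selected */
    }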

The advantage is that a read is basically a simple table lookup plus a comparison.

But the cache is direct mapped, which means that for each address there is exactly one row where its data can be cached. So the disadvantage is that many other addresses map to the same row and may compete for that cache line.

+2
Apr 12 '13 at 22:00

I found a good book in the library that gave me the clear explanation I needed, and I'll summarize it here in case another student stumbles on this topic while searching about caches.

In Computer Architecture: A Quantitative Approach, 3rd edition, by Hennessy and Patterson, p. 390.

First, keep in mind that main memory is divided into blocks for the cache. If we have 1 GB of RAM divided into 64-byte blocks, the RAM consists of 16M blocks (1 GB / 64 B = 2^24 = 16,777,216 blocks).

From the book:

Where can a block be placed in the cache?

  • If each block has only one place where it can appear in the cache, the cache is said to be direct mapped. The target block is calculated with the formula: <RAM Block Address> MOD <Number of Blocks in the Cache>

So, suppose we have 32 RAM blocks and 8 cache blocks.

If we want to store RAM block 12 in the cache, it will be stored in cache block 4. Why? Because 12 / 8 = 1 with remainder 4, and the remainder is the target block.
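The same calculation in code (a trivial sketch of the book's formula):

    #include <stdio.h>

    int main(void) {
        int ram_block = 12, cache_blocks = 8;
        /* 12 = 1 * 8 + 4, so the remainder 4 is the target cache block */
        printf("RAM block %d -> cache block %d\n",
               ram_block, ram_block % cache_blocks);
        return 0;
    }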

  • If a block can be placed anywhere in the cache, the cache is said to be fully associative.

  • If a block can be placed anywhere within a restricted set of places in the cache, the cache is said to be set associative.

Basically, a set is a group of blocks in the cache. A block is first mapped onto a set, and then it can be placed anywhere within that set.

Formula: <RAM Block Address> MOD <Number of Sets in the Cache>

So, suppose we have 32 RAM blocks and a cache divided into 4 sets, each holding two blocks (so 8 blocks in total). Set 0 contains cache blocks 0 and 1, set 1 contains blocks 2 and 3, and so on.

If we want to store RAM block 12 in the cache, it will be stored in cache block 0 or 1. Why? Because 12 / 4 = 3 with remainder 0, so set 0 is chosen, and the block can be placed anywhere within set 0 (meaning blocks 0 and 1).
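And the set-associative variant of the same sketch (2 ways per set, as in the example):

    #include <stdio.h>

    int main(void) {
        int ram_block = 12, num_sets = 4, ways = 2;
        int set = ram_block % num_sets;   /* 12 = 3 * 4 + 0 -> set 0 */
        /* within the set, either way may hold the block */
        printf("RAM block %d -> set %d (cache blocks %d and %d)\n",
               ram_block, set, set * ways, set * ways + 1);
        return 0;
    }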

Now back to my original problem with the address.

How is a block found if it is in the cache?

Each block frame in the cache has an address. Just to make it clear: a block holds both an address and data.

The block address is divided into several parts: tag, index, and offset.

The tag is used to identify the block inside the cache, the index merely indicates the set in which the block sits (which makes storing it rather redundant), and the offset is used to select the data.

By "select the data" I mean that in the cache block there will obviously be more than one memory location, the offset is used to select between them.

So, if you want to picture it as a table, these would be the columns:

 TAG | INDEX | OFFSET | DATA 1 | DATA 2 | ... | DATA N 

The tag is used to find the block, the index shows which set the block sits in, and the offset selects one of the data fields on the right.

I hope my understanding of this is correct; if it is not, please let me know.

+1
Apr 23 '13 at 2:15


