
Cache Puzzlement

trag

Well-known member
The appearance of two Performa 600s, which lack the IIvx's cache, has sent me on an exploration of the art of the cache.

This is a pretty good reference regarding the basics:

https://courses.cs.washington.edu/courses/cse378/09wi/lectures/lec15.pdf

Basically, you have two sets of memory, both smaller and faster than the next lower level of memory. Those two sets are the Cache storage and the TAG RAM.

They have the same number of addressing bits.  In other words, if you have 32K words of cache storage, then you have 32K tags as well.

When a memory request is made, a portion of the address bits is used as an address to the cache storage and to the TAG storage. This portion of the address is called the index. The remainder of the memory address is the Tag or Tag Address. The word stored in TAG memory at that index is read, and if it matches the remainder of the address (the Tag address), then the address the CPU tried to access is already stored in cache, and the cache contents are used.
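To make the index/tag lookup concrete, here is a minimal Python sketch of a direct-mapped cache with one tag per cached word. The class name `DirectMappedCache`, its methods, and the use of `None` to stand in for a cleared valid bit are all made up for illustration; real hardware does the compare in the TAG RAM or in external logic, not in software.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one TAG entry per word of cache storage."""

    def __init__(self, index_bits):
        self.index_bits = index_bits
        size = 1 << index_bits
        self.tags = [None] * size  # TAG RAM; None stands in for "valid bit clear"
        self.data = [None] * size  # cache storage

    def split(self, word_address):
        """Split a word address into (tag, index)."""
        index = word_address & ((1 << self.index_bits) - 1)  # low bits address both RAMs
        tag = word_address >> self.index_bits                # the remainder is the tag
        return tag, index

    def read(self, word_address, main_memory):
        """Return (value, hit). On a miss, fill the entry from main_memory."""
        tag, index = self.split(word_address)
        if self.tags[index] == tag:
            return self.data[index], True   # stored tag matches: cache hit
        value = main_memory[word_address]   # miss: go to the slower memory
        self.tags[index] = tag
        self.data[index] = value
        return value, False
```

Two word addresses that differ only above the index bits land on the same entry, so reading one evicts the other. That is the whole reason the tag has to be stored and compared.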

Here's my puzzlement. I've been examining some real-world caches. A IIci cache is pretty straightforward. It's made of four 8K X 8 SRAM chips for cache storage, which gives you an 8K X 32 memory. The IIci has a 32 bit data bus, so good so far. And it has a couple of 8K X 8 chips for TAG RAM. The 8K portions match up. The 13 bits (13 bits => 8K) of cache address plus 15 bits of TAG storage (2 X 8 - 1; 1 is used for the "valid" bit) = 28 bits of cacheable address space. That's plenty for a machine with a 128MB RAM maximum, considering it's one address per word, so 28 bits of word address space equals about 30 bits of byte space, depending on how they wired the addresses on the cache.

The IIci has 32 bit addressing. And each word in memory is 4 bytes (2 bits to address) wide. And only 128 MB of RAM is possible (ignoring the memory map for the moment). So, in theory, one would need to deal with 128 MB => 27 bits, minus 2 bits because each word is 4 bytes wide => 27 bits - 2 bits = 25 bits of total cache address. With 8K of cache and TAG, that means that the Cache Index is 13 bits wide (8K => 13 bits). So the IIci needs a minimum of 12 (25 - 13 = 12) bits of Tag address. 13 bits of Index address plus 12 bits of Tag address make up the 25 bit cache address. Two 8 bit wide TAG RAMs minus a valid bit (2 X 8 - 1) leave 15 bits for the Tag address, so that's plenty and actually supports 1GB of RAM space (15 + 13 + 2 = 30 bits), which I think matches the memory map.
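The IIci bit counting above can be double-checked with a few lines of Python. This only redoes the arithmetic from the chip sizes quoted in the post; the constant names and the `bits_for` helper are made up for illustration.

```python
def bits_for(count):
    """Address bits needed to select one of `count` items (power of two only)."""
    assert count > 0 and count & (count - 1) == 0, "count must be a power of two"
    return count.bit_length() - 1

MAX_RAM_BYTES = 128 * 2**20  # 128 MB RAM maximum
BYTES_PER_WORD = 4           # 32 bit data bus
CACHE_WORDS = 8 * 2**10      # 8K X 32 cache storage
TAG_RAM_WIDTH = 2 * 8        # two 8K X 8 TAG chips side by side
VALID_BITS = 1               # one TAG bit is spent on "valid"

byte_offset_bits = bits_for(BYTES_PER_WORD)                     # 2
word_address_bits = bits_for(MAX_RAM_BYTES) - byte_offset_bits  # 27 - 2 = 25
index_bits = bits_for(CACHE_WORDS)                              # 13
min_tag_bits = word_address_bits - index_bits                   # 25 - 13 = 12
available_tag_bits = TAG_RAM_WIDTH - VALID_BITS                 # 16 - 1 = 15

# Bytes reachable using every available tag bit: 2**(15 + 13 + 2) = 1 GB
cacheable_bytes = 1 << (available_tag_bits + index_bits + byte_offset_bits)

print(min_tag_bits, available_tag_bits, cacheable_bytes // 2**30)
```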

But again, 32K of cache memory means 32K TAGs or TAG words. Or, in the case of the IIci, 8K of Cache memory means 8K of TAG words.

Okay, that wasn't my puzzlement. That was an example of how I'm not puzzled. Here's my puzzlement. I also examined a NuBus PowerMac cache. It consists of eight 32K X 8 SRAM chips, which gives the cache 32K X 64 memory. That's good. PowerPC uses a 64 bit data bus. But the Cache TAG is only 8K X 16. 13 bits of addressing plus 15 (16 - 1) bits of tag is 28 bits, which again is enough to cover all of the address space, wordwise.

But 8K TAG addresses do not equal 32K of cache space. WTF?

Are the NuBus PowerPCs using a 4 word block for every cache location? That might make sense. They could break up the address so that the lowest two address bits for the word are block index bits. Then the next lowest order 13 bits of address would be the cache index address sent to the TAG RAM. Then the 14 bits remaining would be the Tag Address compared to the TAG RAM contents to determine a hit or not. If there is a hit, then the lowest order 15 bits (2 block index bits plus 13 cache index bits) would be used to address the Cache SRAM. But that would imply that every time the cache is loaded or cleared, it is loaded with 4 words at a time.
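Here is a small Python sketch of the address split this 4 word block guess would imply. It is only an illustration of the hypothesis, not a description of how the NuBus PowerMac cache is actually wired. It assumes a 32 bit byte address and 8 bytes per 64 bit word (3 byte offset bits), which is what makes 14 tag bits remain; `split_address` and the constant names are made up.

```python
BYTE_BITS = 3    # assumption: 8 bytes per 64 bit word
BLOCK_BITS = 2   # hypothesis: 4 words per block
INDEX_BITS = 13  # 8K TAG entries

def split_address(byte_address):
    """Split a 32 bit byte address into (tag, index, block_offset, sram_address)."""
    word_address = byte_address >> BYTE_BITS
    block_offset = word_address & ((1 << BLOCK_BITS) - 1)           # word within the block
    index = (word_address >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)  # sent to the TAG RAM
    tag = word_address >> (BLOCK_BITS + INDEX_BITS)                 # compared on lookup
    # The 32K X 64 data SRAM would see index and block offset together: 15 bits.
    sram_address = word_address & ((1 << (BLOCK_BITS + INDEX_BITS)) - 1)
    return tag, index, block_offset, sram_address

# Four consecutive 64 bit words share one tag entry and differ only in block offset.
for address in (0x1000, 0x1008, 0x1010, 0x1018):
    print(hex(address), split_address(address))
```

Under these assumptions one TAG entry covers four data words, which would account for 8K tags against 32K words of storage.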

Thoughts?

There's no extra logic on the NuBus PowerPC cache, so it's limited to the comparators built into the cache RAM, and I guess the Bus Signal management is handled either in the chipset or on the PowerPC.   On a IIci cache, logic for halting the bus while the cache housekeeping is done must live on the cache.

Interestingly, on the cache for a PCI PowerPC Macintosh, there are just two sets of plain old SRAM. One set is clearly used as TAG, I guess, but it's not TAG RAM. It has no internal comparators. I'm not sure how that's working.

 