Experiments with G4 external/backside cache

herd

Well-known member
On the G4 CPU, the external memory cache is configured through a register. In the following example, a 600MHz 7400 CPU has 2MB of L2 "backside" cache physically present in hardware, but smaller amounts of 1MB, 512K, or zero L2 cache can also be set in software. The picture shows the speed vs. data size output from Cache Basher for the different settings. The high-speed spike at the smallest sizes is the full-speed L1 cache inside the CPU chip, and the slow, flat region for larger data is main RAM at 100MHz on the front side bus (FSB). The jaggy stuff in between is the L2 backside cache: SRAM chips that are external to the CPU but on their own dedicated bus, with speed intermediate between the CPU and main RAM.

So the question is, what types of software benefit from the external cache? Most simple loops and artificial benchmarks don't show any benefit from a larger L2 cache because they fit entirely in a smaller one. In the Cache Basher plot, I would guess that the jaggy shape of the line comes from the preemptive multitasking in OS X, which splits CPU time among a variety of activities running at the same time. I suppose running two different benchmarks at the same time might show a benefit from a larger cache?
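For anyone who wants to reproduce the general shape of a speed vs. data size sweep, here is a minimal sketch of the idea. It is not how Cache Basher works internally (I don't know its implementation); the function names `read_throughput` and `sweep` and the size range are made up for illustration. Note that Python's interpreter overhead will mostly hide the L1/L2/RAM steps, so a real test would want the same loop in C.

```python
import time
from array import array


def read_throughput(size_bytes, passes=5):
    """Return MB/s for reading a buffer of roughly size_bytes (best of `passes`)."""
    probe = array("i", [1])
    n = max(1, size_bytes // probe.itemsize)
    buf = probe * n  # n integers, all set to 1
    best = float("inf")
    for _ in range(passes):
        start = time.perf_counter()
        total = sum(buf)  # touches every element once
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    assert total == n  # sanity check that the whole buffer was read
    return (n * probe.itemsize) / max(best, 1e-9) / 1e6


def sweep(sizes_bytes):
    """Measure each buffer size and return a list of (size_bytes, MB/s) pairs."""
    return [(size, read_throughput(size)) for size in sizes_bytes]


if __name__ == "__main__":
    # Hypothetical range: 4KB up to 8MB, doubling each step.
    sizes = [4096 * 2**i for i in range(12)]
    for size, mbps in sweep(sizes):
        print(f"{size:>10} bytes  {mbps:10.1f} MB/s")
```

Plotting MB/s against buffer size for each cache setting gives the kind of curve in the picture.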

While I have this set up for testing, does anyone have ideas on software configurations that might be interesting to try?

basher.png
 

Phipli

Well-known member
Would you fit a good chunk of common OS routines in a meg of L2 cache?

It would be nice if it captured the Photoshop image you were working on, but VRAM can pick up that job hopefully?

XPostFacto lets you disable the L2, doesn't it? How does L2 impact boot time? I seem to remember it makes a big difference.
 

herd

Well-known member
This shows a test that sorts arrays of integers. The array sizes increase: 16KB, 32KB, 320KB, 3.2MB, 32MB, and 64MB. The plot shows sorting speed relative to the results with 2MB of L2 cache. You can see that less cache is always slower, but there is a range of array sizes where the difference is bigger. The worst case is zero L2, where sorting can run at around half the speed.
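A rough sketch of this kind of sort test, for anyone who wants to try something similar. This is not the program used for the plot: the 4-byte integer size, the random fill, and the name `time_sort` are assumptions, and Python's built-in sort stands in for whatever sort routine the real test used.

```python
import random
import time

KB = 1024
MB = 1024 * KB

# Array sizes from the post, in bytes (3.2MB rounded down to a whole byte count).
SIZES_BYTES = [16 * KB, 32 * KB, 320 * KB, int(3.2 * MB), 32 * MB, 64 * MB]


def time_sort(size_bytes, item_bytes=4, seed=0):
    """Sort a random integer array of about size_bytes; return (count, seconds).

    item_bytes=4 is an assumption (32-bit integers); the post does not say.
    """
    n = size_bytes // item_bytes
    rng = random.Random(seed)  # fixed seed so every run sorts the same data
    data = [rng.randrange(2**31) for _ in range(n)]
    start = time.perf_counter()
    data.sort()
    elapsed = time.perf_counter() - start
    return n, elapsed


if __name__ == "__main__":
    # Only the first four sizes here; the 32MB and 64MB cases need a lot of
    # memory and time as Python lists.
    for size in SIZES_BYTES[:4]:
        n, seconds = time_sort(size)
        print(f"{size:>10} bytes  {n:>9} ints  {seconds:.4f} s")
```

Running the same script once per cache setting gives one timing series per setting, which can then be compared against each other.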

sort.png
 

herd

Well-known member
I added data for the 256KB setting. This plot also shows another way to look at the same data: with zero L2 cache as the baseline, it shows the speed increase from adding different amounts of cache. So it's like looking at either the performance hit for not having the cache or the performance gain for adding it.
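The two views are the same timings divided by a different baseline. A small sketch of that arithmetic follows; the function name `relative_speed` is made up, and the timing numbers are invented placeholders, not my measurements.

```python
def relative_speed(times, baseline):
    """Convert sort times into speed relative to one cache setting.

    times maps a cache setting to a list of sort times, one per array size.
    Speed is the inverse of time, so relative speed = baseline_time / time.
    """
    base = times[baseline]
    return {
        setting: [b / t for b, t in zip(base, series)]
        for setting, series in times.items()
    }


# Placeholder timings in seconds (NOT measured data), two array sizes each.
example_times = {
    "2MB": [1.0, 2.0],
    "none": [1.0, 4.0],
}

# View 1: 2MB of L2 as the baseline, so less cache shows up as values below 1.
vs_full = relative_speed(example_times, "2MB")

# View 2: zero L2 as the baseline, so added cache shows up as values above 1.
vs_none = relative_speed(example_times, "none")
```

With these placeholders, the second array size comes out as 0.5 for zero L2 in the first view and 2.0 for 2MB in the second view, which are the same result seen from opposite sides.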

sort4.png
 

herd

Well-known member
The later G4 chips have the L2 cache built in, and the external cache (where available) becomes L3. This series of chips also adds the option of DDR SRAM chips, vs. the earlier SDR versions. So this looks at the "7450" G4 chips with different cache configurations, using the 7450 without L3 cache as the baseline (this would be the same as a 7440 chip). For small pieces of data that fit in the internal caches, there is no difference between any of them.

L2cache2.png
 

herd

Well-known member
I found an article from Powerlogix comparing DDR and SDR L3 cache:

eshop.macsales.com/images/items/plgsdrvsddr.pdf

They go into a lot of detail, but my takeaway is that if you run software that doesn't use L3 much, then there won't be much difference. It's like the chart I posted above for the 7450 series CPUs: none of the L3 cache setups makes any difference for the first four points. In Cinebench, I can completely turn off the L3 cache and not see much difference. I don't know the details of the Photoshop tests they ran, but most simple image filters run almost entirely in L1/L2 cache.
 