Testing a 6200 and comparison with 6100

The PPC clock is definitely going to be a multiple of the 040 bus clock... anything else makes the bus adaptation logic much more complicated than it already must be. As Phipli said, Apple almost certainly specified the ASIC for operation at 40MHz, but what configuration (i.e. programmable wait states) is required to support that remains to be seen. Again, I recommend referring to the MEMC documentation for some examples of how different DRAM grades work into 040 bus cycle timing. Something like 4-2-2-2 is more likely; 2-1-1-1 is the minimum cycle and difficult to achieve.
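To see why the wait states matter, here is a rough C sketch of the peak bandwidth of back-to-back '040 burst line transfers. It assumes each burst moves one 16-byte line as four longwords, that the x-y-y-y figures are bus clocks for the first and each following longword, and it ignores arbitration, refresh and idle cycles between bursts. The function name is made up for illustration.

```c
/* Peak bandwidth, in bytes per second, of back-to-back 68040 burst
 * line transfers. A burst moves one 16-byte line as four longwords;
 * `first` and `rest` are the bus clocks spent on the first longword and
 * on each of the other three (e.g. 2-1-1-1 or 4-2-2-2).
 * Arbitration, DRAM refresh and idle cycles between bursts are
 * ignored, so the result is an upper bound. */
double burst_bandwidth(double bus_hz, int first, int rest)
{
    const double line_bytes = 16.0;          /* 4 longwords per line */
    int clocks_per_line = first + 3 * rest;  /* e.g. 2+1+1+1 = 5 */
    return bus_hz / clocks_per_line * line_bytes;
}
```

At 40MHz this gives 128 × 10^6 bytes/s for 2-1-1-1 and 64 × 10^6 bytes/s for 4-2-2-2, so the slower timing halves the theoretical peak.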

The 60MHz clock would be for the Valkyrie private framebuffer DRAM only; it being decoupled from the bus clock would be normal and expected. As with the Epson chip on my 30video cards, it should run as fast as the DRAM practically supports in order to maximize bandwidth.
 
Hi folks,

OK, I've written the test application. It's not very big. Source code and application are included. The results take a bit of interpreting.

My PB1400c is a 603e Mac (as we all know), so it has a 16kB, 4-way set-associative, write-back L1 data cache. It also has a 128kB write-through L2 cache. So I need to test up to "8 sets" to force a flush to L2 (which also forces a flush to what's called the I/O bus on the PB5300/PB1400, roughly equivalent to the '040 bus on the 6200). Hence my version of the application is different from the one for the P5200/6200.
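Why going from 4 to 8 sets pushes writes out of L1 can be illustrated with a toy model of a single cache set. This is a hypothetical sketch, not the test app's code: it assumes the test buffers all map to the same cache set and that replacement is true LRU, and `count_misses` is an invented name.

```c
/* Toy model of ONE set of a 4-way set-associative cache with true LRU
 * replacement. The test is assumed to store to n_buffers buffers whose
 * addresses all map to this same set, cycling through them `rounds`
 * times. Returns the number of accesses that miss. */
#define WAYS 4

int count_misses(int n_buffers, int rounds)
{
    int tag[WAYS];
    int last_used[WAYS];          /* larger value = used more recently */
    int valid[WAYS] = {0};
    int now = 0, misses = 0;

    for (int r = 0; r < rounds; r++) {
        for (int b = 0; b < n_buffers; b++) {
            int hit = -1;
            now++;
            for (int w = 0; w < WAYS; w++)
                if (valid[w] && tag[w] == b)
                    hit = w;
            if (hit >= 0) {               /* hit: just refresh its age */
                last_used[hit] = now;
                continue;
            }
            misses++;
            /* miss: fill an empty way if any, else evict the LRU way */
            int victim = 0;
            for (int w = 0; w < WAYS; w++) {
                if (!valid[w]) { victim = w; break; }
                if (last_used[w] < last_used[victim])
                    victim = w;
            }
            valid[victim] = 1;
            tag[victim] = b;
            last_used[victim] = now;
        }
    }
    return misses;
}
```

With four buffers only the first pass misses (`count_misses(4, 10)` returns 4); with eight, every access misses (`count_misses(8, 10)` returns 80), so each store has to go further down the memory hierarchy.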

Mine's also written in CW11 Gold; I don't know how easily that converts to the later versions people tend to use. Here are my results and my interpretation of them:

Test   | Ticks/Loop | Ticks | Rem   | Count     | Bandwidth (MB/s)
1 Set  | 129848     | 93    | 57871 | 134217728 | 134217728*4/1048576/((93+(129848-57871)/129848)/60) = 328MB/s
2 Sets | 129738     | 96    | 9139  | 134217728 | 134217728*4/1048576/((96+(129738-9139)/129738)/60) = 317MB/s
4 Sets | 129421     | 98    | 95702 | 134217728 | 134217728*4/1048576/((98+(129421-95702)/129421)/60) = 313MB/s
8 Sets | 129780     | 62    | 77761 | 2097152   | 2097152*4/1048576/((62+(129780-77761)/129780)/60) = 7.69MB/s
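The Bandwidth column can be reproduced with a small C function. The parameter names are invented, and reading Ticks/Loop and Rem as a calibration that recovers the fraction of the last 60Hz tick is an assumption based only on the shape of the formula. MB here means 1048576 bytes, as in the table.

```c
/* Bandwidth in MB/s (1MB = 1048576 bytes), following the table's
 * formula: `count` 32-bit stores took `ticks` whole 60Hz ticks plus a
 * fraction of a tick, assumed to be (ticks_per_loop - rem) / ticks_per_loop. */
double bandwidth_mb_s(long ticks_per_loop, long ticks, long rem, long count)
{
    double bytes = (double)count * 4.0;   /* 4 bytes per store */
    double frac  = (double)(ticks_per_loop - rem) / (double)ticks_per_loop;
    double secs  = ((double)ticks + frac) / 60.0;
    return bytes / 1048576.0 / secs;
}
```

For the 1 Set row, `bandwidth_mb_s(129848, 93, 57871, 134217728)` comes out at about 328.4, matching the table.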

So what we see here is a demonstration of how great L1 cache is, and of the dramatic difference between L1 cache and the main I/O bus on a PB1400c. But are the L1 cache values realistic? Well, the PB1400c runs at 166MHz. 328MB/s represents 82M 32-bit writes/s, equivalent to about 2 cycles per store instruction, which is probably correct. It also looks like there's a slight penalty for accessing more sets, which is interesting.
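The cycles-per-store estimate can be checked the same way. This sketch takes the measured bandwidth and the CPU clock and returns average CPU cycles per 32-bit store; whether a MB counts as 2^20 or 10^6 bytes shifts the answer slightly, so it is a parameter. The function name is invented.

```c
/* Average CPU cycles per 32-bit store, given a measured bandwidth in
 * MB/s and the CPU clock in MHz. `mb_bytes` is the size of one "MB":
 * 1048576 (as in the table's formula) or 1000000. */
double cycles_per_store(double mb_per_s, double cpu_mhz, double mb_bytes)
{
    double stores_per_s = mb_per_s * mb_bytes / 4.0;  /* 4 bytes per store */
    return cpu_mhz * 1e6 / stores_per_s;
}
```

For 328MB/s at 166MHz this gives about 1.93 cycles with 2^20-byte megabytes and about 2.02 with 10^6-byte ones; either way, roughly 2 cycles per store.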

My guess is that the poor main RAM performance is because the PB1400c's bus is just 32 bits wide and uses pseudo-static RAM. 7.69MB/s is 1.9M 32-bit bus cycles per second, an average of 520ns per bus cycle. I mean, that's bad, huh?
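Likewise for the average time per 32-bit bus transfer. A minimal sketch, assuming every transfer moves exactly one longword and nothing else is using the bus; the function name is invented.

```c
/* Average nanoseconds per 32-bit bus transfer for a measured bandwidth
 * in MB/s. `mb_bytes` is the size of one "MB" (1000000 or 1048576). */
double ns_per_transfer(double mb_per_s, double mb_bytes)
{
    double transfers_per_s = mb_per_s * mb_bytes / 4.0;  /* one longword each */
    return 1e9 / transfers_per_s;
}
```

7.69MB/s gives about 520ns per transfer if a MB is taken as 10^6 bytes (the reading used above) and about 496ns with 2^20-byte megabytes.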

The version supplied will go through up to 8 sets and then wait for you to press the mouse button. When you do, it'll run the Valkyrie VRAM test, writing directly (I believe) to the VRAM addresses beginning at $F9000000. This could be totally wrong: for all I know those are its I/O registers, in which case it'd be a major disaster (@noglin, are the I/O regs there, or the screen memory itself?). Perhaps I should just take the address from ScreenBits; then it would also work on my Mac. So it's probably best not to press the mouse button at all the first time, and just perform a physical Restart instead. The next version will have a real event loop, not a wait for Button()!

I intend to build and submit a version that should work for the 630 at some point too. Still, this version should be useful for comparing your P6200 with my PB1400.
 

For your delight, I've updated the very crude app to support writing directly to the framebuffer. I did it by dereferencing the PixMap handle from the screen's CGrafPtr. This version of the app should therefore work with a P6200 and, if recompiled, a P630.
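Writing through the PixMap rather than a hard-coded address can be sketched portably in C. This is an illustration, not the app's code: `fill_framebuffer` and its parameters are hypothetical, and on classic Mac OS `base` and `row_bytes` would be expected to come from the PixMap's `baseAddr` and `rowBytes` fields, with `rowBytes` masked by 0x3FFF because its top two bits are flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill `rows` framebuffer rows with a 32-bit pattern. Only the first
 * `bytes_used_per_row` bytes of each row are written; rows are stepped
 * by `row_bytes`, which may include padding. `base` must be 4-byte
 * aligned and `row_bytes` a multiple of 4. Returns longwords written. */
size_t fill_framebuffer(volatile uint8_t *base, long row_bytes, int rows,
                        long bytes_used_per_row, uint32_t pattern)
{
    size_t written = 0;
    for (int y = 0; y < rows; y++) {
        volatile uint32_t *p =
            (volatile uint32_t *)(base + (long)y * row_bytes);
        for (long x = 0; x < bytes_used_per_row / 4; x++) {
            p[x] = pattern;     /* one 32-bit store per iteration */
            written++;
        }
    }
    return written;
}
```

Rows are stepped by `row_bytes` rather than by the visible width, since a row can be padded, and `volatile` keeps the compiler from merging or dropping stores to what is really device memory.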

Test | Ticks/Loop | Ticks | Rem   | Count   | Bandwidth (MB/s)
VRam | 129758     | 60    | 37446 | 2097152 | 2097152*4/1048576/((60+(129758-37446)/129758)/60) = 7.9MB/s

It's pretty much as slow as main RAM, and I'm still fairly surprised about that. The UI is still pretty awful: it does the first 4 tests and tells you to click, then it does the video test, and then you need to click to finish.

68K version coming up in a few hours (it's written, but must dash).
 

And finally, the 68K version! These are the results I get for that:

(Screenshot: Mc040BusTest68K.jpg)
For the 68K version running on my PB1400, I get slower results.

Test | Ticks/Loop | Ticks | Rem   | Count    | Bandwidth (MB/s)
L1   | 36402      | 87    | 15562 | 67108864 | 67108864*4/1048576/((87+(36402-15562)/36402)/60) = 175MB/s

I've just included the normal L1 result, because the other tests that fit in cache give basically the same figures. It's pretty good: 175MB/s is 53% of the performance of the PPC version. I didn't optimise for the 68020, because on a 68040 plain 68000 code is probably faster, and I didn't have an '040 optimisation option.

We can guess what the performance of a 6200 will be for the 1- and 2-set tests. It ought to be simply 328 × 75/166 ≈ 148MB/s, a bit slower than my emulated 68K version! The VRAM test did actually affect my screen, so I think it was a correct mapping! And it ought to be uncached too, because it's using the actual VRAM addresses.
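The 6200 estimate is a straight clock scaling of the PB1400c's L1 figure. A minimal sketch, assuming the 6200's 603 needs the same number of cycles per store, so that bandwidth scales linearly with clock; the function name is invented.

```c
/* Naive clock-scaling estimate: assumes cycles per store are unchanged,
 * so cache bandwidth is proportional to CPU clock. */
double scale_by_clock(double measured_mb_s, double measured_mhz,
                      double target_mhz)
{
    return measured_mb_s * target_mhz / measured_mhz;
}
```

`scale_by_clock(328, 166, 75)` gives about 148.2MB/s, below the 175MB/s measured for the emulated 68K version.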

Let me know your test results :-) .
 
