Nice. I had a pretty good idea of how L1 works, but no understanding of L2. OK, so any addresses that share the same low 17 bits map to the same index in L2, and Capella will evict the LRU entry and store the other (upper) 15 bits in the tag RAM (and update the LRU). This is very useful information...
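To make the index/tag split described above concrete, here is a minimal C sketch. It assumes 32-bit physical addresses and takes the 17/15 bit split straight from the post; the helper names (`l2_index`, `l2_tag`, `l2_conflict`) are made up for illustration, and it does not model the line offset, the number of ways, or the LRU logic.

```c
#include <stdint.h>

/* Assumption from the post: the low 17 bits of an address select the
   L2 slot, and the remaining upper 15 bits are stored as the tag. */
#define L2_INDEX_BITS 17u
#define L2_INDEX_MASK ((1u << L2_INDEX_BITS) - 1u)

/* Low 17 bits: where in the L2 the address lands. */
static uint32_t l2_index(uint32_t addr)
{
    return addr & L2_INDEX_MASK;
}

/* Upper 15 bits: what the tag RAM would have to remember. */
static uint32_t l2_tag(uint32_t addr)
{
    return addr >> L2_INDEX_BITS;
}

/* Two addresses compete for the same slot when their low 17 bits
   match but their tags differ. */
static int l2_conflict(uint32_t a, uint32_t b)
{
    return l2_index(a) == l2_index(b) && l2_tag(a) != l2_tag(b);
}
```

Under this model, two buffers placed a multiple of 128 KB (2^17 bytes) apart land on the same indices, which is the access pattern to avoid in a tight loop.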
Hey Snial,
I checked a bit more in the manual for the 603. Let's first just ignore L2:
For a cacheable read that misses in L1, the following will happen:
1. The CPU executes the load; the MMU reports that the address is cacheable and that it is not in L1, so a cache burst read is requested on the bus...
Especially exploit its capabilities :D and an understanding of what *should* be possible gives me an idea of how to best do that, or what to avoid that is slow.
Actually, the 603 bus on the 5200 is running at 37.5 MHz (a multiplier of 2). One way to prove this is to measure the div instruction...
I use inline asm in CW Pro 5 and up. It took me a while to figure out how to get inline asm working: you have to turn off "strict ANSI C" in the project settings, and then you can use "asm { .. }" blocks in functions. It might work for CW 8 as well.
P.S. Nice work with MiniVNC 👏
Is it the 32 cycles you find scary? Or the post? :D
Nice, although I'm not convinced (yet). While the numbers match the empirical values, it assumes that the RAM is the bottleneck and that a full line must be read into L2, while the bus and CPU have "critical word first". Also, if RAM was...
Hey Julz!
Oh yes, I was referring to data cache, I thought you were as well 🙃
The best source. My empirical benchmark of the L2 cache on a 5200 :D
https://68kmla.org/bb/index.php?threads/how-does-the-l2-cache-work-on-nubus-powerpc.47395/
It is a 168 cycle hit only on the first attempt to...
On 603 (on 5200), L1 is 2 cycles, L2 is 8 cycles, and system ram is 32 cycles *if* an entry is already in TLB, if not it is 168 cycles(!). Sure, that is also partially because Apple used a 32 bit bus to the system ram, maybe it could have been 16 cycles, but that is still 8 times slower than...
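As a rough way to see how much those latencies matter, here is a small C sketch that computes the average cycles per load for a given mix of accesses. The cycle counts are the ones quoted in the post; the function name and the idea of passing in access counts per level are assumptions for illustration, and it treats each figure as the total cost of a load served at that level.

```c
/* Latencies in cycles as quoted in the post (603 in a 5200). */
enum {
    L1_CYCLES           = 2,
    L2_CYCLES           = 8,
    RAM_CYCLES          = 32,   /* entry already in the TLB */
    RAM_TLB_MISS_CYCLES = 168   /* entry not in the TLB */
};

/* Average cycles per load, given how many loads were served by each
   level. The counts are hypothetical inputs, e.g. from a profile. */
static double avg_load_cycles(unsigned l1, unsigned l2,
                              unsigned ram, unsigned ram_tlb_miss)
{
    unsigned total = l1 + l2 + ram + ram_tlb_miss;
    double cycles;

    if (total == 0)
        return 0.0;   /* no loads at all: nothing to average */

    cycles = (double)l1 * L1_CYCLES
           + (double)l2 * L2_CYCLES
           + (double)ram * RAM_CYCLES
           + (double)ram_tlb_miss * RAM_TLB_MISS_CYCLES;
    return cycles / (double)total;
}
```

Even a small share of loads that go all the way to RAM pulls the average well above the L1 figure, which is why keeping the working set inside L1 and L2 pays off on this machine.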
Tested with CW Pro 1 which is from 1997, way poorer results than with CW Pro 7 (except for fpu). In both cases I enabled all optimizations, and set to optimize for the 603 specifically.
These results are quite surprising to me. Either CW is a much better compiler than gcc-2.95.4, or the 603 at...
I just did now. I took the code, used CW7.1 and set the global optimizer to max, targeted specifically 603 and enabled all optimizer features.
I'm somewhat surprised. I'm not sure this is apples to apples. It is odd it is ahead of 601 by that much, and it is odd that it does better on integer...
Interesting read! There are some things to be added for the 603/603e:
- the 603e can execute add/cmp in both the SRU and the ILU, while the 603 can only execute these in the ILU
- for the 603/603e, one should also consider the completion queue, with 5 entries (program order), in which only 2 can...
I've the 603e user manual, with supplements for 603, and that should contain all the differences between 603e and 603.
Regardless, I would like to have the original 603 user manual :) Is there anyone who might have it as a PDF (my web search has not been fruitful), or perhaps would be willing to...
That's a very nice resource. Thanks! Especially that the source code of the benchmark is there. I will try (eventually) to port it so I can run it on Mac OS on a 5200.
Once I have results, I will update and add them. Probably it will be close to how the 601 performs (relative changes): 486DX2@66...
Thank you! That is an amazing reply.
Found two sources to complement:
- The 66 MHz 603 was using 256 L2 (SPECint92 is reported at 60.6 here, consistent with your table): https://netlib.org/performance/html/spec.mo603662.cint92.3_95.notes.html
- P100 (P54C): SpecInt: 100, SpecFp92: 81...
Are there (reliable?) benchmarks that compare these CPUs or these machines?
I think most likely the PPC 603 performed quite a bit better than a 486DX2/66 P24D.
Against the Pentium 75 MHz I'm less sure. In real world use cases the P75 would likely be quite a bit faster due to the bus and due...
@joevt: reverse engineering it and making your own driver based on it, that's very cool. Kudos. Out of curiosity, if you don't mind me asking, how did you go about that? I suppose MacsBug and looking at the driver entries with some custom made macros for mapping records/structs and reading disasm?