
History of the PowerPC 603 and 603e

bigmessowires

Well-known member
In Wikipedia's article about the PowerPC family, I noticed this about the 603:

Apple tried to use the 603 in a new laptop design but was unable due to the small 8 KB level 1 cache. The 68000 emulator in the Mac OS could not fit in 8 KB and thus slowed the computer drastically. The 603e solved this problem by having a 16 KB L1 cache, which allowed the emulator to run efficiently.

This makes it sound like the plain 603 (non-e version) basically can't run the 680x0 emulation layer, but if that's the case, how did Apple use the 603 in computers like the Performa 5200 and 6200 series? Is that why those computers have a reputation for being dog slow? Is the 603 really slower than the 601 that preceded it? If so, my Performa 6200/75 here is sad.
 

Phipli

Well-known member
This makes it sound like the plain 603 (non-e version) basically can't run the 680x0 emulation layer, but if that's the case, how did Apple use the 603 in computers like the Performa 5200 and 6200 series? Is that why those computers have a reputation for being dog slow? Is the 603 really slower than the 601 that preceded it? If so, my Performa 6200/75 here is sad.
The implication is that not enough of the emulator can stay cache-resident, meaning that emulation is slower than on a similar machine with a larger cache.
 

Snial

Well-known member
In Wikipedia's article about the PowerPC family, I noticed this about the 603:

This makes it sound like the plain 603 (non-e version) basically can't run the 680x0 emulation layer, but if that's the case, how did Apple use the 603 in computers like the Performa 5200 and 6200 series? Is that why those computers have a reputation for being dog slow? Is the 603 really slower than the 601 that preceded it? If so, my Performa 6200/75 here is sad.
Because the 5200 and 6200 had 256 KB of L2 cache!
 

Phipli

Well-known member
Is that why those computers have a reputation for being dog slow?
Also, yes to this, but it isn't entirely fair - it's partly a reviews-at-release issue. There wasn't a huge amount of PPC software at the time, so emulated performance was important. Retrospectively, just install Mac OS 8.6 and FAT or PPC software, make sure you have enough RAM, and they run as well or better than a 6200 (depending on CPU speed).
 

Melkhior

Well-known member
This makes it sound like the plain 603 (non-e version) basically can't run the 680x0 emulation layer
As @Phipli said, it's mostly a performance issue. All emulation layers have overhead, and need additional space in the D and I caches to do their job. Ideally they are fully resident as close as possible to the core (basically, L1), with extra space left over to usefully cache whatever the emulated code is doing. The 603's L1 was flat-out too small; the 603e was more acceptable, and coming later it also benefited from a larger amount of native PPC software, as @Phipli mentioned. Adding L2 also mitigates the issue. The 601's cache was unified rather than split I/D, and that helps a bit with emulation: the emulator can write generated code directly into the innermost cache without relying on coherence, as split D/I designs must (assuming D and I are coherent at all - otherwise it's even worse, since some amount of cache flushing is required; see e.g. most AArch64 implementations, which doesn't help Java & co).

Other than that, the 603 and in particular 603e were pretty good designs, and were more the basis of the 750 (G3) than either the 604 or the 620 (which Apple never used). The 601 was an early implementation and could be described as "quite characterful" (read: a bit of a weirdo), but surprisingly good as well - and still my personal favorite :)

The cache issue for emulation has some interesting historical side-effects: when hardware is designed primarily for emulation (e.g., Transmeta Crusoe or NVidia Denver), designers end up needing a larger L1 to compensate. A larger L1 takes up die area that cannot be used for anything else, so it's already costly. And a larger L1 tends to need higher latency, which is a major performance issue. If someone designs a CPU with an L1 significantly larger than contemporary designs are using, it's usually a sign something is inherently flawed in the design - adding a cycle or two of L1 latency is a big no-no, and is only done when there's an even bigger problem to solve.
 

Phipli

Well-known member
The 601 was an early implementation and could be described as "quite characterful" (read: a bit of a weirdo), but surprisingly good as well - and still my personal favorite :)
I'm also a 601 fan. Crazy thing was basically a POWER chip and a PowerPC in one.

I have an 8200 with the 120MHz 601 :)
 

Snial

Well-known member
As @Phipli said, it's mostly a performance issue. All emulation layers have overhead, and need additional space in the D and I caches to do their job. Ideally they are fully resident as close as possible to the core (basically, L1), with extra space left over to usefully cache whatever the emulated code is doing. The 603's L1 was flat-out too small; the 603e was more acceptable, and coming later it also benefited from a larger amount of native PPC software, as @Phipli mentioned. Adding L2 also mitigates the issue. The 601's cache was unified rather than split I/D, and that helps a bit with emulation: the emulator can write generated code directly into the innermost cache without relying on coherence, as split D/I designs must (assuming D and I are coherent at all - otherwise it's even worse, since some amount of cache flushing is required; see e.g. most AArch64 implementations, which doesn't help Java & co).
<snip>
The cache issue for emulation has some interesting historical side-effects: when hardware is designed primarily for emulation (e.g., Transmeta Crusoe or NVidia Denver), designers end up needing a larger L1 to compensate. A larger L1 takes up die area that cannot be used for anything else, so it's already costly. And a larger L1 tends to need higher latency, which is a major performance issue. If someone designs a CPU with an L1 significantly larger than contemporary designs are using, it's usually a sign something is inherently flawed in the design - adding a cycle or two of L1 latency is a big no-no, and is only done when there's an even bigger problem to solve.
So, in summary: a really great response, covering pretty much everything we need to know about the relationship between caches and emulation. Update Wikipedia to include this!!

It's possible a 68K emulator could have been designed a different way. Most of the opcodes on the 68K can be identified from their top 8 bits.

Thus a 256-entry vector table (1 KB worth of pointers) would easily have fitted in the 603's data cache; and given that most instructions are the simple ones - move (including loads/stores), add, cmp, branch - the implementations of those would also have stayed in the instruction cache. The data cache on a 603 is two-way set-associative, so the 256-entry vector table would be less likely to be evicted.

Thinking about a 256-entry vector table makes me wonder whether that kind of assembler-coded Cortex-M0 68000 emulator would be more optimal on a Raspberry Pi Pico. It has a 16 KB XIP flash cache (two-way set-associative). I've been looking at converting the Cyclone 68000 emulator to Thumb, but it might simply be easier to rewrite it this way. Cyclone is impressive, but uses more registers than are easily available in Thumb mode.

Other than that, the 603 and in particular 603e were pretty good designs, and were more the basis of the 750 (G3) than either the 604 or the 620 (which Apple never used). The 601 was an early implementation and could be described as "quite characterful" (read: a bit of a weirdo), but surprisingly good as well - and still my personal favorite :)
I would probably say that the 603 ought to be my favourite PPC CPU, mostly because it achieved pretty close to the 601's SPECint and SPECfp using about half the transistors and much lower energy consumption, and was the first true PPC. The 603e's cache fix bumped it up to 32 KB in total, but it was still smaller and cheaper, because cache is denser than random logic.

I've never used a 601 Mac, except perhaps very briefly. That's why I'm particularly keen on seeing Infinite Mac's DingusPPC 6100 emulation working (with the MMU). It appears to be stalled on SCSI emulation.
 

bigmessowires

Well-known member
I can understand how the small L1 cache on the 603 was a problem for emulation. Basically, what I'm asking is why Apple used the 603 for the 5200 and 6200 series after they'd previously rejected it for slowing the computer "drastically" (according to the Wikipedia story) and encouraged IBM to produce the 603e. Was it because the "new laptop design" mentioned by Wikipedia would have had little or no L2 cache, whereas the 5200 and 6200 had 256 KB of L2?

Also, yes to this, but it isn't entirely fair - it's partly a reviews-at-release issue. There wasn't a huge amount of PPC software at the time, so emulated performance was important. Retrospectively, just install Mac OS 8.6 and FAT or PPC software, make sure you have enough RAM, and they run as well or better than a 6200 (depending on CPU speed).

"They run as well or better than a 6200" - but I thought we were discussing the 6200? Which systems are you comparing? Anyway I take your point to install OS8.6 and run fat binaries and it'll be mostly fine. Mine has 9.something installed on it now, so maybe I should downgrade.
 

Phipli

Well-known member
I can understand how the small L1 cache on the 603 was a problem for emulation. Basically, what I'm asking is why Apple used the 603 for the 5200 and 6200 series after they'd previously rejected it for slowing the computer "drastically" (according to the Wikipedia story) and encouraged IBM to produce the 603e. Was it because the "new laptop design" mentioned by Wikipedia would have had little or no L2 cache, whereas the 5200 and 6200 had 256 KB of L2?



"They run as well or better than a 6200" - but I thought we were discussing the 6200? Which systems are you comparing? Anyway I take your point to install OS8.6 and run fat binaries and it'll be mostly fine. Mine has 9.something installed on it now, so maybe I should downgrade.
Sorry, 6100. Typo.
 

Phipli

Well-known member
Basically, what I'm asking is why Apple used the 603 for the 5200 and 6200 series after they'd previously rejected it for slowing the computer "drastically" (according to the Wikipedia story)
Aren't the 6200 and 5200 older? Perhaps it was the backlash from the 6200 reviews.

Plus as mentioned, they have an L2 cache. Apple didn't tend to put L2 in lower end laptops.
 

Snial

Well-known member
I can understand how the small L1 cache on the 603 was a problem for emulation. Basically, what I'm asking is why Apple used the 603 for the 5200 and 6200 series after they'd previously rejected it for slowing the computer "drastically" (according to the Wikipedia story) and encouraged IBM to produce the 603e. Was it because the "new laptop design" mentioned by Wikipedia would have had little or no L2 cache, whereas the 5200 and 6200 had 256 KB of L2?
I think it was primarily driven by the product release schedule. The first 601-based PowerPC Macs appeared in March 1994, and by August 1995 there had been at least one upgrade cycle for the 6100 and 7100, and two for the 8100; then the 7200 (PCI) appeared, pretty much alongside the 7500/8500/9500 machines, which included the 604 at the higher end; and finally the 603e (without L2) arrived in the PowerBook 5300 at 100 MHz.


So my guess is that they were pretty much forced into using the 603 for the 5200 and 6200 Performas, because if they'd waited until enough 603es were available for the low-end home market, then they would have been waiting until October 1995, when the 5300CD/100 appeared: that's 18 months to provide a PPC Mac for every segment.

Having said that, my limited experience with a 5200 (12 MB of RAM) was really positive - I was house-sitting for a friend from Oregon while they and their family were visiting Ukraine. I did my Manchester Uni application on it, by scanning the paper application form into ClarisWorks 3, generating text boxes for all the text fields, filling them in, and finally printing it out after whiting out the scanned backgrounds.

The 5200 seemed amazingly fast compared with any Mac I'd previously used, but I was mostly used to my Performa 400 (LCII) by then. But there was no way I could have filled in the form quite in that way - the LC II couldn't refresh the background at a usable rate.
 

3lectr1cPPC

Well-known member
Plus as mentioned, they have an L2 cache. Apple didn't tend to put L2 in lower end laptops.
They also didn't put an L2 cache in the $6,800 PowerBook 5300ce, where doing so might have made the thing actually feel fast. I'd like to know why they skipped including one until the 1400/133.
 

Phipli

Well-known member
They also didn't put an L2 cache in the $6,800 PowerBook 5300ce, where doing so might have made the thing actually feel fast. I'd like to know why they skipped including one until the 1400/133.
Fast SRAM is expensive. I guess they felt laptops were already expensive, and the extra cost (plus markup at each stage of distribution, and any tax) would have hit sales more than missing out on the speed bump would.
 

3lectr1cPPC

Well-known member
Considering the reputation the 5300s achieved as a result, I have to wonder if that wasn't such a good move...
 

3lectr1cPPC

Well-known member
Also, Apple charged a $2,900 markup on the 5300ce over the 5300c ($3,900) for an extra 17 MHz and an 800x600 LCD instead of a 640x480 one. LCDs were expensive then, but that expensive? It feels like they should have had some room to spare.

I'd say the reputation due to the case is worse than for the cache. Those stupid hinges.
Hey, at least it would have been a fast poorly designed computer instead of just a poorly designed computer!
 