Testing a 6200

Phipli

68040
I wrote this mostly over a year ago but haven't clicked post. I wanted to do a bit more, but things have conspired against it so I thought I'd share.

I'm actually nervous posting it because people are so passionate about these machines, one way or the other.

I tried to do testing as fairly as possible to see where the performance differences were, and experimented to see where improvements could be made.

The biggest thing I learnt along the way is that SpeedDoubler is a mandatory install for beige PPC macs if you run any 68k code on them at all.


@cheesestraws @joshc @Snial @Cory5412
 
Reading this now :-) ! Since @noglin is a P5200 fan, I thought I'd notify them here! OK, you mention Noglin later.

Based on my analysis of disk partitions on my PB1400c, as standard it will come with the 68K IDE driver, but Mac OS 8.1 and later will install the PPC driver (on a different driver partition).

Suggested edit: "Based on in which benchmarks the Performa 6200/75 performs least we’ll compared to the PM 6100/66", I think "we'll" is a typo and should be ", well".
 
@Snial

The main issue seems to be graphics speed based on testing of this specific machine.

Given the 630 is faster, and tests show that it isn't due to 68k driver code / quickdraw code, the hardware is clearly capable of more.

Possible issues are the number of wait states they had to add for the increased bus speed, or something nasty to do with the bridge. What did I see that might be a clue? It is good at writing images to the frame buffer compared to the 630, but bad at translating an image in VRAM.

Given they had a few issues early on, I wonder did they detune the video performance heavily to avoid crashes? Bad performance over stability issues?

I'd love to compare a release day ROM to see if graphics performance is better.

Also, I hadn't thought before, but I wonder what it would be like with an LCPDS video card?

Also, how differently does a 6300 (5300) perform? How about when underclocked to 75MHz?

So many questions.

Regardless, the results weren't what I was expecting from what I've previously heard from either advocates or detractors.
 
Suggested edit: "Based on in which benchmarks the Performa 6200/75 performs least we’ll compared to the PM 6100/66", I think "we'll" is a typo and should be ", well".
Yeah, that's a phone autocorrect - it has been a nightmare lately. It keeps editing valid text like changing "of" to "if", but only after I look away and press space. It is also frequently breaking words that start with a more common word by forcing a space midway through the word. I strongly feel autocorrect has got much worse over the last 15 years. It tries to be smart and is worse for it.
 
The main issue seems to be graphics speed based on testing of this specific machine.
So, if @noglin pops up on this thread, then they probably have the best insight.
Given the 630 is faster, and tests show that it isn't due to 68k driver code / quickdraw code, the hardware is clearly capable of more.

<snip> wait states they had to add for the increased bus speed, or something nasty to do with the bridge. What did I see that might be a clue - it is good at writing images to the frame buffer compared to the 630, but bad at translating an image in VRAM.
Sounds plausible. I thought I was going to find a 68040 bus signal diagram here:


And I'm sure I've seen it before, but I couldn't find it. The P5200/6200 need 80ns RAM, but I guess the '040 bus runs at 37.5MHz? There's some extensive discussion I'm sure you're aware of here:


Noglin suggested that the graphics slowdown is due to the write buffers on the PPC/'040 bridge. Graphics writes likely use stfd instructions, which write 64-bit values, but these get broken down into 2x 32-bit writes, so even a single write fills 2 write buffers. Hence it functions as a bottleneck. He did numerous graphics tests which identified huge bus latencies depending on how much data was being shuffled and processed between caches and the '040 bus.
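To put rough numbers on that, here is a toy sketch of the arithmetic only. It assumes a 32-bit bus write size as described above; the buffer depth is a hypothetical parameter, not a documented figure for the bridge, and the function names are made up for illustration.

```python
BUS_WRITE_BITS = 32  # size of one bus write, as described in the post above


def bus_writes(store_bits, bus_bits=BUS_WRITE_BITS):
    """Number of bus writes one store is split into (ceiling division)."""
    return -(-store_bits // bus_bits)


def stores_before_full(buffer_entries, store_bits):
    """How many stores fit before the write buffers are all occupied.

    buffer_entries is a hypothetical depth, chosen by the caller.
    """
    return buffer_entries // bus_writes(store_bits)


# A 64-bit stfd becomes two 32-bit bus writes.
print(bus_writes(64))
# With an assumed depth of 2 entries, compare 64-bit and 32-bit stores.
print(stores_before_full(2, 64), stores_before_full(2, 32))
```

Under that assumed depth, a single 64-bit store occupies every buffer, while two 32-bit stores would fit, which is the bottleneck shape Noglin described. It says nothing about real latencies.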

<snip> compare a release day ROM <snip> LCPDS video card? <snip> 6300 (5300) perform [75MHz]? <snip>
OK.
Regardless, the results weren't what I was expecting from what I've previously heard from either advocates or detractors.
I guess these machines are contentious, because they were pioneering consumer PPC Macs, leading to Workstation level over-expectations for some people (not me, my first experience with a P5200 I felt was awesome vs my LC II :-D ).
 
I guess these machines are contentious, because they were pioneering consumer PPC Macs, leading to Workstation level over-expectations for some people (not me, my first experience with a P5200 I felt was awesome vs my LC II :-D ).
They're excellent spreadsheet machines. Especially if you're doing lots of FP and the version of Excel supports the PPC FPU.
 
@Snial - I should probably share this here too for background, although I think I showed you specifically before :

 
@Snial - I should probably share this here too for background, although I think I showed you specifically before :

Thanks, I've read it again. PowerPC had a lot of different microarchitecture choices, which can significantly affect performance in conjunction with a complete system design. The PPC603 is significantly different to the PPC601.

[Attached images: 1770376338402.png, 1770376490920.png]

The most basic differences (apart from the support for POWER instructions on the 601) are that the 601 has a 256-bit path from the cache to the 8-entry Instruction Queue, so it can fill all 8 entries if possible. The 603 only has a 64-bit path to a 6-entry IQ, so at best it can only fill 2 entries. The 601 can dispatch 3 instructions per cycle, but the 603 can only dispatch 2 (though the BPU interfaces directly to the instruction fetch, allowing some branches to be annulled).
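The fill figures follow from PowerPC instructions being a fixed 32 bits wide. A minimal sketch of that arithmetic, using only the path widths and queue sizes quoted above (the function name is just for illustration, and this is an upper bound, not a model of real fetch behaviour):

```python
INSTRUCTION_BITS = 32  # PowerPC instructions are a fixed 32 bits wide


def max_fetch_per_cycle(path_bits, queue_entries):
    """Upper bound on instructions moved from cache to the queue at once."""
    return min(path_bits // INSTRUCTION_BITS, queue_entries)


ppc601 = max_fetch_per_cycle(path_bits=256, queue_entries=8)
ppc603 = max_fetch_per_cycle(path_bits=64, queue_entries=6)
print(ppc601, ppc603)
```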

I believe completion must always be in-order for both. There are other aspects which limit PPC603 throughput. The PPC603 in fact closely resembles the MC88110 which came out a few years earlier:
[Attached image: 1770377456334.png]
The responsibilities of the IU and FPU were changed, but the rest is similar. It looks to me like the MC88110 team negotiated compromises in order to get the PPC603 to market on time: drop some functional units, re-partition the ALU / FPU datapath, decode the new instruction set.

And PowerPC designers seemingly failed to consider one aspect of RISC vs CISC. RISC was originally designed for Unix workstations, where the expectation was that executables would normally be recompiled for a given machine. And that's normal there: e.g. when playing with the MAME SparcStation 1 emulator, part of the standard install procedure is to recompile the kernel. But consumer-based computers aren't like that: the users generally don't have compilers, so the software can't be recompiled (e.g. the need for the 68K emulator, which RISC workstations never needed). Therefore microarchitecture choices have a much bigger impact. E.g. Cyrix and AMD Pentium 1 competitors suffered because their choices must have looked good on paper (e.g. SpecInt), but performed worse in reality; whereas the Pentium was basically 2x 486s strapped in parallel, with an FPU that was more independent.

Re: FPU, did the PPC601 have the same or equivalent fused MAC instruction optimisations in the FPU? That could explain the 603's FPU competitiveness. But also, as an aside where you said the P6200 is good for spreadsheets, it's a bit like how the Sinclair QL, which is normally about 2x slower than a Mac 128K thanks to the QL's 68008 CPU, can nearly match the Mac's math performance, because that's mostly internal cycles, which are the same.
 
But consumer-based computers aren't like that: the users generally don't have compilers, so the software can't be recompiled (e.g. the need for the 68K emulator, which RISC workstations never needed). Therefore microarchitecture choices have a much bigger impact.
Yes, I've wondered about this. I've seen some software note that it contains optimisations for the 604 (I think I've seen Photoshop mention it), but most PPC software will have targeted the 601, or a core of commands. Any extra opcodes or optimised tricks will have been mostly left out.

Do any period Macintosh development environments have switches to target, or even optimise, for specific types of PPC?

In 68k land you seem to mostly get the choice to either target 68000 or 68020 in CodeWarrior, although I haven't really dug into the specifics as my programming is rarely speed critical.
 