
Fantasy M88100 Macs

Phipli

Well-known member
Some spreadsheet-y thoughts. Stupid assumptions highlighted. I don't know if there are any RAM wait states. I know I've read something about ROM access in the past but forget, and I'm not sure how many clock cycles a 32-bit read takes, or whether memory reads need to be aligned in some way. What I mean is, I'm not an expert and not familiar enough with the SE hardware.

[Attachment: spreadsheet of estimates]
 

Phipli

Well-known member
I know I've read something about ROM access in the past but forget
Ahha

[Attachment: photo of a reference book page on RAM and ROM access]

Interestingly, RAM is quoted at an average of 3.22MB/s against 3.92MB/s for ROM, so ROM is significantly faster. Is that basically because of what we're talking about? I suspect I can verify/correct my numbers using the 3.22/3.92 ratio. I'll see.
 

Phipli

Well-known member
If I assume the difference in access speed between RAM and ROM is purely due to the video circuit, it implies that...

On the SE, the video would be consuming 45% of memory accesses.

On the Plus, the video would be consuming 70% of memory accesses. (The book says every other access to RAM that occurs during the time of a horizontal scan line to the screen is used for video)

The Plus' average memory read speed is 2.56MB/s

The book says that for the SE, video uses one long word for every 4 memory accesses and for the Plus every other access to RAM that occurs during the time of a horizontal scan line to the screen is used for video. This doesn't line up with my punt above, so something else slows RAM down vs. ROM too.
 

Snial

Well-known member
I.... Think it could possibly still read ROM? I think I read that. But I haven't seen it while reading just now. I'll let you know what I find.

Do you know how long it would take to read two 16bit consecutive words from RAM?
I wasn't quite sure if the Mac SE RAMs supported page-mode, but it looks like all the classics did:


@bigmessowires has a good post on this:


[Attachment: screenshot from the BMOW post]

So, /AS is asserted just before S2 and data is latched at the end of S6, which gives 4 clocks / 7.5MHz = an amazing 533ns to read RAM. So, 150ns RAM is easily fast enough for normal access. The BMOW post also confirms the 75/25 CPU/video access split. It also means that 300ns is enough to read two words from 150ns RAM, but fast page mode (FPM) RAM ought to be able to read consecutive words at roughly 70-75ns intervals, meaning a total of about 150ns (row + column access) + 70ns (column access) = 220ns. So, recalculating: 60.15Hz x 512 x 342 / 32 x 4 = 1.3M clocks per second, or 17.5% of the 7.5MHz clock (effectively 6.2MHz).
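The cycle-stealing arithmetic above can be re-run in a few lines of Python. This is only a sketch of the post's own numbers: the 7.5MHz clock, 60.15Hz refresh and one-long-word-per-4-clock video fetch are the post's working assumptions, not measured values.

```python
# Video bus cost on a compact Mac, re-running the post's arithmetic.
# Assumptions (from the post): 7.5 MHz CPU clock, 60.15 Hz refresh,
# 512 x 342 1-bit display, video fetching one 32-bit long word per 4-clock bus slot.
CPU_CLOCK_HZ = 7.5e6
REFRESH_HZ = 60.15
WIDTH, HEIGHT = 512, 342
BITS_PER_FETCH = 32      # one long word per video access (SE-style)
CLOCKS_PER_FETCH = 4     # one 68000 bus cycle


def video_clocks_per_second() -> float:
    """CPU clock cycles per second taken up by video fetches."""
    fetches_per_frame = WIDTH * HEIGHT / BITS_PER_FETCH
    return REFRESH_HZ * fetches_per_frame * CLOCKS_PER_FETCH


lost = video_clocks_per_second()
fraction = lost / CPU_CLOCK_HZ
effective_mhz = (CPU_CLOCK_HZ - lost) / 1e6

print(f"video: {lost / 1e6:.2f}M clocks/s, {fraction:.1%} of the bus, "
      f"effective {effective_mhz:.1f} MHz")
```

It prints about 1.32M clocks/s, 17.6% and 6.2MHz; the post rounds the fraction down to 17.5%.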

Real cycle counting is fairly complex even on a 68000.


So, to figure out what the real CPU loss is we need to know:
  1. The MIPS/MHz (which is 0.175 as @Phipli you've said earlier).
  2. The average number of bus cycles in a 68000 instruction. Maybe this can be estimated from the average length and the average number of memory fetches. The MIPS difference tells us the number of internal cycles.
  3. The average proportion of time spent executing ROM code on a classic Mac. Currently unknown.

With that information we'd get a more accurate answer.
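To show how the three unknowns in the list would combine, here is a deliberately simple, hypothetical model: every instruction has some internal clocks plus some 4-clock bus cycles, and only RAM bus cycles are stretched by video contention (ROM isn't shared with video). The function name and the parameter values in the example call are placeholders, not measurements, and the 1 / (1 - video_share) stretch is a crude averaging assumption.

```python
def effective_speed(clocks_per_instr: float,
                    bus_cycles_per_instr: float,
                    rom_fraction: float,
                    video_share: float,
                    clocks_per_bus_cycle: int = 4) -> float:
    """Speed relative to the same CPU with no video contention (1.0 = no loss).

    clocks_per_instr      -- average clocks per instruction without contention
    bus_cycles_per_instr  -- average memory bus cycles per instruction (item 2)
    rom_fraction          -- share of those bus cycles that hit ROM (item 3)
    video_share           -- share of RAM bus slots taken by video
    """
    bus_clocks = bus_cycles_per_instr * clocks_per_bus_cycle
    if bus_clocks > clocks_per_instr:
        raise ValueError("bus clocks exceed total clocks per instruction")
    internal_clocks = clocks_per_instr - bus_clocks
    rom_clocks = bus_clocks * rom_fraction
    ram_clocks = bus_clocks - rom_clocks
    # Crude assumption: RAM cycles stretch by 1 / (1 - video_share) on average;
    # ROM cycles and internal cycles are unaffected.
    contended = internal_clocks + rom_clocks + ram_clocks / (1.0 - video_share)
    return clocks_per_instr / contended


# Item 1: 0.175 MIPS/MHz is about 5.7 clocks per instruction.
# Items 2 and 3 are unknown; 1.0 bus cycle and 30% ROM are placeholders.
print(f"{effective_speed(1 / 0.175, 1.0, 0.30, 0.175):.2f}")
```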
 

Snial

Well-known member
Sorry, I was rude. It's been a long week but I shouldn't have taken that out on you.
No worries, you could be right! Attributing motives (e.g. PowerBook naming) from a lack of information is how that kind of nonsense starts! Hope your week improves! Cheers from Julz
 

Arbee

Well-known member
Quoting from the Freeport ERS:

On Macintosh 512K and Macintosh Plus, during the display of a horizontal line on the video screen the CPU and the video are given alternating accesses to RAM, so that the CPU's use of RAM slows to 50% of the maximum rate. On Freeport every other video access point is given back to the CPU, and the video takes a double word at each remaining video access point, to make up. This gives the CPU three accesses to every one for video during a horizontal line, and the CPU can run at 75% of the maximum rate. This results in an average increase in overall speed of approximately 16%, more for RAM only tasks.
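The ERS scheme can be pictured as a slot schedule for the visible part of one scan line. In this sketch each line has 64 bus slots, which follows from 512 pixels / 16 bits = 32 video words with video taking every other slot on the Plus. Treating the slots as strictly periodic is a simplification, and the function names are made up for illustration.

```python
from typing import List, Tuple


def plus_line(slots: int) -> List[str]:
    # Plus: every other slot is a video access point; video takes one 16-bit word.
    return ["video" if i % 2 == 0 else "cpu" for i in range(slots)]


def freeport_line(slots: int) -> List[str]:
    # Freeport (SE): every other video access point is given back to the CPU,
    # and the remaining ones fetch a double word (two 16-bit words).
    return ["video2" if i % 4 == 0 else "cpu" for i in range(slots)]


def summarize(schedule: List[str]) -> Tuple[float, int]:
    """Return (CPU share of slots, 16-bit words read by video)."""
    cpu_share = schedule.count("cpu") / len(schedule)
    video_words = schedule.count("video") + 2 * schedule.count("video2")
    return cpu_share, video_words


SLOTS = 64  # 32 video words per visible line, alternating slots on the Plus
for name, sched in [("Plus", plus_line(SLOTS)), ("Freeport", freeport_line(SLOTS))]:
    share, words = summarize(sched)
    print(f"{name}: CPU gets {share:.0%} of slots, video reads {words} words")
```

Both schedules deliver the same 32 words of video data per line; the Freeport one just needs half as many video slots, which is where the CPU's extra 25% comes from.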
 

Snial

Well-known member
NuMac Model Information Update!

My fictitious sources have given me an update on some M88K NuMac product details.

First, the CPU. Apple would probably have had some influence over the part's development. Importantly, for the low-end models I'd specify a cut-down 25MHz CPU with 4kB each of instruction and data cache: the 88100EC25 (about 540K transistors, 25MHz, 2.25W).

Secondly, the models: I'd drop the 'better' from the good, better, best principle, leaving only the NuMac41 (an LC-sized NuMac with the cut-down CPU, for Math/Science students) and the NuMac61 (a Q700-style machine with 3 NuBus slots and the 33MHz, 3-chip chipset, for developers and ultra-high-end graphics). The NuMac41 would be priced similarly to an LC III, but, mainly due to the small cache, would only perform like an 8MHz SE under emulation.

Thirdly: 32-bit QuickDraw performance. Estimating this is a bit tricky. I have a reference to a MacTech article, which explains that 32-bit QD was rewritten in 'C' for PowerPC and was much faster, but the images are missing from the archive. So, then I had a look at some MacBench and Speedometer tests:
[Attachment: MacBench and Speedometer results]
But this isn't very helpful. I estimated that an M88100 at 25MHz would run at 0.28x the speed of a PPC 601/60MHz, but this means graphics would run at 0.43 on the Speedometer scale, much slower than a Q700. So, I think the tests are misleading here, because they're probably still driven by a lot of 68K code. I think it's better to recalculate based on SpecInt92. The MC68040/25 had a SpecInt92 of 13.5, whereas the M88K at 25MHz had a SpecInt92 of 17.4. So, in theory, scaling the Q700's graphics score of 0.75 by 17.4/13.5, its graphics performance would be about 0.97.
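The SpecInt92 scaling is a single ratio. This sketch assumes the Q700's Speedometer graphics score is 0.75 (the figure given later in the thread) and that graphics performance scales linearly with SpecInt92, which is the post's own rough premise rather than an established rule.

```python
Q700_GRAPHICS = 0.75        # Q700 Speedometer graphics score, as quoted in the thread
SPECINT92_68040_25 = 13.5
SPECINT92_88100_25 = 17.4


def scale_by_specint(base_score: float, base_specint: float, new_specint: float) -> float:
    """Scale a benchmark score by the ratio of SpecInt92 ratings (a rough proxy)."""
    return base_score * new_specint / base_specint


m88k_graphics = scale_by_specint(Q700_GRAPHICS, SPECINT92_68040_25, SPECINT92_88100_25)
print(f"{m88k_graphics:.2f}")
```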

However, since PowerPC 32-bit QuickDraw was written in 'C', the M88100 32-bit QD probably would be too, but I'll assume 80% C and 20% assembler on the 80:20 rule. If I assume compiled 'C' is about 3x slower than assembler, but 80% of the time is spent in the assembler code, this gives an effective performance of 0.333 x 0.2 + 0.8 = 0.87 x the ideal speed, making it 0.97 x 0.87 = 0.84.
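One way to read the 80:20 arithmetic: if 80% of the mixed build's run time is in assembler and the remaining 20% is C that would run 3x faster as hand assembler, an all-assembler build would need 0.8 + 0.2/3 ≈ 0.87 of the time. The sketch below encodes that reading; the function name is hypothetical, and the 0.75 Q700 baseline is the figure quoted later in the thread.

```python
def c_asm_efficiency(asm_time_share: float, c_slowdown: float) -> float:
    """Fraction of all-assembler speed reached by a mixed C/assembler build.

    asm_time_share -- share of the mixed build's run time spent in assembler
    c_slowdown     -- how many times slower compiled C is than hand assembler
    """
    actual_time = 1.0
    c_time_share = actual_time - asm_time_share
    # Time the same work would take if the C parts were hand-written assembler too.
    ideal_time = asm_time_share + c_time_share / c_slowdown
    return ideal_time / actual_time


eff = c_asm_efficiency(asm_time_share=0.8, c_slowdown=3.0)
ideal_graphics = 0.75 * 17.4 / 13.5   # SpecInt92-scaled Q700 score
print(f"efficiency {eff:.2f}, graphics {ideal_graphics * eff:.2f}")
```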

OK, furthermore, I don't think the 4kB caches would make it much worse, because most graphics operations hammer the cache anyway, and most of the inner core code would fit in the 4kB instruction cache, so I'll keep it at 0.84.

That's encouraging, because for tuned M88100 applications, a NuMac41 would give better graphics performance than a Q700 for the price of an LC III (whose graphics performance is 0.32). For the NuMac61, at 33MHz, the performance would be about 0.84 x 33/25 = 1.1, and only a Q650 beats that.

Fourthly: one of the main issues with early PowerPC development was the compilers, with Metrowerks saving the day. Maybe Metrowerks could have produced a decent M88K compiler by 1992, but I think it's better to assume most students, and initially even most developers, wouldn't have access to that. They'd still be running either THINK C (or THINK Pascal) or MPW C or Pascal. Only Apple would have access to Motorola's M88K compilers.

So, I think the easiest solution there is for Apple to provide pre-built libraries for M88K-accelerated functionality: 3D graphics, matrix math, statistics, graphics filters etc. I think it's fairly safe to assume that maths performance for the M88K would be more like PPC 601 maths (it didn't have a multiply-accumulate (MAC) unit, but it had two execution units, one for multiplies and the other for additions, so it could have competed). So we can expect about 44 for Math.

Function headers and Units would come with the library code, and developers would literally just add CdeR resources to applications. The NuMac Mixed Mode Manager would pull in CdeR resources when emulated code attempts to access M88K functions in the jump table.
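The lazy jump-table binding described above can be sketched as a toy model. Everything here is hypothetical: the class and method names are invented, Python functions stand in for both 68K and M88K code, and a dictionary stands in for the application's CdeR resources. It only illustrates pulling native code in on first use and counting mode switches, not any real Mac OS API.

```python
from typing import Callable, Dict

Routine = Callable[..., float]


class MixedModeJumpTable:
    """Toy model: emulated code calls through jump-table entries; the first call
    to an entry backed by a CdeR resource binds the native routine."""

    def __init__(self, emulated: Dict[str, Routine], cder: Dict[str, Routine]):
        self.emulated = emulated      # 68K versions of each routine
        self.cder = cder              # native M88K versions, keyed by entry name
        self.bound: Dict[str, Routine] = {}
        self.loads = 0                # times a CdeR resource was pulled in
        self.mode_switches = 0        # emulated -> native transitions

    def call(self, name: str, *args: float) -> float:
        if name in self.cder:
            if name not in self.bound:
                self.bound[name] = self.cder[name]   # "load" the resource once
                self.loads += 1
            self.mode_switches += 1
            return self.bound[name](*args)
        return self.emulated[name](*args)            # no native code: stay emulated


def dot3(*v: float) -> float:
    return v[0] * v[3] + v[1] * v[4] + v[2] * v[5]


def scale(x: float, k: float) -> float:
    return x * k


table = MixedModeJumpTable(emulated={"dot3": dot3, "scale": scale},
                           cder={"dot3": dot3})
print(table.call("dot3", 1, 2, 3, 4, 5, 6), table.call("dot3", 1, 0, 0, 1, 0, 0))
print(table.call("scale", 2.0, 3.0), table.loads, table.mode_switches)
```

Routines without a CdeR resource simply stay in emulation, so an application could adopt native library functions one at a time.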

And this concludes the latest fantasy M88K ramblings!
 

Phipli

Well-known member
So, I think the tests are misleading here, because they're probably still driven by a lot of 68K code.
Here are Norton System Info benchmarks for Quickdraw video on the 700 and Nubus PPC, all at 256 colours.

[Attachment: Norton System Info QuickDraw benchmarks, 256 colours]

I think your conclusion that 88100-based PowerMacs would have slow graphics is probably true, assuming the architecture is constant and the bus ratio remains the same. The Q700 was fast, though. The video speed on the 601s just didn't scale in proportion to the CPU improvement between the Q700 and the 6100.

[Attachment: second benchmark photo]
 

Snial

Well-known member
Here are Norton System Info benchmarks for Quickdraw video on the 700 and Nubus PPC, all at 256 colours.

Amazing, you've just run it on your real 9500/233? Nevertheless, I'm a bit puzzled (maybe due to the glass of wine I've just had): how can the 9500/233 have slower graphics than a PM8100, let alone a PM6100? Really curious. The 9500/233 has PCI graphics, or internal video, but the 6100 is using DRAM as video. My head is melting ;-) !

I think your conclusion that 88100-based PowerMacs would have slow graphics is probably true, assuming the architecture is constant and the bus ratio remains the same. The Q700 was fast, though. The video speed on the 601s just didn't scale in proportion to the CPU improvement between the Q700 and the 6100.

Cool, thanks for that! I think that's what makes the NuMacs really intriguing: for normal applications they'd be pretty poor, primarily because there wouldn't be a sufficient performance advantage in a mixed emulation/native environment in the 1989 to 1992 timeframe. Before doing this guesstimation, I kept thinking that the NuMac41 would feel quite nippy despite the emulation hit, because the graphics would be so much faster. This analysis implies that emulation mode switching would hit UI graphics quite badly, but, as we'll see, in the end it doesn't!

So, if I recalculate this (all hypothetical of course, massive error bars): The MC68040/25 had a SpecInt92 of 13.5 and the graphics performance of a Q700 is 0.75. The PPC 601/60 had a SpecInt92 of 61.6, so its graphics performance should have been: 0.75 x 61.6/13.5 x 0.87 (80:20 rule) = 2.98, but in fact it was 1.51, just over 50% of what it should have been. So some of this is because of the VRAM and RAM bandwidth (but the PPC 601 had a 64-bit data bus, which would have helped), but much of it is the emulation speed and mode switching. Fortunately Speedometer can provide performance ratings for both emulation and native execution:

[Attachment: Speedometer emulated vs. native results]

So, emulation is 6x slower than native execution (pretty good actually). But graphics under emulation still achieve 97% of native graphics performance (which is itself only about 50% of the ideal performance). This means, I think, that although emulation is a significant hit, emulation + mode switching for the graphics isn't a big hit on top of that; instead, as you've concluded, it's a bus issue. And yes, if you just take the Q700 graphics performance of 0.75 and double it (because we have a 64-bit bus), we get 1.5, which would account for all of that.
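The PPC 601 sanity check can be reproduced with the thread's own figures (Q700 graphics 0.75, SpecInt92 of 13.5 and 61.6, the 0.87 C/assembler factor and the measured 1.51). This is only the post's arithmetic re-run, not an independent benchmark.

```python
Q700_GRAPHICS = 0.75
SPECINT92_68040_25 = 13.5
SPECINT92_601_60 = 61.6
C_ASM_FACTOR = 0.87           # the 80:20 C/assembler factor used earlier
MEASURED_601_GRAPHICS = 1.51

# Ideal 601 graphics if performance tracked SpecInt92 (the post's premise).
expected = Q700_GRAPHICS * SPECINT92_601_60 / SPECINT92_68040_25 * C_ASM_FACTOR
achieved = MEASURED_601_GRAPHICS / expected
# Alternative explanation: only the data bus width doubled (32 -> 64 bits).
bus_only = Q700_GRAPHICS * 2

print(f"expected {expected:.2f}, achieved {achieved:.0%} of that")
print(f"bus-width-only estimate {bus_only:.2f} vs measured {MEASURED_601_GRAPHICS}")
```

The bus-width-only estimate lands almost exactly on the measured score, which is the point made in the post.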

So, backtracking, the NuMac 41 runs at 25MHz with a 32-bit bus, so it wouldn't suffer a bus speed issue vs. an LC III, which also runs at 25MHz with a 32-bit bus. An emulation hit of 6x results in a 3% loss of performance for graphics, but the NuMac 41 has something like a 10x hit vs. native speeds, so maybe a 5% loss, making it 0.84 x 0.95 = 0.80: still slightly faster graphics than a Q700.
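The 5% figure is a linear extrapolation from the single 601 data point (a 6x emulation hit costing about 3% of graphics performance). A sketch, assuming the loss really does scale linearly with the emulation slowdown, which is a guess rather than something the benchmarks show:

```python
def graphics_loss(emulation_slowdown: float,
                  ref_slowdown: float = 6.0,
                  ref_loss: float = 0.03) -> float:
    """Linear extrapolation from the PPC 601 data point (6x hit -> 3% loss)."""
    return ref_loss * emulation_slowdown / ref_slowdown


loss = graphics_loss(10.0)          # assumed ~10x emulation hit on the NuMac 41
numac41 = 0.84 * (1 - loss)         # 0.84 = native graphics estimate from earlier
print(f"loss {loss:.0%}, NuMac 41 graphics {numac41:.2f} (Q700: 0.75)")
```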

So, surprisingly, NuMac 41 graphics in most cases would be quick. It'd boot a bit quicker than a Mac SE from SCSI (much slower than an LCIII); windows and menus would react sluggishly on System 7.1, but draw very quickly. E.g. you'd pull a menu down, there would be a lag, but the menu would appear in a single frame. You'd drag a window (outline) around; see fairly big steps between each redraw, but each redraw would be instantaneous. You might find that if you typed quickly in a word-processor it might struggle to keep up with the typing, but the characters and text would snap onto the screen etc.
 

Phipli

Well-known member
Amazing, you've just run it on your real 9500/233?
They're photos from a little while back.
Nevertheless, I'm a bit puzzled (maybe due to the glass of wine I've just had): how can the 9500/233 have slower graphics than a PM8100, let alone a PM6100? Really curious. The 9500/233 has PCI graphics, or internal video, but the 6100 is using DRAM as video. My head is melting ;-) !
The 9500 is using an absolutely dog slow Radius PCI video card (they don't have any internal video at all). I was testing /how/ slow it was 😆
 

demik

Well-known member
If I assume the difference in access speed between RAM and ROM is purely due to the video circuit, it implies that...

On the SE, the video would be consuming 45% of memory accesses.

On the Plus, the video would be consuming 70% of memory accesses. (The book says every other access to RAM that occurs during the time of a horizontal scan line to the screen is used for video)

The Plus' average memory read speed is 2.56MB/s

The book says that for the SE, video uses one long word for every 4 memory accesses and for the Plus every other access to RAM that occurs during the time of a horizontal scan line to the screen is used for video. This doesn't line up with my punt above, so something else slows RAM down vs. ROM too.

Memory refresh? And sound, though that's probably minor.
 

Snial

Well-known member
If I assume the difference in access speed between RAM and ROM is purely due to the video circuit, it implies that...

On the Plus, the video would be consuming 70% of memory accesses. (The book says every other access to RAM that occurs during the time of a horizontal scan line to the screen is used for video)

The Plus' average memory read speed is 2.56MB/s
OK, so the ideal rate is 3.75MB/s (7.5MHz clock / 4 clocks per word x 2 bytes per word). Video will consume 512/16 (bits per word) x 342 x 60.15 = 658,281.6 words/s, x 2 bytes = 1.32MB/s. 3.75 - 1.32 = 2.43MB/s left, which is actually lower than the reported 2.56MB/s. Curious. And in reality it will be slightly lower than 2.43MB/s, because the audio/disk stepper phase buffer is also read, another cycle per scan line, removing another 1/64th: 0.02MB/s => 2.41MB/s.

On the Mac SE, video only consumes half of that, because it essentially fetches a long word during each 4-cycle access time. So, SE bandwidth = 3.75 - 1.32/2 = 3.09MB/s.
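The Plus and SE bandwidth figures from this post, as a short Python sketch. It keeps the post's rounded 7.5MHz clock and its own 0.02MB/s estimate for the sound/disk word; with the real 7.8336MHz clock the ideal and remaining CPU bandwidths would come out a little higher.

```python
CLOCK_MHZ = 7.5                 # rounded 68000 clock used in the post
CLOCKS_PER_ACCESS = 4
BYTES_PER_ACCESS = 2
REFRESH_HZ = 60.15
SOUND_DISK_MB_S = 0.02          # the post's estimate for the audio/disk word

ideal = CLOCK_MHZ / CLOCKS_PER_ACCESS * BYTES_PER_ACCESS          # MB/s
video_words_per_s = 512 / 16 * 342 * REFRESH_HZ
video = video_words_per_s * BYTES_PER_ACCESS / 1e6                # MB/s
plus = ideal - video
plus_with_sound = plus - SOUND_DISK_MB_S
se = ideal - video / 2   # SE: long-word fetches halve the slots video needs

print(f"ideal {ideal:.2f}, video {video:.2f}, Plus {plus:.2f} "
      f"({plus_with_sound:.2f} with sound/disk), SE {se:.2f} MB/s")
```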
 

Phipli

Well-known member
OK, so the ideal rate is 3.75MB/s (7.5MHz clock / 4 clocks per word x 2 bytes per word). Video will consume 512/16 (bits per word) x 342 x 60.15 = 658,281.6 words/s, x 2 bytes = 1.32MB/s. 3.75 - 1.32 = 2.43MB/s left, which is actually lower than the reported 2.56MB/s. Curious. And in reality it will be slightly lower than 2.43MB/s, because the audio/disk stepper phase buffer is also read, another cycle per scan line, removing another 1/64th: 0.02MB/s => 2.41MB/s.

On the Mac SE, video only consumes half of that, because it essentially fetches a long word during each 4-cycle access time. So, SE bandwidth = 3.75 - 1.32/2 = 3.09MB/s.
That assumes that only single word accesses are performed. Consecutive reads/writes would lift the average.

You've included it by working out the data transfer for the screen, but for interest / usefulness, the percent of time "in screen" where RAM is shared (again, ignoring the Audio/Floppy word) is 67.2% of the time, so the CPU has about 32.8% of the time where it can access RAM on its own.
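The 67.2% figure drops out of the compact-Mac video timing. This sketch assumes 704 pixel times per scan line (512 visible) and 370 lines per frame (342 visible), the commonly cited 128K/512K/Plus timings rather than numbers stated in this thread, and combines the result with the 50% (Plus) and 75% (SE) in-line CPU shares from the Freeport ERS quote. Like the post, it ignores the audio/floppy word.

```python
# Assumed compact-Mac timing (not from the thread):
# 704 pixel times per scan line (512 visible), 370 lines per frame (342 visible).
H_TOTAL, H_VISIBLE = 704, 512
V_TOTAL, V_VISIBLE = 370, 342

in_screen = (H_VISIBLE / H_TOTAL) * (V_VISIBLE / V_TOTAL)


def average_cpu_share(cpu_share_in_screen: float) -> float:
    """Time-averaged share of RAM slots the CPU gets; outside the visible
    area the CPU has RAM to itself."""
    return (1 - in_screen) + in_screen * cpu_share_in_screen


print(f"in screen {in_screen:.1%}, CPU alone {1 - in_screen:.1%}")
print(f"Plus average CPU share {average_cpu_share(0.50):.1%}")
print(f"SE   average CPU share {average_cpu_share(0.75):.1%}")
```

That works out to roughly 66% vs. 83% of RAM bandwidth for the CPU, about 25% more for purely RAM-bound code on the SE, which sits comfortably with the ERS's "approximately 16%, more for RAM only tasks".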
 

Snial

Well-known member
That assumes that only single word accesses are performed. Consecutive reads/writes would lift the average.
Apologies, I don't yet understand that. I thought that on a 68000, every RAM access, for words and long words would take 4 cycles per 16-bit access as that's how the bus works. So, consecutive read/writes by the CPU can't raise the average - it's still 3.75MB/s (ignoring video contention) and video access on the Plus is still going to be one 16-bit word every other 4 cycle access in the visible portion of the scan line (regardless of whether the Plus needs to access RAM or not). The SE is different, as described below.


Good find:
That makes sense. There's 2 column accesses by the video in every 4 clock cycles on an SE (I'm presuming it's SE info here). It's interesting how C16M is available on the PDS bus, a handy (roughly) 16MHz signal!
 

Phipli

Well-known member
Apologies, I don't yet understand that. I thought that on a 68000, every RAM access, for words and long words would take 4 cycles per 16-bit access as that's how the bus works. So, consecutive read/writes by the CPU can't raise the average - it's still 3.75MB/s (ignoring video contention) and video access on the Plus is still going to be one 16-bit word every other 4 cycle access in the visible portion of the scan line (regardless of whether the Plus needs to access RAM or not). The SE is different, as described below.
Oh my bad, I assumed reading multiple values was more efficient like on some other CPUs. I'm not familiar enough with the opcodes / timings.
 

Phipli

Well-known member
That makes sense. There's 2 column accesses by the video in every 4 clock cycles on an SE (I'm presuming it's SE info here). It's interesting how C16M is available on the PDS bus, a handy (roughly) 16MHz signal!
Yeah, Freeport was the SE's codename.
 