Macintosh 68060 Redux

I think work towards an open 68040-ISA-compatible FPGA softcore makes more sense than trying to do anything with Apollo.
Agreed. The MC68060 is nice and all, but the lack of some integer instructions is a problem for old code. The support package is nice, but unlike the FPU support package, the performance impact is significant [1]. A nice soft-core in an FPGA would be a great thing to have.

But...

The only Motorola MMU implementation I know of is not open-sourced (... yet, hopefully) and is for the '030. There's no FPU implementation - I know of two projects, including mine (don't hold your breath, I haven't touched it in a while), to do it, but the '881/'882 are complex beasts. Both projects are targeting the '040 subset, as it's much easier and could be supported by the '040 FPSP. The '030 soft-core from Suska that I've used for my Sun3 project has no open-source caches and no open-source MMU, and is quite large. As it emulates the exact behavior of the '030 (including instruction and bus timing), it is not very efficient in terms of timing closure - even on an Artix-7 speedgrade -2, 20 MHz is not an easy target to achieve :-/ But it works fine otherwise.
A CPU accelerator does not change the underlying nature of the classic hardware
At some point I believe it does. If the CPU is just emulated on something faster (like a PiStorm), why bother with the rest of the system? Once the CPU is emulated, the SCSI disk is emulated, etc., then it's easier to just run QEMU. Faster and more reliable than any vintage hardware with or without acceleration.

An FPGA is still "hardware", at least in terms of the way things are designed. If a proper MC68040 was available in an FPGA and validated as a substitute for the real thing, you'd """""only""""" (not sure 5 quotes is enough around that word :-) ) need to simulate various timings and do some hardware validation to start thinking about doing a VLSI version of it... Though I suspect it would be easier to do an older design first (the regular 68000 comes to mind).

[1] The FPU in the '040 has much lower latency than the external interface to the '881/'882 (which was a very neat design, but also quite slow so offloading an instruction is only worth it if it takes long enough - tried that with AES instructions for SSH, the large gain of the faster instructions is almost wiped out by the coprocessor interface overhead). Trapping and executing a fast sequence of FPU instructions to replace the rare and ultra-long instructions like transcendentals was a good trade-off. Trapping for integer instructions that were done inside the pipeline of the '020/'030/'040 has a much more visible overhead.
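The trade-off in [1] can be put as a rough break-even check. This is only a sketch: the function name and all the cycle counts are made up for illustration, not measurements from any '881/'882 or AES experiment.

```python
def offload_gain(software_cycles, coprocessor_cycles, interface_overhead_cycles):
    """Cycles saved per instruction by offloading; negative means a net loss.

    All three inputs are hypothetical cycle counts chosen by the caller.
    """
    return software_cycles - (coprocessor_cycles + interface_overhead_cycles)


# Illustrative numbers only (assumptions, not measured values):
# a long operation amortizes the interface overhead easily...
long_op = offload_gain(software_cycles=500, coprocessor_cycles=50,
                       interface_overhead_cycles=100)
# ...while a short one loses most or all of its gain to the interface.
short_op = offload_gain(software_cycles=40, coprocessor_cycles=5,
                        interface_overhead_cycles=100)
print(long_op, short_op)
```

The same shape applies to trapping: replace the interface overhead with the trap entry/exit cost, which is small next to a transcendental but large next to a single integer instruction.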
 
This is so cool. What's the fastest 68060 clock speed?
The 68060 was produced in 50, 66, and 75 MHz speed grades. The CPU clock may be 1x, 2x, or 4x the bus clock. It seems that with a little extra voltage the earlier parts I have can do 80+ MHz, and supposedly the later masks can do 100 MHz.
 
A misc update - not as Mac-centric, but still. I've finally got my hypothetical "Hexdra" machine in a case and all the hardware done. I designed the case as a what-if exercise for this what-if machine: what if there was a Snow White-styled pizzabox designed like the IIcx? Similar to the LC, but smaller still.

[Attached images: 1775835963959.png, 1775835980747.png, IMG_6013.JPG]

Turns out there are a lot of really neat design quirks in Apple's cases - like stealth vents hidden in the Snow White lines, which are largely invisible depending on how you look at the machine. I borrowed that concept in order to draw air over the CPU heatsink and the picoPSU using a laptop centrifugal fan. The case lid slides (or snaps) on and is secured by two thumbscrews. Some front buttons allow access to the reset and programmer's switches. I don't know if the central support is enough for a CRT, but it's enough for a 20 lb cat. I need to make some design revisions, and perhaps after that I'll print a "final" version. The beige box near the battery is a custom enclosure for a 40x28mm speaker found by @demik - it doesn't sound *terrible* as is, but in the future the speaker needs to be mounted to some vents directly, as that provides better sound quality vs anything inside the case. It was an afterthought 😂

The custom boards: a low-profile PSU adapter board; one of my ISP-SIMM USB in-system-programmable ROM SIMMs; a 256K cache + 060 adapter board (more later); and finally a DOM (Disk on Module) variant of the ZuluSCSI Slim that I designed. It can have the IDC50 connector installed in one of 3 orientations to allow direct insertion into the logic board, paying obvious dividends here. I'm running the bicolor power/HDD LED off it. Note on the PicoPSU - if you're going to use one of these, only use the official products. The quality is *MUCH* better than the Chinese knockoffs: they're better built, run much cooler, and regulate much closer to +5 V instead of 5.3 V or so.

As of yesterday I've got NetBSD 10.1 booting on the 68060. It's much quicker! There is a critical bug somewhere in the SONIC code that remains to be addressed - it also breaks on 040s with cache - but otherwise it seems to be working well. The current NetBSD kernels also have a bug in the ADB code for the VIA-ADB hardware; it doesn't seem to become a problem until you're on a Quadra 800 (33 MHz bus + 60 ns RAM timings). I identified a fix for that too. I'll have to figure out how to commit code to NetBSD when all settles out.

Some benchmarks from NetBSD - compiling nbench. The 060 is running at twice the bus speed listed (MHz) due to clock doubling.

[Attached image: 1775837271609.png]

As far as nbench is concerned, this 80 MHz 68060 is turning in 1.5x the performance of a Pentium 90 in purely integer workloads. The FPU is, of course, another story, and when memory performance is more in focus (sorts?) the 060 falls behind.

The Pentium is on a 60 MHz 64-bit bus, as compared to the 40 MHz 32-bit bus here, which is more 486-like. The comparison extends to video: the Quadra framebuffer is essentially a 486 VESA Local Bus video card without acceleration. This is not ideal as far as the 060 is concerned, but it's the closest thing to a period 060 design without major design compromises. In another timeline this could have shipped as a viable machine.
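To put rough numbers on the bus comparison, here is a back-of-the-envelope peak bandwidth calculation. It assumes one transfer per bus clock with no wait states, which neither bus actually sustains, so treat the results as upper bounds; the function name is just for illustration.

```python
def peak_bandwidth_mb_s(bus_mhz, bus_width_bits):
    """Theoretical peak in MB/s (10^6 bytes), assuming one transfer per clock."""
    return bus_mhz * bus_width_bits / 8


pentium_bus = peak_bandwidth_mb_s(60, 64)  # 60 MHz, 64-bit
quadra_bus = peak_bandwidth_mb_s(40, 32)   # 40 MHz, 32-bit
print(pentium_bus, quadra_bus, pentium_bus / quadra_bus)
```

On paper that is a 3x gap in the Pentium's favor, which fits the 060 falling behind when memory performance is more in focus.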

Here are some more Mac benchmarks. This was mostly exploring the benefit of a 128K cache vs a 256K cache vs a 256K two-way associative cache.
[Attached image: 1775837337397.png]

This cache board was originally designed as a 256K direct-mapped cache, as an upgrade from the usual 128K cache boards. However, the changes required to support the additional capacity made it a relatively simple bit of bodgery to change it into a two-way associative cache using a random replacement policy. This makes it somewhat more resistant to cache thrashing, and I couldn't resist digging into how that changed performance. It seems like the 040 is fairly happy having any cache and the additional capacity didn't help all that much, but the more bandwidth-constrained 060 benefited more.
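For anyone curious why two-way associativity helps with thrashing, here is a toy model that only tracks tags. It is a sketch, not the board's actual logic: the 16-byte line size, the "fill an empty way first" rule, and the class and function names are all assumptions for illustration.

```python
import random


class Cache:
    """Toy set-associative cache model: tracks tags only, no data."""

    def __init__(self, size_bytes, ways, line_bytes=16, seed=0):
        self.line_bytes = line_bytes  # assumed line size
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        self.sets = [[None] * ways for _ in range(self.num_sets)]
        self.rng = random.Random(seed)
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            self.hits += 1
            return True
        self.misses += 1
        if None in ways:
            victim = ways.index(None)  # assumption: use an empty way first
        else:
            victim = self.rng.randrange(self.ways)  # random replacement
        ways[victim] = tag
        return False


def thrash(cache, addr_a, addr_b, rounds):
    """Alternate between two addresses that collide in the cache."""
    for _ in range(rounds):
        cache.access(addr_a)
        cache.access(addr_b)
    return cache.hits, cache.misses


SIZE = 256 * 1024
direct = thrash(Cache(SIZE, ways=1), 0, SIZE, rounds=1000)
two_way = thrash(Cache(SIZE, ways=2), 0, SIZE, rounds=1000)
print("direct-mapped (hits, misses):", direct)
print("two-way       (hits, misses):", two_way)
```

Two addresses exactly 256K apart land on the same line of the direct-mapped cache and evict each other on every access, while the two-way version keeps both after the first pair of misses. Real workloads are nowhere near this pathological, which is probably why the measured gains are modest.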

Note: the 33 MHz/60 ns timings in these screenshots refer to the Quadra 800's 33 MHz, 60 ns RAM timings. These are notably more aggressive than the 80 ns timings used on the Centris/Quadra 650 and provide greater performance. They are a minor overclock when running at a 40 MHz bus speed. I'm running 128MB of 40 ns RAM, so I've got a little room to push it :)
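A rough way to see how much room that leaves, assuming the RAM timings are a fixed number of bus clocks so every interval shrinks in proportion to the bus clock. That is a first-order simplification (real DRAM has several timing parameters that don't all scale the same way), and the function name is only for illustration.

```python
def effective_ram_ns(rated_ns, design_bus_mhz, actual_bus_mhz):
    """RAM speed implied by timings designed for one bus clock, run at another."""
    return rated_ns * design_bus_mhz / actual_bus_mhz


required = effective_ram_ns(rated_ns=60, design_bus_mhz=33, actual_bus_mhz=40)
margin = required - 40  # against the 40 ns parts fitted here
print(required, margin)
```

Under that assumption the 60 ns timings at a 40 MHz bus ask for roughly 50 ns RAM, so 60 ns parts would be run slightly out of spec while 40 ns parts still have some margin.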
 
