
68040 FPGA replacement

Hello Macintosh friends. I have been searching all over for a way to replace a 68040 CPU with a more modern alternative, something along the lines of an FPGA, to be used on an accelerator card or even as a whole logic board replacement. There are a few projects out there, but most of them stop at the 68030, or have only managed to reproduce a 68000 CPU.

I came across an FPGA core from Apollo Core, who extended the 68040 into what they describe as the 68080, but unfortunately that was a dead end. They were advertising a free developer version, but it turns out that version has strings attached. Let me know what you all think...
 

Byrd

Well-known member
Hi pinwheel_of_doom,

I assume you're talking about the yet-to-be-seen-working 68000 Buffee core and the very-much-in-development PiStorm? I haven't heard of any drop-in FPGA replacement for the 68040 yet, but here's hoping. It's a fascinating time, though: in the Mac world we're seeing complete motherboard replacements, but no cheap-and-cheerful CPU accelerators - yet.
 
Hi @Byrd, I don't so much mean projects involving a Raspberry Pi emulating a 68000 CPU as actual VHDL or Verilog code implementing a 68040. I've come across a few projects, one called fx68k, but that looks like a 68000 only. There is also a project with a 68040 written in C; maybe that's a good place to start.
 

Franklinstein

Well-known member
The problem with the '040 is that its bus protocol was quite different from that of the preceding 68k models, requiring GALs and other glue chips to interface it to older designs. Motorola even developed a chip specifically designed to bridge the '040 to an older 68k bus design (and I'm pretty sure Apple used the core of this to build PrimeTime and the other chips used in many '040 Macs). Why the change? Well, the '040 was already a huge chip owing to its execution core, and there wasn't enough die real estate left to shoehorn in backwards compatibility (or even a fully 68882-compatible FPU; the transcendental functions are emulated). It's also likely they were going for maximum performance, and jettisoning backwards compatibility and infrequently-used instructions was the easiest way to do that.
But the '040 was a huge and complex chip. I don't think an FPGA would be up to the task of fully emulating one at a reasonable speed (at least, not enough faster than a real '040 to be worth the expense and effort). What I picture instead is an FPGA that interfaces to a specific bus ('030 or '040, depending on which slot or socket you're targeting) and dynamically translates the 68k instructions into whatever ISA the new processor uses (ARM, PPC, ColdFire, SuperH, MIPS, etc.), exactly like the Mac's 68k emulator works in software, or like modern x86 processors work internally. Theoretically it could be a fairly quick accelerator, since the translation is all done in hardware and the power of a full modern CPU is available to do the actual heavy lifting.
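For anyone who hasn't looked inside one of these emulators, the software version of that translation is just a fetch-decode-dispatch loop. A toy sketch in C of what the hardware would be doing in gates instead (all names made up, only one opcode implemented):

Code:
/* Toy sketch of a software 68k emulator's inner loop (all names made up). */
#include <stdint.h>
#include <stdio.h>

static uint8_t  ram[0x10000];   /* pretend guest memory */
static uint32_t d[8], pc;       /* 68k data registers and program counter */

static uint16_t fetch16(void) {
    uint16_t w = (uint16_t)((ram[pc] << 8) | ram[pc + 1]); /* 68k is big-endian */
    pc += 2;
    return w;
}

static void step(void) {
    uint16_t op = fetch16();
    switch (op >> 12) {         /* crude dispatch on the top nibble; a real
                                   core has to decode many more bits */
    case 0x7:                   /* MOVEQ #imm,Dn */
        d[(op >> 9) & 7] = (uint32_t)(int32_t)(int8_t)(op & 0xFF);
        break;
    /* ...hundreds more cases in a real interpreter... */
    default:
        printf("unimplemented opcode %04x\n", op);
        break;
    }
}

int main(void) {
    ram[0] = 0x70; ram[1] = 0x2A;         /* MOVEQ #42,D0 */
    step();
    printf("D0 = %u\n", (unsigned)d[0]);  /* prints: D0 = 42 */
    return 0;
}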
 

Gorgonops

Moderator
Staff member
The “Apollo Core” used by the FPGA-based “Vampire” Accelerators for Amiga computers claims to perform at around the equivalent of a 500 MHz 68040, and about the only feature I believe it’s lacking is a full MMU. The 68040 was a big complex chip in 1991 but 30 years of silicon advancement is a long time.

That said, as noted, Apollo is neither open source, nor have its creators done a 68040-compatible bus interface for it. (They've only just cracked fitting it into a 68030-based machine.) The Vampire also gets a lot of its speed from running as much as possible out of dedicated RAM connected directly to it, so even though it's an “accelerator” connected through the CPU bus, it's also effectively a partial system board replacement. To really go a lot faster than the original CPU, this is probably going to be necessary in any design; at the very least you'll need a very clever caching algorithm.

The blunt truth is if a full main board replacement is an option there’s not a ton of justification for picking an FPGA over emulation. Modern mainstream CPUs can emulate a 68040 faster than a Vampire for a lot cheaper.
 

Gorgonops

Moderator
Staff member
… re: that suggestion of doing instruction translation in an FPGA but then having some other CPU behind it to execute the resulting “micro-ops”: that doesn't sound to me like it'd fly at all. (Or at least be worth it.) The 680x0 is a CISC CPU with an extremely complicated ISA (maybe not as “bad” as x86, but still very complex), with tons of addressing modes, multi-length instructions, etc. Instruction decoding is the *hard part*. If you manage to do that competently within an FPGA, there are plenty of very fast and clean open-source RISC CPU soft cores that you might as well just toss in there, and cut out the hassle of interfacing another CPU out the back door. (This will also let you optimize the decoder and the micro-op executor specifically to work well with each other; I vaguely recall this is effectively what the Apollo guys have done.)
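To give a flavor of why the decode is the hard part: on the plain 68000, six bits of most opcodes select among a dozen effective-address forms, several of which consume extra extension words from the instruction stream, and the '020 and later add still more formats on top. A little C sketch of just that one corner of the decoder (illustrative only, not from any real core):

Code:
/* Illustrative only: decoding the 68000's 6-bit effective-address field
   (mode in bits 5-3, register in bits 2-0). Several modes pull extra
   extension words out of the instruction stream, and the 68020+ brief/full
   extension-word formats make this much hairier still. */
#include <stdint.h>
#include <stdio.h>

static const char *decode_ea(uint16_t op) {
    unsigned mode = (op >> 3) & 7, reg = op & 7;
    switch (mode) {
    case 0: return "Dn";             /* data register direct           */
    case 1: return "An";             /* address register direct        */
    case 2: return "(An)";           /* address register indirect      */
    case 3: return "(An)+";          /* postincrement                  */
    case 4: return "-(An)";          /* predecrement                   */
    case 5: return "(d16,An)";       /* +1 extension word              */
    case 6: return "(d8,An,Xn)";     /* +1 extension word              */
    case 7:
        switch (reg) {               /* mode 7 overloads the reg field */
        case 0: return "(abs).W";    /* +1 word                        */
        case 1: return "(abs).L";    /* +2 words                       */
        case 2: return "(d16,PC)";   /* +1 word                        */
        case 3: return "(d8,PC,Xn)"; /* +1 word                        */
        case 4: return "#imm";       /* +1 or +2 words, size-dependent */
        }
    }
    return "illegal";
}

int main(void) {
    printf("%s\n", decode_ea(0x0010));   /* mode 2, reg 0: prints "(An)" */
    return 0;
}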

And of course if you’re just using an FPGA as glue and pitching the instruction decoding to the new CPU (which you might as well, it’ll let you leverage existing JIT emulator code bases) then you’re doing the Buffee/PiStorm strategy. (CPLDs in those, I don’t know if there’s a CPLD big enough to do the bus glue for a 68040, but it’s the same end result.) I personally have “opinions” about the Buffee and the PiStorm that I’ll keep to myself; if what they give you is what you want then have at it.
 

Franklinstein

Well-known member
Sounds like we're kind of on the same page. What a number of these advanced FPGAs (basically SoC FPGAs) would offer is what was discussed: you'd interface to the '030 or '040 bus and do on-the-fly 68k-to-[new arch] translation, but instead of sending the code to an external CPU you'd execute it internally on whatever ISA the FPGA offers. NXP (or someone else) should offer some FPGAs with PPC cores that could do that. Potentially you could take Apple's or Connectix's SpeedDoubler 68k-to-PPC code and implement it directly in hardware, or I guess just load it into the CPU cache and run it from there like the original emulator does. Apple's official documentation (and real-world tests) show that the 6100/60 can execute most 68k code at least as fast as a 20 MHz '040, so a well-implemented modern FPGA solution should be able to noticeably improve on that.
 

Gorgonops

Moderator
Staff member
Sounds like we're kind of on the same page. What a number of these advanced FPGAs (basically SoC FPGAs) would offer is what was discussed: you'd interface to the '030 or '040 bus and do on-the-fly 68k-to-[new arch] translation, but instead of sending the code to an external CPU you'd execute it internally on whatever ISA the FPGA offers.

No, not really. An FPGA with a PowerPC core wedged into it (which, FWIW, was a thing ten years ago, but it doesn't look to me like they sell them anymore) could certainly be used to create essentially the equivalent of an Apple PowerPC upgrade card "easily enough". IE, again, you'd essentially just be doing what a Buffee or PiStorm attempts to do, which is treat the attached host computer as a collection of memory and peripherals that happens to be accessed via the bus protocol of the CPU you replaced. But those cards didn't do any "on-the-fly 68k-to-[new arch] translation". The emulation was entirely in software; there was no specialized hardware doing instruction fetches independently of the new CPU and reordering them into native micro-ops. These cards turned a 680x0 Mac into a PowerPC Mac that happened to have a weird, suboptimal architecture; the card didn't "fake" being a 68030 or '040 on anything but an electrical level.
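(Schematically, that whole strategy amounts to the sketch below; C with made-up names, the actual bus-wiggling stubbed out. Every guest access either hits fast local RAM or gets turned into real 680x0 bus cycles aimed at the host's hardware:)

Code:
/* Sketch (hypothetical names) of the Buffee/PiStorm-style approach: the
   emulator runs entirely in software on the new CPU, and each guest memory
   access is routed either to fast local RAM or out over the old 680x0 bus
   protocol to the host machine's real hardware. */
#include <stdint.h>
#include <stdio.h>

static uint8_t local_ram[0x400000];          /* fast RAM on the new board */

/* Stub: a real implementation wiggles the 68030/'040 bus signals here. */
static uint8_t host_bus_read8(uint32_t addr) {
    printf("bus read from host hardware at %08x\n", (unsigned)addr);
    return 0xFF;
}

static uint8_t guest_read8(uint32_t addr) {
    if (addr < sizeof local_ram)       /* pretend: low 4 MB is RAM       */
        return local_ram[addr];        /* served locally, full speed     */
    return host_bus_read8(addr);       /* ROM, VIAs, video... live on the
                                          host board, at host bus speed  */
}

int main(void) {
    guest_read8(0x00001000);    /* fast path */
    guest_read8(0x50F00000);    /* slow path: host I/O space */
    return 0;
}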

My point wasn't that it's necessarily impossible to do that in an FPGA. What I was saying is that it would probably turn out to be a huge, unnecessary complication: after you've gone through all the work of making a reliable 680x0 instruction decoder in hardware (which is what you're asking for here), having it specifically turn out code for some other existing complex ISA like PowerPC (or even ARM) makes little sense when a really stripped-down and specialized RISC core... or actually, VLIW would probably be the way to go... could backstop the decoder you've spent all that time crafting far more efficiently. (And probably in a lot fewer transistors.)

PowerPC is good at executing PPC machine code, and ARM is good at executing ARM machine code. If you want this thing to run faster than an ARM or PowerPC chip running a 68000 emulator (and don't forget, with techniques like JIT, 68000 emulators can be *very fast*), you're going to have to build a miraculously efficient instruction-set translator in an HDL like Verilog: one that spits out perfectly optimized PPC or ARM machine code that not only accurately reproduces the same results as the real deal, but then has to shove those results back out through whatever intermediary registers are in play to get them back into the pipeline. This isn't a handwave-y "just translate computer code to FPGA" problem; FPGAs are not CPUs.
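(For the curious, the reason JIT emulators get so fast: each basic block of 68k code is translated to host code once, cached by guest PC, and every later visit jumps straight into the cached native code. A schematic C sketch, names hypothetical and the actual code generation stubbed out:)

Code:
/* Schematic translation cache at the heart of a JIT emulator (hypothetical
   names; the hard part, emitting host machine code, is stubbed out). */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*host_block_fn)(void);   /* pointer to compiled host code */

static void dummy_block(void) {}       /* stand-in for generated code   */

static host_block_fn compile_block(uint32_t guest_pc) {
    printf("compiling block at %06x\n", (unsigned)guest_pc);
    return dummy_block;                /* real JIT: emit host code here */
}

#define TCACHE_SLOTS 4096
static struct { uint32_t guest_pc; host_block_fn code; } tcache[TCACHE_SLOTS];

static host_block_fn lookup_or_compile(uint32_t guest_pc) {
    size_t slot = (guest_pc >> 1) % TCACHE_SLOTS;   /* 68k PCs are even     */
    if (tcache[slot].code == NULL || tcache[slot].guest_pc != guest_pc) {
        tcache[slot].guest_pc = guest_pc;           /* miss: translate once */
        tcache[slot].code = compile_block(guest_pc);
    }
    return tcache[slot].code;          /* hit: straight into host code */
}

int main(void) {
    lookup_or_compile(0x2000)();   /* first visit compiles the block   */
    lookup_or_compile(0x2000)();   /* second visit is a pure cache hit */
    return 0;
}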
 

trag

Well-known member
I've always felt that building a 68030 or 68040 imitation on an FPGA would be fun. It might be pointless in the real world, but so is collecting old computers, mostly.

There may not be a practical advantage over emulation on a modern platform, but darn it, an FPGA is a chip, doing chippy things in logic, not just executing a bunch of sequential instructions from code. :)

Once the "basics" of that are worked out <insert exhausted laugh> refinements like using FPGA-on-board DDR2 controllers for main memory can be added. For the 68040, picking an FPGA with enough Block-RAM elements to emulate the built-in L1 caches would be important, I think. Picking one with enough Block-RAM to also form an L2 cache would be even better.

Of course, since it's all the same Block-RAM, one could just make the L1 caches larger, but that might make the imitation behave differently enough from the original to create plug-in-replacement problems. Probably not, but maybe.
 

trag

Well-known member
Of course, I've never made any progress on such a project other than shopping for an FPGA development system that actually has 100+ I/O pins accessible - which is close to the number needed to provide all the 68030 connections. Actually, IIRC, it needs about 80.

I've never seen such a system. They're all woefully short on accessible I/O pins.

So one would probably have to roll one's own development board, which isn't terrible. One can start right from the beginning with a PGA header on the board, or use a Euro-DIN (DIN 41612) connector for compatibility with '030 PDS slots.
 
Xilinx's software comes with a tool to convert C code into VHDL/Verilog. I had read about this before: programmers can use HLS to move programs into an FPGA for greater performance, instead of having a CPU constantly running them. As I mentioned before, there is a C implementation of the 68040 located here: https://github.com/kstenerud/Musashi. I tried running it through the Xilinx HLS tool, but it errors out; I'm assuming that's because I need to properly initialize the program and call the correct routines for the program to be valid.
 

CC_333

Well-known member
@Gorgonops Once this is all said and done, it seems almost pointless to shove this hypothetical 680x0 imitation CPU thing into an existing Mac, as it will contain most of what makes a Mac a Mac on the FPGA, rendering the "real" Mac little more than a terminal to the "fake" Mac-on-a-Chip (MoC).

Therefore! Why not go the extra mile and simply create a brand-new Mac-like logic board from scratch, with this hypothetical CPU thing at its core? Functional clones of the SE, Plus, and LC/LC II now exist, so one of those might be a good place to start creating something that fits an existing form factor.

In other words, it's pointless to replace a Mac's CPU with one of these, as most of its potential will be wasted anyway, besides the fact that, as mentioned above, the Mac it's installed in would cease to be a Mac. It would be functionally equivalent to stuffing a modern Mac mini or Raspberry Pi into an empty Mac case, but worse, because of the boatload of problems in interfacing the slow I/O subsystems on the original logic board to the new, very much faster MoC.

Make sense?

c
 

Gorgonops

Moderator
Staff member
Xilinx's software comes with a tool to convert C code into VHDL/Verilog. I had read about this before: programmers can use HLS to move programs into an FPGA for greater performance, instead of having a CPU constantly running them. As I mentioned before, there is a C implementation of the 68040 located here: https://github.com/kstenerud/Musashi. I tried running it through the Xilinx HLS tool, but it errors out; I'm assuming that's because I need to properly initialize the program and call the correct routines for the program to be valid.

Just to be clear, that C-to-VHDL thing can't magically translate arbitrary C code into Verilog. It's designed to let an FPGA accelerate software algorithms which would benefit from the sort of "parallel-ality" it can provide. (Think of the sort of matrix operations AltiVec was designed for, but on steroids.) Some pretty strict limitations apply to the code structure, so I'd be *very* surprised if you could just throw any particular MAME processor core at it without significant modifications. (And even when it comes to translatable code, said code will have to be structured in such a way as to actually leverage the FPGA's strengths; if the compiler can't unroll loops into a widely parallel set of simple ALU-like operations, a conventional CPU will probably run it faster.)
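To make the contrast concrete, here's roughly the kind of code HLS is happy with versus the shape of an emulator core (a sketch using Vitis/Vivado HLS-style pragmas; I haven't run this through the tool):

Code:
/* HLS-friendly: fixed bounds, no data-dependent control flow. The tool can
   unroll this into parallel multiply-adds and pipeline it. */
void saxpy256(const int x[256], const int y[256], int out[256], int a) {
    for (int i = 0; i < 256; i++) {
#pragma HLS PIPELINE II=1
#pragma HLS UNROLL factor=8
        out[i] = a * x[i] + y[i];
    }
}

/* Emulator-shaped: every iteration depends on the previous one (the PC),
   and the decode is one enormous serial decision tree. HLS may synthesize
   it, but only as a slow sequential state machine. */
void interp(const unsigned short *prog, unsigned *pc, unsigned *d0) {
    for (int steps = 0; steps < 1000; steps++) {
        unsigned short op = prog[*pc / 2];
        switch (op >> 12) {
        case 0x7: *d0 = op & 0xFFu; *pc += 2; break;  /* toy MOVEQ */
        default:  return;                             /* unhandled */
        }
    }
}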

It's certainly possible that there exists emulator code out there that would be a natural fit for this. (And for all I know, maybe the Vampire/Apollo Core people maintain their "68080" code in the form of C translated to HDL.) But unless it says on the can that it's already set up for it, I would expect to have to do a lot of work to port any random "CPU-focused" emulator into an HDL-able product. Whether that would be less work than starting with an existing open-source 68000 FPGA core like AO68000 or FX68k is probably an open question.
 

Franklinstein

Well-known member
@Gorgonops Once this is all said and done, it seems almost pointless to shove this hypothetical 680x0 imitation CPU thing into an existing Mac, as it will contain most of what makes a Mac a Mac on the FPGA, rendering the "real" Mac little more than a terminal to the "fake" Mac-on-a-Chip (MoC).

Therefore! Why not go the extra mile and simply create a brand-new Mac-like logic board from scratch, with this hypothetical CPU thing at its core? Functional clones of the SE, Plus, and LC/LC II now exist, so one of those might be a good place to start creating something that fits an existing form factor.

In other words, it's pointless to replace a Mac's CPU with one of these, as most of its potential will be wasted anyway, besides the fact that, as mentioned above, the Mac it's installed in would cease to be a Mac. It would be functionally equivalent to stuffing a modern Mac mini or Raspberry Pi into an empty Mac case, but worse, because of the boatload of problems in interfacing the slow I/O subsystems on the original logic board to the new, very much faster MoC.

Make sense?

c
I completely agree: personally I think going totally bananas with stuff like this is a little pointless, at least from a practicality standpoint. Stuff like what Action Retro does on his YouTube channel is cool and all, but at the same time: why? You can play Minecraft under OS X on a new computer without spending tons of money on rare upgrades for niche computers from nearly 30 years ago.

Of course the most obvious and direct answer is: because it's there. And I respect that, for sure. Props to all you crazy people doing this sort of thing. I'm just probably not going to join you on that. I mean I use vintage computers because I want to use vintage computers: the slow processors, the limited RAM, the clicky HDs, floppy swapping, the weird display resolutions, and trying to eke out as many extra cycles as possible using whatever software and hardware tricks were available in the era. Not because I want to try to drag the early '90s kicking and screaming into the 2020s. If I wanted to do that, emulation is completely adequate and much easier.

Now building a replica Mac in an FPGA SOC for stand-alone purposes (like one of those new NES mini things) I could get behind. Or even building a new Mac-on-a-card and slotting it into one of those NuBus expansion chassis to make a 12-slot Mac. Both of these things could be built off some of the ideas here and explored elsewhere with less fuss than trying to do a modern brain transplant on a Quadra 630 or something. I mean good on you for trying but if we're going the brain transplant route I'll stick with a more period-appropriate mod like an overclock or PPC upgrade or logic board swap.
 
@Franklinstein, that was sort of my motivation for starting this thread in the first place: not only to see if there was a way to reproduce the CPU, but also to pick up on the project Big Mess o' Wires started to build a classic 68k Mac on an FPGA. The project is still located here for anyone who wants to look at the HDL files.


It got me thinking: if there's existing HDL out there for older CPUs, then maybe there's some for the '030, or maybe even the '040, that could be packed onto an accelerator card. @Bolle makes those amazing Carrera040 replicas, but you need to source your own CPU; how cool would it be if it came with a selectable FPGA...
 

Gorgonops

Moderator
Staff member
Just as a point to chew over, I would say there's one reason why vintage Macs are probably a less prime target for an FPGA-based upgrade or replica than, say, the Amiga: the Macintosh was never a very "hardware dependent" platform. From the very beginning the Mac was structured around abstracting the underlying hardware through the System ROM, and writing software "to the bare metal" was actively discouraged. (And on the flip side, Macs have never incorporated much in the way of proprietary acceleration hardware into their designs; they in fact tend to be pretty basic, with plain video frame buffers, no blitters or DMA, etc.) The Amiga is the polar opposite of that, or at least the "classic" Amiga that all the old games are written for; most of the power of the machine was in fact built into the chipset. It's pretty difficult and computationally expensive to emulate all that chipset goo and keep everything it does "cycle-accurate" at the same time you're doing the CPU emulation, which is why in fact the most popular Amiga emulator, "UAE", was named with an acronym that originally translated to "Unusable Amiga Emulator".

This dependency on replicating the exact cycle-accurate behavior of the "whole computer" while at the same time trying to wedge a faster CPU in there is why crazy CPU upgrades and FPGAs are so attractive to Amiga-ites. (Well, there are a bunch of contributing factors to that craziness, like of course the fact that their platform died before it ever officially moved to some other CPU, etc.) But with a Mac, running old software on an emulator is already a solved problem; Apple did it in 1994, officially. There's very little Mac software that really depends on cycle-accuracy, so the main selling point of an FPGA is kind of moot.

This issue of cycle-accuracy is actually the source of some really amusing running battles on Amiga forums. Do a little googling and you'll find there's basically a religious war between the various factions as to how much you can "replace" in an Amiga, or how cycle-inaccurate you can be, before you've somehow "ruined it". The true believers of course can't stand pure emulation, at least if it's on an x86 computer. (The various PowerPC "Amigas" of course do just as much emulation when running older games, literally running an embedded version of UAE, but the fans of those machines think it's somehow different.) Things like the PiStorm are particularly divisive because, while you do get to keep the original Amiga chipset, the CPU emulation generally isn't completely cycle-accurate; you'll find YouTube videos demonstrating the PiStorm and showing that although it can benchmark in 50+ MHz 68030 territory, there are times when, because of its cycle-inaccuracy, it runs games slower than the original 7.16 MHz 68000 does... or if not slower, nonetheless runs them "incorrectly". The Vampire itself has its critics because, again, despite being ridiculously fast, it's also not necessarily 100% accurate when it "needs" to be... etc.

Ultimately I guess when things get completely ridiculous, like people making FPGA replicas of the 6502 that can make an Apple II run at the equivalent of the better part of a gigahertz, I start wondering what the point is other than shock value. (Obviously none of the original software base is really "improved" by this, in fact most of it breaks.) If you could make an FPGA accelerator for 68040 Macs that does as well as the Vampire does (IE, it claims to be about 10 times faster than the fastest real 68040) maybe that's a shade less ridiculous, but... *shrug*. That's not really faster than 68k software already can run on the best PowerMacs under Classic, so I'm curious how large the software base is that really *needs* the accelerator.
 

Unknown_K

Well-known member
To be honest, a 68040/40 overclocked to 50 MHz with some cache added on is good enough for me.

I like to collect hardware and upgrades for that hardware, but I've never seen the point of faking a CPU in a system where all the other parts would slow it down badly in real-world applications.

I mostly just game on the Amiga, and a 68030/50 with added RAM is all I've ever seen the need for, using WHDLoad for HD-installed games. To be honest, I still like popping a retail Amiga game box open and running it on a mostly stock A500, since most games are just a few bootable floppies. People spent tons of money on '060/PPC upgrades to play Quake on an Amiga, and I never saw the point, since I have old PC hardware for that.

Now, there might come a time when original '020/'030/'040 processors are dying (which would be long after I'm dead, since they don't run that hot) and you'd need a replacement for a stock CPU. I think that kind of problem will come sooner for more modern CPUs that run hot as hell and have finer lithography, where electromigration becomes a problem. You also see that in GPUs quite a bit.

I worry more about custom support chips not being available anymore, and motherboards rotting from leaky caps and batteries, than about not having CPUs available.
 