Phoenix: Open-Source NuBus FPGA Accelerator for 68040 Macs


Hey all,
I've been designing an open-source NuBus FPGA accelerator card targeting the Quadra 700 (and other 68040 NuBus machines). The idea is to give the 68040 hardware acceleration for things it was never designed to do — TLS 1.3 cryptography, hardware blitting, basic video decode, DSP, and general-purpose compute — without replacing anything about Classic Mac OS.
I have a KiCad schematic, a design document, and a BOM ready for review. Looking for feedback before I move toward PCB layout.
GitHub repo: https://github.com/sailcat/phoenix-nubus




What's on the card


  • FPGA: Lattice ECP5 LFE5U-85F (BGA-381) — chose this specifically for the open-source toolchain (Yosys + nextpnr + Project Trellis). 84k LUTs, which gives room for all target cores simultaneously with about 25% headroom.
  • SRAM: 2x IS61WV51216EBLL — 1MB each, 10ns async. One dedicated to crypto/watchdog buffers, one to blitter/DSP scratch.
  • SDRAM: IS42S16320F — 64MB, 166MHz capable. Frame buffer, video decode reference frames, bulk storage.
  • Flash: W25Q128JVSIQ — 16MB SPI. Stores multiple bitstream images (switchable via DIP).
  • Level shifters: 3x SN74LVC16T245 — bidirectional 5V/3.3V translation for the full NuBus bus (AD[31:0] + control signals).
  • Power: 3x TLV62568 buck converters with TPS3700 supervisor for sequenced power-up (1.1V core → 2.5V aux → 3.3V I/O, per ECP5 requirements).
  • Clock: 50MHz MEMS oscillator. FPGA PLLs derive everything else internally. 10MHz NuBus clock comes in through a level shifter.
Estimated BOM is $60-90 at qty 1. Total power draw around 2-3W, well within the NuBus per-slot budget.
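Here is a quick sanity check on the memory numbers, as a small C sketch. The depth x width organisations are read off the part numbers: 512K x 16 for the IS61WV51216, 32M x 16 for the IS42S16320F, and 128 Mbit for the W25Q128. The bandwidth helper computes raw clock x width ceilings. It ignores refresh, row activation, bus turnaround, and protocol overhead, so treat the results as upper bounds only.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes of a memory organised as depth x width (bits). */
static uint64_t capacity_bytes(uint64_t depth, unsigned width_bits)
{
    assert(width_bits % 8 == 0);
    return depth * (width_bits / 8);
}

/* Raw peak bandwidth in bytes/s: one transfer of width_bits per cycle.
   Ignores refresh, turnaround and protocol overhead (upper bound only). */
static uint64_t peak_bytes_per_sec(uint64_t transfers_per_sec, unsigned width_bits)
{
    assert(width_bits % 8 == 0);
    return transfers_per_sec * (width_bits / 8);
}
```

The results: 1 MiB per SRAM, 64 MiB of SDRAM, and 16 MiB of flash. The on-card ceilings are about 200 MB/s for each SRAM (10 ns cycle, x16) and about 332 MB/s for the SDRAM (166 MHz, x16). NuBus at 10 MHz x 32 bits tops out at 40 MB/s even before transaction overhead, so the on-card memories are several times faster. That is an argument for keeping working data on the card rather than streaming it across the bus.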

Target accelerator cores



Core | Est. LUTs | What it does
--- | --- | ---
Crypto Engine | ~12k | AES-256, ChaCha20, SHA-256, Curve25519 (the main building blocks for TLS 1.3)
Graphics Blitter | ~8k | Hardware blit, scale, rotate, alpha blend
Video Decode | ~15k | Motion compensation, color space conversion, targeting 15-20 fps at 320x240
DSP | ~10k | 8-channel mixer, sample rate conversion, wavetable synthesis
Compute Unit | ~8k | Vector MAC, matrix multiply
Watchdog | ~2k | Bus monitor + DMA for memory protection via the 68040 MMU

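The per-core estimates above add up to ~55k LUTs, so a ~63k total out of 84k available leaves ~8k for logic not listed in the table (NuBus interface, memory controllers, internal bus fabric).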
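The LUT budget is easy to re-tally in code whenever the per-core estimates change. This sketch just sums the table's figures and expresses utilisation as a percentage of the 84k LUTs on the LFE5U-85F. With the current numbers it gives 65% for the cores alone. A 63k total would be exactly 75%, which is where the 25% headroom figure comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Per-core LUT estimates from the table (thousands of LUTs). */
struct core_estimate {
    const char *name;
    unsigned kluts;
};

static const struct core_estimate cores[] = {
    {"Crypto Engine", 12},
    {"Graphics Blitter", 8},
    {"Video Decode", 15},
    {"DSP", 10},
    {"Compute Unit", 8},
    {"Watchdog", 2},
};

static unsigned total_kluts(const struct core_estimate *c, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += c[i].kluts;
    return sum;
}

/* Utilisation as a whole-number percentage, rounded down. */
static unsigned percent_used(unsigned used, unsigned available)
{
    assert(available > 0);
    return (used * 100u) / available;
}
```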

How it talks to the Mac

The card sits in a standard NuBus slot and maps its registers into the assigned 256MB address window. A system extension (INIT) detects the card via Slot Manager, installs the interrupt handler, and exposes a shared library (PhoenixLib) with C-callable APIs for each accelerator core. Software talks to the card through memory-mapped register writes — nothing exotic.
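Here is a rough illustration of what "maps its registers into the assigned window" means in 32-bit addressing mode. Slot s ($9-$E) gets a 256MB super slot space at $s0000000 and a 16MB standard slot space at $Fs000000. The card's registers are then reached through volatile pointers into one of those windows. The register names and offsets below are hypothetical placeholders, not the map from the design document. A real driver would also get the slot number from the Slot Manager rather than hard-coding it.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit-mode NuBus address spaces for slot s ($9-$E):
   super slot space    $s0000000-$sFFFFFFF (256 MB)
   standard slot space $Fs000000-$FsFFFFFF (16 MB) */
static uint32_t super_slot_base(unsigned slot)
{
    assert(slot >= 0x9 && slot <= 0xE);
    return (uint32_t)slot << 28;
}

static uint32_t standard_slot_base(unsigned slot)
{
    assert(slot >= 0x9 && slot <= 0xE);
    return 0xF0000000u | ((uint32_t)slot << 24);
}

/* Hypothetical register offsets (bytes) within the card's window. */
enum {
    PHX_REG_CTRL   = 0x00,
    PHX_REG_STATUS = 0x04,
    PHX_REG_SRC    = 0x08,
    PHX_REG_LEN    = 0x0C,
};

/* volatile so the compiler emits every access, in program order. */
static void phx_write32(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4] = value;
}

static uint32_t phx_read32(const volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4];
}
```

On the Mac, `base` would be the slot's window cast to a pointer. For host-side testing, the same helpers work against an ordinary array standing in for the register file.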
The card also supports DMA bus mastering for bulk transfers (frame buffer writes, watchdog shadow copies) and generates interrupts via /NMRQ for completion notifications.
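The completion-interrupt path could look something like the following sketch. The status bits, the write-1-to-clear acknowledge that deasserts /NMRQ, and the job struct are all assumptions for illustration. The return-value convention follows my understanding that a slot interrupt routine reports whether it actually serviced the interrupt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status bits; write-1-to-clear is an assumed convention. */
#define PHX_STATUS_DONE  (1u << 0)
#define PHX_STATUS_ERROR (1u << 1)

struct phx_job {
    volatile int done;    /* set from interrupt context */
    volatile int failed;
};

/* Called when /NMRQ fires for this slot. Returns nonzero if the card
   was the source of the interrupt, zero otherwise. */
static int phx_interrupt(volatile uint32_t *status_reg, struct phx_job *job)
{
    uint32_t status = *status_reg;  /* read status exactly once */
    uint32_t pending = status & (PHX_STATUS_DONE | PHX_STATUS_ERROR);
    if (pending == 0)
        return 0;
    *status_reg = pending;          /* assumed: acknowledging deasserts /NMRQ */
    if (pending & PHX_STATUS_ERROR)
        job->failed = 1;
    job->done = 1;
    return 1;
}
```

Foreground code would poll `job->done` (or be woken by it) instead of spinning on the card's status register across the bus.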

The companion bridge concept

The card handles acceleration. A Raspberry Pi on the local network handles the internet-facing work: TLS termination, HTTP content simplification (stripping tracking/JS/autoplay), protocol translation, and media transcoding. The Pi prepares the data and ships it to the Mac over Ethernet, and the card's crypto/video/DSP engines do the heavy lifting. The Mac never touches the raw modern internet directly.

Current state

The schematic is architecturally complete — all components selected, all inter-sheet connections defined, design constraints documented. The full pin-level wiring (especially the BGA-381 fanout) needs to be finished in KiCad before layout. The design document in the repo covers component rationale, NuBus register map, PCB stackup notes, FPGA fabric allocation, and a preliminary C API.
No HDL written yet. No PCB layout started.
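Since no HDL exists yet, one approach worth considering is to write a small bit-exact C reference model for each core first. Those models can then generate expected values for the simulation testbenches. As an illustration (not something in the repo), here is the ChaCha20 quarter round from RFC 8439, section 2.1. It is the core operation the crypto engine's ChaCha20 datapath would have to reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit rotate left; n must be 1..31 to avoid an undefined shift. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round (RFC 8439, section 2.1). */
static void chacha_quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

The RFC's own test vector (section 2.1.1) makes a convenient first testbench case. The inputs a=0x11111111, b=0x01020304, c=0x9b8d6f43, d=0x01234567 should produce a=0xea2a92f4, b=0xcb1cf8ce, c=0x4581472e, d=0x5881c4bb.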

What I need from you

Specific questions:
  1. Quadra 700 NuBus cage clearance — Does anyone have physical measurements of the vertical clearance above a NuBus card in the Q700? I need to confirm component height constraints. The board is designed as a short Eurocard (100mm height) but I need to know if tall components on the top side are going to be a problem.
  2. NuBus Declaration ROM — Has anyone here written an sResource directory from scratch? I've read Designing Cards and Drivers for the Macintosh Family but practical experience with the format would be incredibly helpful. If anyone has disassembled the ROM from an existing NuBus card and has notes, I'd love to see them.
  3. ECP5 vs Artix-7 — I went with the ECP5 for open-toolchain reasons, but the Artix-7 (XC7A100T) has more fabric and better DSP blocks. Anyone have strong opinions here? The ECP5 is well-proven in the open-source FPGA community (ULX3S, etc.) but I'm open to arguments.
  4. Level shifting approach — Is three 74LVC16T245s the right call for the NuBus interface, or is there a better approach people have used? The bidirectional direction control adds a bit of complexity since the AD bus is multiplexed. Particularly interested in how to handle /NMRQ (open-drain on NuBus).
  5. PCB layout help — I'm looking for someone experienced with BGA fanout on 4-layer boards. The ECP5 BGA-381 at 0.8mm pitch is the hard part. This would be a paid gig, not asking for free labor. If anyone does this kind of work or can recommend someone, please reach out.
  6. NuBus connector sourcing — Where are people getting DIN 41612 Type C connectors for new card designs these days? Any preferred suppliers?
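On the Declaration ROM question, the entry format is the piece that trips people up first. This is based on my reading of Designing Cards and Drivers, so double-check it against the book and against the NuBusFPGA project. Each sResource directory or sResource list entry is one 32-bit long: an ID byte followed by a 24-bit signed offset from the entry itself to its target. A list ends with ID $FF. The helpers below are hypothetical. They assume a ROM that uses all four byte lanes, so offsets count plain bytes.

```c
#include <assert.h>
#include <stdint.h>

/* End-of-list marker: ID $FF, offset field zero. */
#define SRSRC_END_OF_LIST 0xFF000000u

/* Pack an 8-bit ID and a 24-bit signed offset (entry -> target). */
static uint32_t srsrc_entry(uint8_t id, int32_t offset)
{
    assert(offset >= -0x800000 && offset <= 0x7FFFFF);
    return ((uint32_t)id << 24) | ((uint32_t)offset & 0x00FFFFFFu);
}

static uint8_t srsrc_id(uint32_t entry)
{
    return (uint8_t)(entry >> 24);
}

/* Sign-extend the 24-bit offset field without relying on
   implementation-defined unsigned-to-signed conversion. */
static int32_t srsrc_offset(uint32_t entry)
{
    int32_t raw = (int32_t)(entry & 0x00FFFFFFu);  /* 0 .. 0xFFFFFF */
    return (raw & 0x00800000) ? raw - 0x01000000 : raw;
}
```

Generating the ROM image from a small host-side program like this, and checking it in an emulator before burning anything, is much less painful than hand-assembling offsets.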

Everything is on GitHub, open source (leaning CERN-OHL-S for hardware, MIT for software). Happy to answer questions about any part of the design.

Edited to Add:

I want to call out @Melkhior's NuBusFPGA project — which I discovered via the similar threads prompt right after posting this. That project has already solved several of the problems I'm facing, particularly the Declaration ROM development workflow (QEMU digital twin approach), NuBus bus interface design, and the LiteX/Wishbone bus fabric for connecting multiple devices inside the FPGA. Phoenix is targeting a different use case (accelerator cores rather than video output), but the NuBus interface engineering and driver architecture are directly relevant, and I expect to learn a lot from that codebase. If you're interested in NuBus FPGA work generally, that thread is essential reading.


 
The design is mine; the KiCad files were generated with AI assistance. I'm stronger on the architecture and systems-design side than on KiCad, so I used Claude to turn my design spec into schematic files. The component selections, architecture, and design constraints are all my own work. The pin-level wiring still needs to be finished by hand in KiCad before layout, as I noted in the post. If that's a dealbreaker for anyone here, I understand, but the engineering is real and I'm here to get it right.
 
Not a problem with Claude per se; more that the files contain a bunch of invalid syntax and don't open in either of the KiCad installs I have here (and are still broken after I manually patched them a bunch) :)

To be honest, there's not a lot to review here, and a lot more to do before this is close enough to go on a PCB. Ideally the pinmap, HDL, and layout should evolve together. I've not written a single line of VHDL in close to 10 years, so I won't be much help with that, but you really want that part to be better defined before presenting it or worrying about the other details of the physical implementation, which are mostly trivial: clearances are well documented, I think, even for the Q700; modern '245s are fast enough to be used as NuBus buffers; open drain is open drain, and /NMRQ is slave-to-master only, so there's not a lot to worry about; and DIN 41612 connectors are still made and widely available.
 
You found the NuBusFPGA, but here are some answers that may not be in the repo:

  3. ECP5 vs Artix-7: I went with the ECP5 for open-toolchain reasons, but the Artix-7 (XC7A100T) has more fabric and better DSP blocks. Anyone have strong opinions here? The ECP5 is well-proven in the open-source FPGA community (ULX3S, etc.) but I'm open to arguments.
Unless you need a specific feature of the FPGA (like the 7-series' TMDS signalling for HDMI), I'd weigh:
* The toolchain you're used to for design, synthesis, and simulation
* Package complexity/pin count: 1.0mm-pitch BGAs are easier to deal with than 0.8mm. QFP might be an option for some FPGAs, but probably not at the size you want (see DoubleVision). Power supply and decoupling might be tricky as well.
* Spartan-7 might be a better alternative to Artix-7 if you don't need the high-speed I/O that's only available in Artix-7
* No idea about ECP5, but Spartan-7/Artix-7 can handle DDR2/DDR3 memory

  4. Level shifting approach: Is three 74LVC16T245s the right call for the NuBus interface, or is there a better approach people have used? The bidirectional direction control adds a bit of complexity since the AD bus is multiplexed. Particularly interested in how to handle /NMRQ
The CB3T family (FET bus switches) is expensive and not a true level shifter, but the parts are bidirectional and really, really fast. For NuBus they're probably overkill (that didn't stop me): the delay through a normal shifter is unlikely to be an issue, and a normal shifter would likely be cheaper.
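For the level-shifting question, the behaviour the FPGA has to implement fits in a few lines of C. An open-drain signal like /NMRQ is only ever driven low or left floating, and the pull-up on the bus supplies the high level. The low drive can come from a transceiver channel whose output enable is toggled, or from a separate open-drain buffer; the logic is the same either way. The AD direction rule below is a simplified assumption: the master drives address and write data, and a responding slave drives read data. It ignores bus-turnaround timing, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Open-drain output as seen from an FPGA pin: drive low or float. */
struct od_pin {
    bool output_enable;   /* true: pin actively driven */
    bool output_value;    /* only meaningful when output_enable is true */
};

static struct od_pin open_drain(bool assert_low)
{
    struct od_pin p = { assert_low, false };  /* never drive high */
    return p;
}

/* Simplified direction decision for the AD[31:0] transceivers.
   Returns true when the card should drive AD toward the bus. */
static bool card_drives_ad(bool card_is_master, bool address_phase,
                           bool write_cycle, bool responding_to_read)
{
    if (card_is_master)
        return address_phase || write_cycle;  /* address, then write data */
    return responding_to_read;                /* slave returns read data */
}
```

Keeping the transceiver DIR/OE pins and the FPGA's own tristate enables derived from one signal like this avoids the two sides ever fighting over AD.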

  6. NuBus connector sourcing: Where are people getting DIN 41612 Type C connectors for new card designs these days? Any preferred suppliers?
Those DIN connectors, in particular NuBus's 3x32-pin version, are still very much available (the 3x40 of the '030 PDS less so; the unrelated 3M PAK50 of the '040 PDS is unfortunately unobtainium). Pick the cheapest you can get.
 
Interesting idea!

You mention some music/audio related functionality. I would encourage you to look at Digidesign's Sound Accelerator/Audiomedia and SampleCell cards.

The Sound Accelerator and Audiomedia cards were basically a Motorola 56k DSP chip with its "host port" connected to NuBus, plus audio input/output hardware. Early versions of the Pro Tools DAW, the Sound Designer audio editor, and the Turbosynth synthesizer were partially written as 56k DSP programmes that were uploaded from the host to the card. There is some documentation available, and I have also archived a DIY programme somebody made to load 56k DSP programmes onto an Audiomedia card, as well as the toolchain.
The SampleCell hardware was basically a hardware sampler in an ASIC controlled by a very decent software editor running on the Mac. The ASIC, of course, isn't documented, nor is the protocol used to communicate. But with an FPGA running on the Nubus I think it'd be possible to reverse engineer it fairly quickly.

It might be a different sort of project than what you're interested in, but I think the thing to consider here is software. There's a *lot* of person-hours in this software, which was made for professional use and honestly still holds up. Making new hardware to talk to existing high-quality software is something to consider, I think.
 
I wonder if this could be made to work over the version of LC-PDS that the LC III and later have.
Physically the pinout is different, but logically it's equivalent to the IIsi/SE/30 (except it's 25/33 MHz in the LC III/III+ instead of 20/16). So a different board is required, but the gateware will need a new pinlist at most and the software can be the same.

EDIT: oops, I was thinking of the PDS. This is meant to be a NuBus device, so no; it would require a completely different bus interface to talk to any PDS slot. Doable, but not as easy as adapting from one '030 PDS to another.
 
Physically the pinout is different, but logically it's equivalent to the IIsi/SE/30 (except it's 25/33 MHz in the LC III/III+ instead of 20/16). So a different board is required, but the gateware will need a new pinlist at most and the software can be the same.
and higher in the 68040 LCs and PowerPC 5xxx/6xxx Performas
 