
Serious proposal: accelerator and peripheral expansion system

cb88

Well-known member
I'm curious about the (assumed) need for meandering traces on paleolithic hardware. At what clock rate would a trace length differential of about a meter actually become problematic?
Such things become an issue above about 20 MHz; 50 MHz was referred to as black magic stuff back in the days of SPARCstations. (That's the max speed MBus will run at; it maxes out at 40 MHz on the SS20, for instance, as it isn't designed well enough to be reliable above that, but the Hyperstation 30 could beat that by 10 MHz.) Another good example is PCI, which maxes out at 33 MHz with 5 V signaling; PCI 2.1 doubled that rate to 66 MHz by cutting the bus voltage to 3.3 V.

Serial links tend to run faster these days because you can do things like run differential pairs, etc., and you have less concern about crosstalk than with many side-by-side bus wires.

You can run buses faster at lower voltages because there is less noise... but since you inherently have to interface with the system bus of the SE etc., you have to take such precautions.
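
To put rough numbers on the trace-length question, here is a minimal sketch. It assumes a propagation speed of about 15 cm/ns (a common rule of thumb for FR-4 traces, not a figure from this thread) and a hypothetical budget of 10% of the clock period for skew. The function names are made up for illustration.

```python
# Assumption (not from the thread): signals on an FR-4 trace travel at
# roughly 15 cm per nanosecond, about half the speed of light.
PROPAGATION_CM_PER_NS = 15.0


def skew_ns(length_mismatch_cm):
    """Arrival-time difference caused by a trace length mismatch."""
    return length_mismatch_cm / PROPAGATION_CM_PER_NS


def skew_fraction_of_period(length_mismatch_cm, clock_mhz):
    """Skew expressed as a fraction of one clock period."""
    period_ns = 1000.0 / clock_mhz
    return skew_ns(length_mismatch_cm) / period_ns


def max_clock_mhz(length_mismatch_cm, budget=0.10):
    """Clock rate at which the skew uses up `budget` of the period.

    The 10% default budget is a hypothetical rule of thumb.
    """
    return budget * 1000.0 / skew_ns(length_mismatch_cm)


if __name__ == "__main__":
    for mhz in (7.8336, 20.0, 50.0):
        share = skew_fraction_of_period(100.0, mhz)
        print(f"{mhz:8.4f} MHz: 1 m of mismatch is {share:.0%} of a period")
```

Under these assumptions a full meter of mismatch is about 6.7 ns, which is small next to the roughly 128 ns period of a 7.8336 MHz 68000 but a third of the period at 50 MHz.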

 

ZaneKaminski

Well-known member
Personally I think emulating the cpu on an ARM is self defeating, and probably not practical.... if you are going to do that you may as well run an emulator on a PC.
I struggle with the feeling of inauthenticity it gives me, but the way I see it is that whatever processor I use is going to run at a clock at least 3x higher than any of these FPGA cores and dispatch as many as two instructions per cycle. So there is a penalty to be paid in terms of the indirection that must be done to perform the emulation, but I'm thinking it'll be beneficial over synthesizing something in an FPGA.
Emulation is not that hard. Executing an instruction word is not that hard. Decode each word into an 8-bit instruction class and then 24 more bits of parameters. Based on the class, go into a jump table, leading you to a routine that executes an instruction of that class. The classes would basically cover the different types of instructions and addressing modes. The decoded instruction can be saved too, greatly accelerating the process.
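
A minimal sketch of that decode-and-dispatch idea, assuming a toy subset of three 68000 instructions (NOP, MOVEQ, and ADD.L between data registers) and ignoring condition codes. The class numbers, the parameter packing, and every name here are hypothetical, chosen only to illustrate the 8-bit class plus 24-bit parameter layout and the saved decode.

```python
CLASS_ILLEGAL, CLASS_NOP, CLASS_MOVEQ, CLASS_ADD_L_DREG = range(4)


def pack(cls, params=0):
    """Pack an 8-bit class and 24 bits of parameters into one integer."""
    return (cls << 24) | (params & 0xFFFFFF)


def decode(word):
    """Turn a 16-bit instruction word into a packed (class, params) value."""
    if word == 0x4E71:                      # NOP
        return pack(CLASS_NOP)
    if (word & 0xF100) == 0x7000:           # MOVEQ #imm8,Dn
        reg = (word >> 9) & 7
        return pack(CLASS_MOVEQ, (reg << 8) | (word & 0xFF))
    if (word & 0xF1F8) == 0xD080:           # ADD.L Dy,Dx
        dst = (word >> 9) & 7
        return pack(CLASS_ADD_L_DREG, (dst << 8) | (word & 7))
    return pack(CLASS_ILLEGAL, word)


def op_illegal(cpu, params):
    raise ValueError(f"unhandled instruction word {params:#06x}")


def op_nop(cpu, params):
    pass


def op_moveq(cpu, params):
    imm = params & 0xFF
    if imm & 0x80:                          # sign-extend the 8-bit immediate
        imm -= 0x100
    cpu.d[params >> 8] = imm & 0xFFFFFFFF


def op_add_l_dreg(cpu, params):
    dst, src = params >> 8, params & 7
    cpu.d[dst] = (cpu.d[dst] + cpu.d[src]) & 0xFFFFFFFF


# The jump table is indexed by instruction class.
JUMP_TABLE = [op_illegal, op_nop, op_moveq, op_add_l_dreg]


class Cpu:
    def __init__(self):
        self.d = [0] * 8                    # data registers D0..D7
        self.decode_cache = {}              # instruction word -> decoded form

    def step(self, word):
        decoded = self.decode_cache.get(word)
        if decoded is None:                 # decode once, reuse afterwards
            decoded = decode(word)
            self.decode_cache[word] = decoded
        JUMP_TABLE[decoded >> 24](self, decoded & 0xFFFFFF)


if __name__ == "__main__":
    cpu = Cpu()
    # MOVEQ #5,D0 ; MOVEQ #-1,D1 ; ADD.L D1,D0 ; NOP
    for word in (0x7005, 0x72FF, 0xD081, 0x4E71):
        cpu.step(word)
    print(cpu.d[:2])
```

A real core would also need extension words, effective-address handling, and flags; the point here is only that the per-instruction work after the first decode is a cache lookup and one indirect call.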

Other options might be a Xilinx Zynq ... those are around $60 but smash a speedy FPGA + decent ARM chip together. So you could do something like use a VHDL 68k core while using the ARM for accelerating other things like SSL, or providing configuration for the framebuffer output etc... or handling USB peripherals for expansion like wifi etc... Actually I think that would be a quite workable solution.
I do find that idea appealing, and it was the first solution I approached (though with a similar Altera Cyclone V part), but it's too costly. The FPGA is at least $50, then it needs DDR2/3 memory, flash memory, maybe a separate FPGA configuration ROM chip, may have many power rails and so requires complicated sequencing and multiple regulators, etc. And they're all in these deeeense BGA packages. So that was a dead-end in terms of cost and PCB routing effort.
 

ZaneKaminski

Well-known member
The biggest advantage of the FPGA+ARM SoC chips is the speed of the interconnect between the FPGA and processor. Anything comparably fast with a separate FPGA and ARM SoC would require a lot of routing effort to get anywhere near as fast of an interface.

I think I'm gonna retain the bus control FPGA I've recently decided on, the Lattice MachXO 1200 (in TQFP-144). It's only $6 and using an FPGA will allow us to queue sequential writes to the Macintosh's video memory, for example. Plus it gives us much more flexibility in terms of acting like a memory-mapped "card" to the unaccelerated 68000 on the Macintosh.

It may also be possible to use the Lattice iCE40 FPGAs, and they're cheaper and more capacious. Downside is they're slower (in terms of pin-to-pin latency) and require external configuration. The external configuration was initially something I wanted to avoid, but maybe it would be better than ensuring something in the system can bit-bang JTAG, which is clumsy.

 

Bunsen

Admin-Witchfinder-General
I think I'm gonna retain the bus control FPGA I've recently decided on, the Lattice MachXO 1200 (in TQFP-144). It's only $6 and using an FPGA will allow us to queue sequential writes to the Macintosh's video memory, for example. Plus it gives us much more flexibility in terms of acting like a memory-mapped "card" to the unaccelerated 68000 on the Macintosh.
I have a hunch that even this part of your project by itself will prove useful for people wanting to develop other kinds of expansion cards.

 

Bunsen

Admin-Witchfinder-General
for 68000 systems. On these, I am eyeing the (not yet in production) STMicroelectronics STM32H7
Is there a pin-compatible part in production now that you could use for prototyping?

The FPGA shouldn't be necessary unless the microcontroller doesn't have enough pins to talk on the bus.  I have to study the manuals for the STM32F4, -F7, and -H7 and also bbraun's work more thoroughly.
That's encouraging.

Pardon me for not digging up the exact link, but I gathered from bbraun's work that this series of micros have a PRU-like semi-independent fast IO subsystem that can bit-bang at ~100MHz or better.  Even if that needs downstream muxing because of pin shortage, it sounds like it will be fast enough.

It could have USB mass storage support, with a little FAT32 and USB work. This was what I was hoping to avoid by using Linux, but it shouldn't be too bad. Same for USB keyboards and mice. I'm sure it couldn't be too hard.
There must be some existing code out there that could be adapted, yeah?

Maybe I will also drop the onboard SD card in favor of a bit of fixed flash memory. The SD slot is frustrating since it sort of requires a $1 electrostatic discharge protection IC.
IMO, if you have USB storage sorted out, both SD and onboard flash would be redundant.  At worst they could be designed in and left unpopulated.

One other question - does the STM32H7 have Ethernet?

It occurs to me that there are existing solutions [1] for adding Linuxy-functionality to oldMacs, and if someone wants to, they could throw a cheap Pi clone into the mix themselves, with a fast link to your accelerator.

----

[1] Networking a classic Mac via serial port -> OS X /unix /Linux -> internet

    MacIPpi - Surf the Internet on your old Macintosh with TCP/IP over LocalTalk

 

Bunsen

Admin-Witchfinder-General
FPGAs / require external configuration. /  something I wanted to avoid
So, call me crazy, but would the Mac itself be up to the task of uploading config data, via an INIT, CDEV or application?  bigmessofwires has something similar working on one of his replacement ROM products.

 

Bunsen

Admin-Witchfinder-General
The STM32H7 also has an LCD controller, so, if there are enough pins available, I can break out LCDD, HSYNC, and VSYNC so someone can make a VGA interface or Mac SE grayscale upgrade or something.
Possibly the most flexible thing to do would be to bring out all unused / available pins to a standard header (SO-DIMM?) with a documented pinout, and let future devs run wild with whatever they can use there.

Or all the STM's pins?

Macintosh SEs all have the 68000 in DIP-64, so I could try and do a board that works on both Plus and SE, leaving the SE's expansion slot open
Excellent thought :)

 

Trash80toHP_Mini

NIGHT STALKER
Going by what I see in my head involving the either/or connector capable MicroMac card, that probably won't work out due to clearances. :-/

 

Trash80toHP_Mini

NIGHT STALKER
Found this pic while rummaging for another. The Gemini accelerator board here has both the DIP-64 and PDS connectors installed. I don't see how it would be possible to have a Killy Klip card and a PDS card installed at the same time in an SE.

Gemini_Board-Solder_Side.jpg

 

ZaneKaminski

Well-known member
Is there a pin-compatible part in production now that you could use for prototyping?
Yep, in the STM32F4 and STM32F7 series. I'm reading the manual of the STM32F7 right now.

That's encouraging. [not needing an FPGA]

Pardon me for not digging up the exact link, but I gathered from bbraun's work that this series of micros have a PRU-like semi-independent fast IO subsystem that can bit-bang at ~100MHz or better.  Even if that needs downstream muxing because of pin shortage, it sounds like it will be fast enough.
Not a separate processor like the PRU-ICSS in the TI chips, just fast single-ended I/Os.

I am more comfortable with the FPGA approach though, since it will buy us a bit more speed and flexibility at relatively low cost, which is kind of necessary. This accelerator won't be blazingly fast: certainly faster than an SE/30, but probably not as fast as a Quadra 700. But the cost will be closer to $100, not $160+. My design went the wrong way for a while because I was trying to design an architecture that would work on 68000 but be adaptable to 68020+. In restricting it to just 68000 (at least for now), I can go back on some decisions I made when trying to support 68020+ systems.

For example, these STM32 microcontrollers have a Quad-SPI interface, which I originally rejected as too slow for 68030 systems. The qSPI in the STM32F7 goes at 100 MHz for SDR or 80 MHz for DDR, and you can combine two together for a sort of Dual-Quad-SPI setup (different from Octo-SPI, which is kinda rare). This interface is plenty fast to transfer commands and stuff to the bus interface FPGA for a 7.8336 MHz 68000 system. So in reducing the pin count of the processor-FPGA interface from some 32 pins or something I had before, to the six pins of qSPI (4 data, clock, chip select), I can get a cheaper FPGA or some 4x more capacity for the same price.
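
A back-of-the-envelope check of that claim, as a sketch. It takes the quoted qSPI clock rates at face value, ignores command and address overhead on the link, and assumes the 68000's fastest bus cycle of four clocks per 16-bit transfer, so the 68000 figure is an upper bound. The function names are hypothetical.

```python
def qspi_mbytes_per_s(clock_mhz, data_lines=4, ddr=False, banks=1):
    """Peak payload rate of a (dual-)quad SPI link, ignoring overhead."""
    bits_per_clock = data_lines * (2 if ddr else 1) * banks
    return clock_mhz * bits_per_clock / 8


def m68000_bus_mbytes_per_s(clock_mhz=7.8336, clocks_per_bus_cycle=4,
                            bytes_per_bus_cycle=2):
    """Upper bound on 68000 bus throughput (assumes no wait states)."""
    return clock_mhz / clocks_per_bus_cycle * bytes_per_bus_cycle


if __name__ == "__main__":
    print("qSPI SDR @ 100 MHz:", qspi_mbytes_per_s(100), "MB/s")
    print("qSPI DDR @  80 MHz:", qspi_mbytes_per_s(80, ddr=True), "MB/s")
    print("dual-quad DDR:     ", qspi_mbytes_per_s(80, ddr=True, banks=2), "MB/s")
    print("68000 bus, at most:", m68000_bus_mbytes_per_s(), "MB/s")
```

Even the single SDR link has an order of magnitude more raw bandwidth than the 68000 bus can consume, which leaves room for the command framing this sketch ignores.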

There must be some existing code out there that could be adapted [for USB peripherals], yeah?
Probably, but I have heard a lot of bad things about many vendors' USB stacks, so we shall see. FAT32 is easy in comparison. There are a lot of FAT32 libraries around.


IMO, if you have USB storage sorted out, both SD and onboard flash would be redundant.  At worst they could be designed in and left unpopulated.
SD is pointless, but there maybe should be the capability for more than the 2 Mbytes of onboard flash on the STM32H7. Might be useful to store a ROM image or system disk or something on the system. But yeah, the footprint can just be left unpopulated.


does the STM32H7 have Ethernet?
It does have part of what's required for an Ethernet interface (the MAC), but it doesn't have the physical interface integrated. You have to route the digital signals that come from the STM32H7 to an external "PHY" IC. I would say that the main loss in this new design that users would care about is networking. There are USB WiFi dongles but I doubt anyone will write a driver for one of those.

So, call me crazy, but would the Mac itself be up to the task of uploading config data, via an INIT, CDEV or application?  bigmessofwires has something similar working on one of his replacement ROM products.
That's a funny idea lol, but the problem is that some microcontroller or something actually has to connect some GPIOs to the JTAG header pins, and probably shouldn't itself be on the JTAG chain.

The latest design I'm imagining uses a Lattice iCE40 FPGA ($5 for "1k" capacity, comparable to the MachXO1200, $6 for 3x the capacity), which doesn't have JTAG, but instead has an SPI programming interface. That's much easier to talk to from the STM32H7, so problem solved. The iCE40 also is more accommodating of DDR signals than the MachXO, so that's nice. Actually, the iCE40 feels much more high-end than the MachXO, other than that it's a tad slower than the MachXO in its highest speed grade. iCE40 has no speed grades.

Possibly the most flexible thing to do would be to bring out all unused / available pins to a standard header (SO-DIMM?) with a documented pinout, and let future devs run wild with whatever they can use there.

Or all the STM's pins?
I'll make sure to break out the cool interfaces. There are several sizes of the STM32H7. BGAs aside, there are LQFP-144, LQFP-176, and LQFP-208 packages available.

The LQFP-144 ones don't support 32-bit SDRAM, which I wanted to use, so they're out. Also, about RAM, I have decided to use only one chip, not two. The routing effort for two SDRAM chips is much greater than for a single chip in the so-called "point-to-point topology." 

So that leaves LQFP-176 or LQFP-208. I'm gonna go over the relevant signals soon and see which one is best. Ideally the LQFP-176 would be sufficient. So I dunno if there would be that many extra pins. When there are only two signal layers (on a four-layer board), and you have splits in the power plane, etc., that makes it hard to bring all of the signals somewhere, especially when the SDRAM and QSPI signals should have their impedances controlled, etc.

Going by what I see in my head involving the either/or connector capable MicroMac card, that probably won't work out due to clearances. :-/
The issue is that on the Plus, there isn't that much room between the 68000 and the power/video connector.

But also, the problem is that SE users will want to mount theirs with the PDS, not on the 68000, even with the Killy Klip. And it's not like you can mount a horizontal SE-style card in the PDS with the accelerator in there.

 

ZaneKaminski

Well-known member
I say, for Ethernet (and WiFi), with this design, leave internet connectivity out. The way I see the peripheral expansion is that breaking out more interfaces is hard, so I wanna break out just USB 2.0 HS, the LCD interface (basically digital parallel VGA), and then a UART or two for debugging or low-cost peripheral expansion.

A cheap WiFi connector could just connect an ESP8266 WiFi module to the UART haha, but a more elaborate and full-featured solution for Macintosh SE would not be too difficult or expensive. USB hub chips are only a few bucks, the vertically-oriented USB Type-A connectors are inexpensive as well, then throw in the ESP8266 and a cheap microcontroller to connect it all. Could have a cheap resistor DAC to make VGA too. The STM32F429 is only $9 and has USB 2.0 high-speed, an additional LCD controller, and enough internal memory for a 4-bit 512x342 framebuffer. Putting all that on a board, along with a yoke board that'll do grayscale, would be a compelling and relatively inexpensive ($60?) upgrade for owners of the accelerator. On the other hand, it can only be that cheap if someone puts in all of the software and design work for free, and I dunno if I'll make it. But it should be straightforward for anyone experienced with putting a microcontroller and some other stuff on a board.
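
For a sense of scale on the framebuffer, a small sketch of the memory arithmetic. The 256 KiB SRAM figure used in the fit check is an assumption about the STM32F429 and should be checked against the datasheet; the function names are hypothetical.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size of a packed framebuffer, rounding each row up to whole bytes."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


def fits_in_sram(width, height, bits_per_pixel, sram_bytes=256 * 1024):
    """True if the framebuffer fits in the (assumed) internal SRAM size."""
    return framebuffer_bytes(width, height, bits_per_pixel) <= sram_bytes


if __name__ == "__main__":
    for bpp in (1, 4, 8):
        size = framebuffer_bytes(512, 342, bpp)
        print(f"512x342 at {bpp} bpp: {size} bytes, fits: {fits_in_sram(512, 342, bpp)}")
```

This only checks capacity; the memory left over for the USB stack and everything else is a separate question.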

 

Trash80toHP_Mini

NIGHT STALKER
The issue is that on the Plus, there isn't that much room between the 68000 and the power/video connector.

But also, the problem is that SE users will want to mount theirs with the PDS, not on the 68000, even with the Killy Klip. And it's not like you can mount a horizontal SE-style card in the PDS with the accelerator in there.
Yep, clearance conflict between Apple's horizontal PDS card spec. and a Killy Klip card was exactly what I tried to explain. Not so good with words, I figured the pic would let others visualize the problem.

As for a NIC solution, you're already putting USB on the card. IIRC, I've got an inexpensive USB WiFi dongle the size of a wireless mouse nubbin knocking around the joint somewhere. Breaking USB out to a pair of connectors (one for the nubbin) on the expansion card coverplate might be an elegant solution for users to add networking to your accelerator.

 

ZaneKaminski

Well-known member
The little WiFi dongle approach seems easy, but it's only easy when we run Linux, since the dongle probably has a Linux driver. But for my latest design, which is not supposed to run Linux, it's going to be too hard to develop a driver for the dongle. So we need something easier to work with, like the ESP8266 module.

 

Trash80toHP_Mini

NIGHT STALKER
Ah! Like I said, this stuff is way over my pay grade to really be of help. Notions pop up in the noggin' and I just bounce 'em off you as I think of them.

Another silly question: would it be any easier if you could hand off the "slow" I/O bus to a co-processor in the manner Apple used a pair of 6502s in the IIfx for that task?

I'm thinking along the lines of getting the rPi crowd interested in taking on that end of the project to make entry level pricing for your basic Accelerator as inexpensive as possible? As I understand it, that gang has TONs of hardware and software hackage available for uses of all sorts.

edit: the thought here would be to allow you to grip the KISS principle tenaciously while opening the project up to all manner of feature creepage.

 

techknight

Well-known member
If your FPGAs require external configuration, you can probably load those up using the external ARM chip.

Just hold the RESET line low, to keep the machine stuck in reset while this task completes. Once the ARM loads the FPGAs, and the ARM is "booted" and ready to go, release the RESET line and let everything take off as it should. 

I would still love to see some sort of WebKit integration/acceleration to make web browsing actually "usable" on the SE and SE/30; that would really make my day. Otherwise, just running existing software faster than stock speed isn't useful to me, UNLESS it's emulating a 68020+ and can handle CFM68K. Then it becomes useful for app development for some of the things that I really want to do.

 

ZaneKaminski

Well-known member
If your FPGAs require external configuration, you can probably load those up using the external ARM chip.

Just hold the RESET line low, to keep the machine stuck in reset while this task completes. Once the ARM loads the FPGAs, and the ARM is "booted" and ready to go, release the RESET line and let everything take off as it should. 
Yeah, that's easy with /RESET. In the old design, I had the System Controller in charge of pulling reset low, and it would time the boot sequence between the FPGAs and Snapdragon. Actually, it turns out that the Lattice iCE40 series of FPGAs does have internal nonvolatile configuration memory (NVCM, which is one-time programmable rather than flash), but is more accommodating of external programming than the MachXO.

Only CPLDs are truly instant-on. FPGAs that don't require external configuration memory just have some internal flash and they usually take a few hundred microseconds to a few milliseconds to load that into their SRAM.
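
The hold-in-reset boot order discussed above can be sketched as follows. This is a simulation only: `Pin` is a hypothetical stand-in for a GPIO line, and `load_fpga` is a placeholder for whatever routine would push the bitstream out over the FPGA's configuration interface.

```python
class Pin:
    """Stand-in for a GPIO line; records each level it is driven to."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def drive(self, level):
        self.log.append((self.name, level))


def boot(reset_n, load_fpga, log):
    """Hold the Mac in reset, configure the FPGA, then let it run."""
    reset_n.drive(0)              # assert /RESET (active low)
    if not load_fpga():
        log.append(("fpga", "config failed"))
        return False              # leave the machine held in reset
    log.append(("fpga", "configured"))
    reset_n.drive(1)              # release /RESET so the 68000 starts
    return True


if __name__ == "__main__":
    events = []
    boot(Pin("/RESET", events), lambda: True, events)
    for event in events:
        print(event)
```

Keeping /RESET asserted on a failed load means the 68000 never runs against a half-configured bus interface.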

I would still love to see some sort of WebKit integration/acceleration to make web browsing actually "usable" on the SE and SE/30; that would really make my day. Otherwise, just running existing software faster than stock speed isn't useful to me, UNLESS it's emulating a 68020+ and can handle CFM68K. Then it becomes useful for app development for some of the things that I really want to do.
I really want WebKit too, but I'm afraid that'll have to be on a higher-end product, not a 68000 accelerator.

This setup isn't really capable of accelerating 68020+ systems. They have a larger address space, 68030 has internal MMU, more bus throughput, and users of these systems expect other features like greater color depth, etc. Trying to do all of that on this microcontroller would be too hard. As it is, the performance of a Macintosh SE with the accelerator will be faster than an SE/30, hopefully faster than a IIci, but probably not as fast as a system with a 40 MHz 68040.

An accelerator for a 68020+ system would have to have a microprocessor, DDR2/3 memory, etc. A system of that class has enough horsepower to run WebKit. This little STM32H7, without a real OS? I don't know, maybe there's some browser that will run on such a system, but I don't think it's WebKit.

I was trying to do this unified, overdesigned system giving certain (fancy) capabilities to any 680x0 Macintosh, but that just drives the price up for 68000 systems.

So in this latest iteration I'm working on, the 68000 systems are intentionally slower than any 68020+ design that might follow, which should use the Snapdragon 410 module, etc. 

However, you can figure out how to send over USB entire 512x342x8bit frames, maybe some keyboard and mouse data too, and then stick some other system running WebKit onto the USB bus to get online. It just can't be integrated for 68000 systems. Too expensive.
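
A rough ceiling on how many such frames a USB 2.0 high-speed link could carry, as a sketch. It uses the 480 Mbit/s signaling rate; the `efficiency` parameter is a hypothetical knob for protocol overhead, since real bulk throughput is noticeably lower than the raw rate. The function names are made up for illustration.

```python
def frame_bytes(width=512, height=342, bits_per_pixel=8):
    """Bytes in one uncompressed frame."""
    return width * height * bits_per_pixel // 8


def max_frames_per_second(link_mbit_per_s=480.0, efficiency=1.0):
    """Upper bound on full 512x342x8 frames per second over the link."""
    usable_bytes_per_s = link_mbit_per_s * 1_000_000 / 8 * efficiency
    return usable_bytes_per_s / frame_bytes()


if __name__ == "__main__":
    print("bytes per frame:", frame_bytes())
    print("raw ceiling (fps):", round(max_frames_per_second(), 1))
    print("at an assumed 50% efficiency (fps):",
          round(max_frames_per_second(efficiency=0.5), 1))
```

Even with a pessimistic efficiency the link is not the bottleneck for a 512x342 display; producing and consuming the frames on either end is the harder part.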

 

Scott Squires

Well-known member
I like the STM32H7 direction a lot better than the Snapdragon direction. I agree with the comment that if the computer is running Linux, it's hardly different from just running an emulator.

 

ZaneKaminski

Well-known member
Well what I meant was on a base 68000 system, the accelerator emulating a 68020+ on a 68000.
Nooo, unfortunately I never planned for that. Emulating a different CPU complicates things. I planned to use the ROM from the machine as-is, and the software in the ROM only works under the assumption of the presence of some particular chipset.
68030 systems all have, for example, the Apple Sound Chip, which is absent from 68000 systems. So that would need to be emulated, and accurate peripheral emulation is a project in and of itself. My aim is to emulate the exact same processor as comes in the machine, but seemingly executing many more instructions per clock.

What machine in particular did you want these more advanced features (web browser and 68020+) for?

For Plus and SE, if you want 68030, the solution is to upgrade to SE/30 and get the accelerator for that. For your Portable... there is no solution in the exact same form-factor. PB140/170 is the closest.

 

ZaneKaminski

Well-known member
Honestly, a PB170 accelerator sounds so cool to me. Active-matrix screen, and the slim (at least in comparison to the Portable) first-gen PB design.

I can understand if you have a soft spot for the Portable, though.

 