
Serious proposal: accelerator and peripheral expansion system



Implementing the interpretation algorithm is straightforward enough, I would say. The key is to sensibly decode each instruction, that is to say define good classes of instructions to reimplement as operations against the MC68000 state structure.

 

The decoding scheme has to ensure that there are enough different classes of instruction so that each can be implemented with just one pathway through the implementing code (i.e. the implementation must be implemented with Thumb-2 conditional instructions unless the instruction is an M68k conditional instruction). At the same time, there can't be too many classes of instruction, probably 256 at the most.

 

The same decoding scheme can be used for the translation algorithm, but instead of choosing, based on instruction class, routines implementing the instruction as a transformation of the MC68k state structure, the translation engine would write some ARMv8-A code implementing the instruction.


the implementation must be implemented with Thumb-2 conditional instructions unless the instruction is an M68k conditional instruction

*must NOT be implemented with Thumb-2 conditional instructions.

 

To be clear, the purpose of writing the decoder and interpreter in Thumb-2 assembly is to ensure that they can run on a cheap ARMv7-M microcontroller. If the interpreter could be written in A64, then it could be faster, but there is little point. The only purpose of the interpreter, when paired with the emulator running on an ARMv8-A processor capable of executing A64, is to eliminate latency of the emulator while waiting for a translation to complete. So using Thumb-2 is fine and gets us greater compatibility from the same code I have to write.
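As a rough illustration of that division of labor, here is a minimal sketch in Python (chosen for readability, not the Thumb-2 or A64 code discussed above). The class name `Hybrid`, the callables it takes, and the idea of keying translations by block address are all assumptions made for the example, not details from the actual design.

```python
from collections import deque

class Hybrid:
    """Hypothetical dispatcher: run translated code when it exists, else interpret."""

    def __init__(self, interpret_block, translate_block):
        self.interpret_block = interpret_block   # slow path, always available
        self.translate_block = translate_block   # returns a callable for one block
        self.translated = {}                     # block address -> translated callable
        self.pending = deque()                   # addresses awaiting translation

    def execute(self, state, pc):
        fast = self.translated.get(pc)
        if fast is not None:
            return fast(state)                   # translation is ready, use it
        if pc not in self.pending:
            self.pending.append(pc)              # request a translation once
        return self.interpret_block(state, pc)   # do not wait; interpret meanwhile

    def translator_step(self):
        """Stand-in for the background translator finishing one block."""
        if self.pending:
            pc = self.pending.popleft()
            self.translated[pc] = self.translate_block(pc)
```

The point of the sketch is only that `execute` never blocks on the translator: until a translation shows up in the table, the interpreter keeps the emulated 68000 moving.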


The daughterboard is cute lol. Someone oughta make a card that goes on there. Might be hard because of space constraints though.

 

That just gave me a thought: I wonder if using a PowerBook 100 mobo's interconnect might be the best way to test your design for the Portable or even the SE. By replacing the CPU on a target system you might be able to begin feasibility testing long before developing an INIT to shut the mobo's processor down under acceleration?


Nah, I'm confident I can get it right in just one or two prototyping iterations. I think it would be hard to fit the whole system on the PB daughtercard too.

 

I've found that there actually is a single cheap FPGA that can do all of the bus operations. Not enough pins for 68020 bus but it will work for 68000. It's $6. So that saves cost over having three FPGAs.

 

But thinking about the PowerBooks has given me another idea. I was considering using a fast microcontroller to run some 68000 emulation software. It would be much slower than doing it with the Snapdragon, but these microcontrollers are like $10, compared to $79 for the Snapdragon module.

 

Since the Snapdragon is so expensive, it might be good to add a footprint for a fast (300MHz+) ARM microcontroller and SDRAM. Its implementation would cost an extra $15 or so. It would basically be an upgraded System Controller. You could then upgrade by purchasing the Snapdragon module for $79 if you want.

 

I think in this configuration, the Snapdragon would offer faster emulation, broader USB peripheral support, video output, and internet connectivity. Without the Snapdragon it would just be an accelerator that can do virtual disks and a few types of USB peripherals (probably just keyboard, mouse, mass storage in FAT32 format for disk images).

 

Is this a good idea?


I wasn't suggesting you make a board in the form factor of the 100's daughterboard. Plugging an adapter for your full size board into the daughterboard interconnect to use a naked PB 100 logic board and LCD only breadboard setup might be a helpful approach. No pesky CPU in the mix to begin with for initial testing.


No pesky CPU in the mix to begin with for initial testing.

 

 

Ah, I understand. Maybe but that's not work I really wanna do. No point making a state machine to work with a system without another 68000. I just wanna make one state machine. In choosing the MachXO1200, I've basically solved the FPGA capacity problem I was having before, and it's cheap, so I can continue with the schematic and board design. Can't use the MachXO1200 for 68020 PDS though, need more than one IC for that application.

Edited by ZaneKaminski

Ah, now I understand... sort of. That kind of thing is way over my head. I mostly look for workarounds/ways to cheat to make things easier. Necessity may be the mother of invention, but creative laziness is the crazy uncle nobody in the family talks about who's actually more productive! :D


using a fast microcontroller to run some 68000 emulation software. It would be much slower than doing it with the Snapdragon, but these microcontrollers are like $10, compared to $79 for the Snapdragon module.

 

/ add a footprint for a fast (300MHz+) ARM microcontroller and SDRAM. Its implementation would cost an extra $15 or so. / You could then upgrade by purchasing the Snapdragon module for $79 if you want.

/

Is this a good idea?

 

IMO, absolutely.  Anything that brings the end cost to the purchaser down is going to expand your market.  Having a powerful but optional upgrade for later on only makes it more attractive - and, obviously, the larger the potential market, the better the chance of economies of scale. 

 

The speed penalty may not be as bad as you think, if the micro is relieved of the burden of also running a Linux kernel.

 

bbraun's existing work with interfacing the sub-$20 Discovery boards (~180MHz) to the SE PDS may be illuminating here.

 

 

Without the Snapdragon it would just be an accelerator that can do virtual disks and a few types of USB peripherals

 

Honestly, I think this (with the SDRAM you mentioned) is all that 90% of people would really want in an accelerator [1], especially if it means we're talking about a sub-US$100 end price, rather than US$150-200 and counting.

 

Speaking of keeping costs down, I really think your best approach might be to continue focusing on a single, small SE PDS type card.  It's always tempting to add another $1 feature here and another $15 feature there, but as you've already seen, these all quickly add up.  Re-focusing down on the simplest possible board that allows for later expansion seems like a wise move. 

 

Adaptation to other CPU-socket machines can be implemented as a second card. Yes, that brings the cost up somewhat for anyone with a different machine, but I think the economy of scale - and the saving in development time and PCB area - on the main card will probably even that out, perhaps even so far as to make it cheaper for everyone.  And it means you start off with a reasonable market for the simple, cheap SE PDS accelerator, which can start the ball rolling and bring in funds and interested developers for future widening of the market and applications.

 

It doesn't seem necessary to me to make a different, huge card for every machine type, and obviously the larger the card the more expensive.  Make it as modular as possible - a single, standardized Killy-type connector for all DIP 68000 machines (for example), and ribbon cables or other readymade connectors from that to a convenient mounting location for the standardized SE PDS breakout board.  Any wide span to reach mounting points could be made up with a plain sheet of plastic.

 

The CPU-PDS adapter card/s should be relatively simple, at least in the fact that you're not dealing with the high speed signalling on the accelerator.  Picking a standard expansion interface (SE PDS) means that you have a fixed target to develop towards - each machine's adapter assembly can then be developed individually, one at a time, by you or by whoever.  And if you decouple the electrical-mechanical interface from the internal machine layout (using the modular approach), you can re-use as many subassemblies as possible across different machine types.

 

----

 

[1] Ethernet or WiFi via USB or a NIC IC might be a nice add-on, but probably optional rather than standard, as other options for networking already exist.


Sidenote: it seems to me that modularizing is going to also make developing and (perhaps more importantly) debugging much less of a headache for you - in that once one module (the accelerator) is known to be functional, any problems that crop up in adapting to second or third machines can be isolated to the adaption sub-assembly, rather than debugging an entire new board layout.


 

It doesn't seem necessary to me to make a different, huge card for every machine type, and obviously the larger the card the more expensive.  Make it as modular as possible - a single, standardized Killy-type connector for all DIP 68000 machines (for example), and ribbon cables or other readymade connectors from that to a convenient mounting location for the standardized SE PDS breakout board.

 

Haven't seen an SE board in years, is the 68000 in the same Killy Klip friendly DIP package as the other Compacts? In that case the processor's legs would be the equivalent to the PDS and  only a single design would be required for 128k through SE.

 

At this point in the PCB design process, it's probably counterproductive to switch gears. On top of that, the SE PDS is properly buffered as a dedicated expansion interface. Using an adapter based on the dual interface MicroMac design for the earlier compacts could be very inexpensive.


Honestly, I think this (with the SDRAM you mentioned) is all that 90% of people would really want in an accelerator [1], especially if it means we're talking about a sub-US$100 end price, rather than US$150-200 and counting.

Yeah, I should ditch the Snapdragon entirely for 68000 systems. On these, I am eyeing the (not yet in production) STMicroelectronics STM32H7, which is related to the STM32F4 that bbraun was using. The STM32H7 is a 400 MHz ARM Cortex-M7, basically the fastest microcontroller around. The FPGA shouldn't be necessary unless the microcontroller doesn't have enough pins to talk on the bus.  I have to study the manuals for the STM32F4, -F7, and -H7 and also bbraun's work more thoroughly.

 

The STM32H7 also has an LCD controller, so, if there are enough pins available, I can break out LCDD[15/7..0], HSYNC, and VSYNC so someone can make a VGA interface or Mac SE grayscale upgrade or something. Just 8 bits of color are enough, maaaaybe 16. I think 24 bits certainly would be too much. So that makes external video easier, which was basically bbraun's project.

 

I think the design, with the STM32H7, could be priced at $100. It would be somewhat faster than the SE, but certainly not 50x faster. Maybe 10x faster. It could have USB mass storage support, with a little FAT32 and USB work. This was what I was hoping to avoid by using Linux, but it shouldn't be too bad. Same for USB keyboards and mice. I'm sure it couldn't be too hard. Maybe I will also drop the onboard SD card in favor of a bit of fixed flash memory. The SD slot is frustrating since it sort of requires a $1 electrostatic discharge protection IC.

 

 

As for 68030 systems, anything without a good chunk of a Gbyte of RAM is basically not powerful enough to accelerate a 68030 system, since the emulator would have to support the full 128 Mbytes of RAM for these systems, and then the MMU stuff requires more memory bandwidth as well. So that's where the Snapdragon will be required, that is if I ever get around to that.

 

Haven't seen an SE board in years, is the 68000 in the same Killy Klip friendly DIP package as the other Compacts? In that case the processor's legs would be the equivalent to the PDS and  only a single design would be required for 128k through SE.

 

At this point in the PCB design process, it's probably counterproductive to switch gears. On top of that, the SE PDS is properly buffered as a dedicated expansion interface. Using an adapter based on the dual interface MicroMac design for the earlier compacts could be very inexpensive.

Nah, it's not too late in the process. I'm still exploring the options. The process of doing that half of the SE board helped me to get acquainted with the expansion interfaces, etc. There are lots of mistakes in my boards and schematics, by the way. I am a very hungry beginner at this, so the design process is not going to be perfectly smooth, but that's all good. I'll redo whatever I have to until I'm satisfied.

 

Macintosh SEs all have the 68000 in DIP-64, so I could try and do a board that works on both Plus and SE, leaving the SE's expansion slot open (would work too, for non-DMA cards). Let me do some more schematic design with the STM32H7 first though before deciding where to go with it.

Edited by ZaneKaminski

http://experiment-s.de/en/ <- this guy has 68030 emulation code working for Atari for his Suska series of boards... he seems to not be updating much lately, but he does cool stuff like most people around here.

 

Personally I think emulating the cpu on an ARM is self defeating, and probably not practical.... if you are going to do that you may as well run an emulator on a PC.

 

Other options might be a Xilinx Zynq ... those are around $60 but smash a speedy FPGA + decent ARM chip together. So you could do something like use a VHDL 68k core while using the ARM for accelerating other things like SSL, or providing configuration for the framebuffer output etc... or handling USB peripherals for expansion like wifi etc... Actually I think that would be a quite workable solution.


I'm curious about the (assumed) need for meandering traces on paleolithic hardware. At what clock rate would a trace length differential of about a meter actually become problematic?

 

Such things become an issue above about 20 MHz; 50 MHz was referred to as black magic stuff back in the days of SPARCstations. (That's the max speed MBus will run at. It maxes out at 40 MHz on the SS20, for instance, as it isn't designed well enough to be reliable above that, but the Hyperstation 30 could beat that by 10 MHz.) Another good example is PCI, which maxes out at 33 MHz at 5 V; PCI 2.1 doubled that rate to 66 MHz by cutting the bus voltage to 3.3 V.

 

Serial links tend to run faster these days because you can do things like run differential pairs etc., and you have less concern about crosstalk with many side-by-side bus wires.

 

You can run buses faster at lower voltages because there is less noise... but since you inherently have to interface with the system bus of the SE etc., you have to take such precautions.
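To put rough numbers on the original question, here is a back-of-the-envelope sketch in Python. It only looks at propagation delay from a length mismatch and ignores reflections, crosstalk, and setup/hold margins, which is where most of the "black magic" actually lives. The velocity factor of 0.5 (signals travelling at about half the speed of light on a typical PCB) is an assumption, and `skew_fraction` is just a name made up for the example.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def skew_fraction(length_diff_m, clock_hz, velocity_factor=0.5):
    """Return the skew from a trace length mismatch as a fraction of one clock period.

    velocity_factor is an assumed propagation speed relative to c.
    """
    delay_s = length_diff_m / (C * velocity_factor)
    return delay_s * clock_hz

# A one-meter mismatch at a few of the clock rates mentioned in this thread.
for name, hz in (("Mac SE 68000", 7.8336e6), ("20 MHz", 20e6), ("66 MHz PCI", 66e6)):
    print(f"{name}: {skew_fraction(1.0, hz):.1%} of a clock period")
```

Under those assumptions a full meter of mismatch eats only around a twentieth of a clock period at the SE's bus speed, but a large share of the period at 66 MHz, which fits the rule of thumb above.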

Edited by cb88

Personally I think emulating the cpu on an ARM is self defeating, and probably not practical.... if you are going to do that you may as well run an emulator on a PC.

 

I struggle with the feeling of inauthenticity it gives me, but the way I see it is that whatever processor I use is going to run at a clock at least 3x higher than any of these FPGA cores and dispatch as many as two instructions per cycle. So there is a penalty to be paid in terms of the indirection that must be done to perform the emulation, but I'm thinking it'll be beneficial over synthesizing something in an FPGA.

 

Emulation is not that hard. Executing an instruction word is not that hard. Decode each word into an 8-bit instruction class and then 24 more bits of parameters. Based on the class, go into a jump table, leading you to a routine that executes an instruction of that class. The classes would basically cover the different types of instructions and addressing modes. The decoded instruction can be saved too, greatly accelerating the process.
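A minimal sketch of that decode-and-dispatch loop, in Python for readability rather than the Thumb-2 assembly planned for the real thing. The class numbers, the way the 24 parameter bits are packed, and the choice of MOVEQ and NOP as the only two implemented classes are illustrative assumptions, not the actual decoding scheme. Only the N and Z flags are modeled.

```python
# Hypothetical class numbers; a real scheme would define up to 256 of them.
CLASS_ILLEGAL = 0
CLASS_NOP = 1
CLASS_MOVEQ = 2

def decode(opword):
    """Pack a 16-bit opcode word into (8-bit class << 24) | 24-bit parameters."""
    if opword == 0x4E71:                       # NOP
        return CLASS_NOP << 24
    if (opword & 0xF100) == 0x7000:            # MOVEQ #imm8, Dn
        reg = (opword >> 9) & 7
        imm = opword & 0xFF
        return (CLASS_MOVEQ << 24) | (reg << 8) | imm
    return CLASS_ILLEGAL << 24

class State:
    """Tiny stand-in for the 68000 state structure."""
    def __init__(self):
        self.d = [0] * 8          # data registers D0-D7
        self.pc = 0
        self.n = False
        self.z = False

def op_illegal(state, params):
    raise RuntimeError("illegal or unimplemented instruction")

def op_nop(state, params):
    pass

def op_moveq(state, params):
    reg = (params >> 8) & 7
    imm = params & 0xFF
    value = (imm | 0xFFFFFF00) if (imm & 0x80) else imm   # sign-extend to 32 bits
    state.d[reg] = value
    state.n = bool(value & 0x80000000)
    state.z = value == 0

# Jump table indexed by instruction class.
HANDLERS = [op_illegal] * 256
HANDLERS[CLASS_NOP] = op_nop
HANDLERS[CLASS_MOVEQ] = op_moveq

def run(state, memory, steps, cache):
    """memory maps even addresses to 16-bit words; cache maps addresses to decoded words."""
    for _ in range(steps):
        decoded = cache.get(state.pc)
        if decoded is None:                    # decode once, reuse afterwards
            decoded = decode(memory[state.pc])
            cache[state.pc] = decoded
        state.pc += 2
        HANDLERS[decoded >> 24](state, decoded & 0xFFFFFF)
```

Note that a cache like this would have to be invalidated when the emulated program writes over its own code; the sketch leaves that out.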

 

Other options might be a Xilinx Zynq ... those are around $60 but smash a speedy FPGA + decent ARM chip together. So you could do something like use a VHDL 68k core while using the ARM for accelerating other things like SSL, or providing configuration for the framebuffer output etc... or handling USB peripherals for expansion like wifi etc... Actually I think that would be a quite workable solution.

I do find that idea appealing, and it was the first solution I approached (though with a similar Altera Cyclone V part), but it's too costly. The FPGA is at least $50, then it needs DDR2/3 memory, flash memory, maybe a separate FPGA configuration ROM chip, may have many power rails and so requires complicated sequencing and multiple regulators, etc. And they're all in these deeeense BGA packages. So that was a dead-end in terms of cost and PCB routing effort.

The biggest advantage of the FPGA+ARM SoC chips is the speed of the interconnect between the FPGA and processor. Anything comparably fast with a separate FPGA and ARM SoC would require a lot of routing effort to get anywhere near as fast of an interface.

 

I think I'm gonna retain the bus control FPGA I've recently decided on, the Lattice MachXO 1200 (in TQFP-144). It's only $6 and using an FPGA will allow us to queue sequential writes to the Macintosh's video memory, for example. Plus it gives us much more flexibility in terms of acting like a memory-mapped "card" to the unaccelerated 68000 on the Macintosh.
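The write-queueing idea can be sketched as a small behavioral model. This is Python, not HDL, and the class name `PostedWriteQueue`, the FIFO depth, and the one-write-per-bus-cycle drain rate are assumptions for illustration only.

```python
from collections import deque

class PostedWriteQueue:
    """Behavioral model of a write FIFO between a fast processor and the slow Mac bus."""

    def __init__(self, depth=8):
        self.depth = depth            # assumed FIFO depth
        self.fifo = deque()

    def post(self, address, value):
        """Fast side: return True if the write was accepted, False if it must stall."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append((address, value))
        return True

    def bus_cycle(self, target_memory):
        """Slow side: perform at most one queued write per Macintosh bus cycle."""
        if self.fifo:
            address, value = self.fifo.popleft()
            target_memory[address] = value
```

The fast side only stalls when the FIFO is full, so a short burst of sequential video writes costs it almost nothing while the FPGA drains them onto the bus in order.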

 

It may also be possible to use the Lattice iCE40 FPGAs, and they're cheaper and more capacious. Downside is they're slower (in terms of pin-to-pin latency) and require external configuration. The external configuration was initially something I wanted to avoid, but maybe it would be better than ensuring something in the system can bit-bang JTAG, which is clumsy.

Edited by ZaneKaminski

I think I'm gonna retain the bus control FPGA I've recently decided on, the Lattice MachXO 1200 (in TQFP-144). It's only $6 and using an FPGA will allow us to queue sequential writes to the Macintosh's video memory, for example. Plus it gives us much more flexibility in terms of acting like a memory-mapped "card" to the unaccelerated 68000 on the Macintosh.

 

I have a hunch that even this part of your project by itself will prove useful for people wanting to develop other kinds of expansion cards.


for 68000 systems. On these, I am eyeing the (not yet in production) STMicroelectronics STM32H7

Is there a pin-compatible part in production now that you could use for prototyping?

 

The FPGA shouldn't be necessary unless the microcontroller doesn't have enough pins to talk on the bus.  I have to study the manuals for the STM32F4, -F7, and -H7 and also bbraun's work more thoroughly.

That's encouraging.

 

Pardon me for not digging up the exact link, but I gathered from bbraun's work that this series of micros have a PRU-like semi-independent fast IO subsystem that can bit-bang at ~100MHz or better.  Even if that needs downstream muxing because of pin shortage, it sounds like it will be fast enough.

 

It could have USB mass storage support, with a little FAT32 and USB work. This was what I was hoping to avoid by using Linux, but it shouldn't be too bad. Same for USB keyboards and mice. I'm sure it couldn't be too hard.

There must be some existing code out there that could be adapted, yeah?

 

Maybe I will also drop the onboard SD card in favor of a bit of fixed flash memory. The SD slot is frustrating since it sort of requires a $1 electrostatic discharge protection IC.

 

IMO, if you have USB storage sorted out, both SD and onboard flash would be redundant.  At worst they could be designed in and left unpopulated.

 

One other question - does the STM32H7 have Ethernet?

 

It occurs to me that there are existing solutions [1] for adding Linuxy-functionality to oldMacs, and if someone wants to, they could throw a cheap Pi clone into the mix themselves, with a fast link to your accelerator.

 

----

 

[1] Networking a classic Mac via serial port -> OS X /unix /Linux -> internet

    MacIPpi - Surf the Internet on your old Macintosh with TCP/IP over LocalTalk


FPGAs / require external configuration. /  something I wanted to avoid

 

So, call me crazy, but would the Mac itself be up to the task of uploading config data, via an INIT, CDEV or application?  bigmessofwires has something similar working on one of his replacement ROM products.


The STM32H7 also has an LCD controller, so, if there are enough pins available, I can break out LCDD, HSYNC, and VSYNC so someone can make a VGA interface or Mac SE grayscale upgrade or something.

Possibly the most flexible thing to do would be to bring out all unused / available pins to a standard header (SO-DIMM?) with a documented pinout, and let future devs run wild with whatever they can use there.

 

Or all the STMs pins? 

 

Macintosh SEs all have the 68000 in DIP-64, so I could try and do a board that works on both Plus and SE, leaving the SE's expansion slot open

 

Excellent thought :)

 


Is there a pin-compatible part in production now that you could use for prototyping?

Yep, in the STM32F4 and STM32F7 series. I'm reading the manual of the STM32F7 right now.

 

 

That's encouraging. [not needing an FPGA]

Pardon me for not digging up the exact link, but I gathered from bbraun's work that this series of micros have a PRU-like semi-independent fast IO subsystem that can bit-bang at ~100MHz or better.  Even if that needs downstream muxing because of pin shortage, it sounds like it will be fast enough.

Not a separate processor like the PRU-ICSS in the TI chips, just fast single-ended I/Os.

 

I am more comfortable with the FPGA approach though, since it will buy us a bit more speed and flexibility at relatively low cost, which is kind of necessary. This accelerator won't be blazingly fast, certainly faster than an SE/30, but probably not as fast as a Quadra 700. But the cost will be closer to $100, not $160+. My design went the wrong way for a while because I was trying to design an architecture that would work on 68000 but be adaptable to 68020+. In restricting it to just 68000 (at least for now), I can go back on some decisions I made when trying to support 68020+ systems.

 

For example, these STM32 microcontrollers have a Quad-SPI interface, which I originally rejected as too slow for 68030 systems. The qSPI in the STM32F7 goes at 100 MHz for SDR or 80 MHz for DDR, and you can combine two together for a sort of Dual-Quad-SPI setup (different from Octo-SPI, which is kinda rare). This interface is plenty fast to transfer commands and stuff to the bus interface FPGA for a 7.8336 MHz 68000 system. So in reducing the pin count of the processor-FPGA interface from some 32 pins or something I had before, to the six pins of qSPI (4 data, clock, chip select), I can get a cheaper FPGA or some 4x more capacity for the same price.
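A quick sanity check of the "plenty fast" claim, as a Python sketch. It compares raw QSPI data rates against the peak data rate of the 68000 bus, assuming the 68000's minimum bus cycle of four clocks and a 16-bit data bus. QSPI command and address overhead is ignored, so the real margin would be smaller; the function names are made up for the example.

```python
def qspi_bytes_per_second(clock_hz, ddr=False, lanes=4):
    """Raw QSPI payload rate, ignoring command/address/dummy cycles."""
    bits_per_clock = lanes * (2 if ddr else 1)
    return clock_hz * bits_per_clock / 8

def m68k_peak_bus_bytes_per_second(clock_hz, clocks_per_cycle=4, bus_bytes=2):
    """Peak 68000 bus rate: one 16-bit transfer every four clocks (assumed best case)."""
    return clock_hz / clocks_per_cycle * bus_bytes

sdr = qspi_bytes_per_second(100e6)                          # 4 lanes, SDR
ddr = qspi_bytes_per_second(80e6, ddr=True)                 # 4 lanes, DDR
dual_ddr = qspi_bytes_per_second(80e6, ddr=True, lanes=8)   # two QSPIs combined
bus = m68k_peak_bus_bytes_per_second(7.8336e6)

print(f"QSPI SDR:      {sdr / 1e6:.1f} MB/s")
print(f"QSPI DDR:      {ddr / 1e6:.1f} MB/s")
print(f"Dual QSPI DDR: {dual_ddr / 1e6:.1f} MB/s")
print(f"68000 bus:     {bus / 1e6:.2f} MB/s")
```

Even the plain SDR figure is more than ten times the peak rate of the Macintosh bus, so the six-pin interface leaves room for protocol overhead.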

There must be some existing code out there that could be adapted [for USB peripherals], yeah?

Probably, but I have heard a lot of bad things about many vendors' USB stacks, so we shall see. FAT32 is easy in comparison. There are a lot of FAT32 libraries around.

IMO, if you have USB storage sorted out, both SD and onboard flash would be redundant.  At worst they could be designed in and left unpopulated.

SD is pointless, but there maybe should be the capability for more than the 2 Mbytes of onboard flash on the STM32H7. Might be useful to store a ROM image or system disk or something on the system. But yeah, the footprint can just be left unpopulated.

does the STM32H7 have Ethernet?

It does have part of what's required for an Ethernet interface, but it doesn't have the physical interface integrated. You have to send some digital signals that come from the STM32H7 to an external "PHY" IC. I would say that the main loss in this new design that users would care about is networking. There are USB WiFi dongles but I doubt anyone will write a driver for one of those.

 

So, call me crazy, but would the Mac itself be up to the task of uploading config data, via an INIT, CDEV or application?  bigmessofwires has something similar working on one of his replacement ROM products.

That's a funny idea lol, but the problem is that some microcontroller or something actually has to connect some GPIOs to the JTAG header pins, and probably shouldn't itself be on the JTAG chain.

 

The latest design I'm imagining uses a Lattice iCE40 FPGA ($5 for "1k" capacity, comparable to the MachXO1200, $6 for 3x the capacity), which doesn't have JTAG, but instead has an SPI programming interface. That's much easier to talk to from the STM32H7, so problem solved. The iCE40 also is more accommodating of DDR signals than the MachXO, so that's nice. Actually, the iCE40 feels much more high-end than the MachXO, other than that it's a tad slower than the MachXO in its highest speed grade. iCE40 has no speed grades.

 

Possibly the most flexible thing to do would be to bring out all unused / available pins to a standard header (SO-DIMM?) with a documented pinout, and let future devs run wild with whatever they can use there.

 

Or all the STMs pins?

I'll make sure to break out the cool interfaces. There are several sizes of the STM32H7. BGAs aside, there are LQFP-144, LQFP-176, and LQFP-208 packages available.

 

The LQFP-144 ones don't support 32-bit SDRAM, which I wanted to use, so they're out. Also, about RAM, I have decided to use only one chip, not two. The routing effort for two SDRAM chips is much greater than for a single chip in the so-called "point-to-point topology." 

 

So that leaves LQFP-176 or LQFP-208. I'm gonna go over the relevant signals soon and see which one is best. Ideally the LQFP-176 would be sufficient. So I dunno if there would be that many extra pins. When there are only two signal layers (on a four-layer board), and you have splits in the power plane, etc., that makes it hard to bring all of the signals somewhere, especially when the SDRAM and QSPI signals should have their impedances controlled, etc.

 

Going by what I see in my head involving the either/or connector capable MicroMac card, that probably won't work out due to clearances. :-/

 

The issue is that on the Plus, there isn't that much room between the 68000 and the power/video connector.

 

But also, the problem is that SE users will want to mount theirs with the PDS, not on the 68000, even with the Killy Klip. And it's not like you can mount a horizontal SE-style card in the PDS with the accelerator in there.

Edited by ZaneKaminski

I say, for Ethernet (and WiFi), with this design, leave internet connectivity out. The way I see the peripheral expansion is that breaking out more interfaces is hard, so I wanna break out just USB 2.0 HS, the LCD interface (basically digital parallel VGA), and then a UART or two for debugging or low-cost peripheral expansion.

 

A cheap WiFi connector could just connect an ESP8266 WiFi module to the UART haha, but a more elaborate and full-featured solution for Macintosh SE would not be too difficult or expensive. USB hub chips are only a few bucks, the vertically-oriented USB Type-A connectors are inexpensive as well, then throw in the ESP8266 and a cheap microcontroller to connect it all. Could have a cheap resistor DAC to make VGA too. The STM32F429 is only $9 and has USB 2.0 high-speed, an additional LCD controller, and enough internal memory for a 4-bit 512x342 framebuffer. Putting all that on a board, along with a yoke board that'll do grayscale, would be a compelling and relatively inexpensive ($60?) upgrade for owners of the accelerator. On the other hand, it can only be that cheap if someone puts in all of the software and design work for free, and I dunno if I'll make it. But it should be straightforward for anyone experienced with putting a microcontroller and some other stuff on a board.
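For the framebuffer sizing, the arithmetic is simple enough to check in a few lines of Python. The helper name `framebuffer_bytes` is made up, and the result should be compared against the usable SRAM figure in the STM32F429 datasheet rather than taken on faith.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed for a packed framebuffer, rounded up to a whole byte."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8

# The compact Mac's 512x342 screen at a few candidate depths.
for bpp in (1, 4, 8, 16):
    size = framebuffer_bytes(512, 342, bpp)
    print(f"{bpp:2d} bpp: {size} bytes ({size / 1024:.1f} KiB)")
```

At 4 bits per pixel the 512x342 buffer comes to 87,552 bytes (about 85.5 KiB), while 8 bits per pixel doubles that to 175,104 bytes.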

Edited by ZaneKaminski
