
68060 accelerator cards for Mac: Would you be willing?

Quadraman

Well-known member
Why? When it would be even harder* than doing one for a 68k machine, and you can easily stick a G3 or G4 in there.
*or perhaps impossible. Remember the Coldfire is kinda-sorta 68k compatible, and not even remotely related to the PPC.
Why? So you can run your 68k apps on a machine with a faster bus, more memory, bigger hard drive, better video, etc.

 

Unknown_K

Well-known member
You can run 68K apps now on a faster PPC system at little added cost (a G3/400 upgrade in a Power Mac 7500/8500 will run 68K apps under OS 7.5.x and later just fine). Amiga and Atari people spend big bucks on slow PPC and fast 68060 processors because their platform died and they have no other choice (outside of emulation), and even they only buy NEW designs by the handful.

If Motorola got bored, they could take the old 68040 design, add 2MB of cache to the die, shrink it with new production methods and probably get an 800 MHz 68040 that runs without a heatsink. The thing is, developers have moved on and few would buy the new chip.

 

SiliconValleyPirate

Well-known member
Not gonna happen. The Amiga 060 cards only work with apps that go through the 68060.library, which contains the required patches. It was relatively easy on the Amiga. I challenge anyone to build a similar library for the Mac that loads at ROM level (which would probably be required). Daystar did it back in the day with the Turbo040 and Turbo601 cards, but I doubt anyone has the documentation, facilities or motivation to do it now.

IMHO the old Mac OS isn't worth building 060 cards for (many others will disagree, and I'll happily disagree right back!); 8.1 runs well enough on a 040 for me. That said, I *own* an Amiga 060 machine (an A4000 with a Cyberstorm060 Mk1 and 128MB RAM), so I can get an 060 fix anytime I need one.

Amiga and Atari people spend big bucks on slow PPC and fast 68060 processors because their platform died and they have no other choice (outside of emulation), and even they only buy NEW designs by the handful.
At least in the Amiga arena (where I have lots of experience), it created some really great machines at the time, but they are all slow compared to modern machines. PowerPC has never been properly implemented on the Classic Amiga anyway - at best PPC cards use PPCs as co-processors, and most PPC apps are frankly worse than the 68k versions.

As I stated above and in the Amiga vs. Mac thread, however, one compelling force was the flexibility and expandability of the Amiga platform. Commodore built the ability to work CPU patches into the system in OS 3.x so that the 68040 could run 68030 software reliably on the Amiga Workbench. That meant the 060 was very easy for experienced Amiga techs to work into the OS.

If Motorola got bored, they could take the old 68040 design, add 2MB of cache to the die, shrink it with new production methods and probably get an 800 MHz 68040 that runs without a heatsink. The thing is, developers have moved on and few would buy the new chip.
Freescale (the present owner of all the CPU technology Motorola spun off in 2004) produces a chip called Coldfire, which is a 68k-compatible chip that runs at 360MHz or more. Several projects have worked on bringing it to the Amiga, but so far none has produced a salable item based on Coldfire, partly because of the hurdles of interfacing a main system CPU running at 360MHz with a 7MHz chipset, and partly because of significant differences in the code base.

If you guys were really serious you would want a Coldfire 68k Mac. It'd give a 601 or mid-range 603e a serious run for its money. Amiga 060s will easily play a 128kbps MP3 with cycles to spare, so I'd imagine a Coldfire machine would be a sight faster still!

Realistically, if you want a faster Classic Mac, Unknown_K is right: use a PPC. A small number of apps don't run well on PPC, but they are a really small number, which you could probably run on an old 030 Mac without any fuss anyway. One reason I have done little with the 68k Macs in later years is that they are a backwater: stuff pretty much dried up for them in the mid-1990s and they have never seen anything exciting since.

 

trag

Well-known member
Is there anything that can be learned about 68000/68030 implementation on an FPGA/PIC from projects like MAME?
Aren't PICs something like small 8 bit processors? Are there 32-bit PICs? In my opinion there's no point at all in trying to emulate a 32-bit processor (68030) on a narrower architecture, e.g. 16-bit or 8-bit.

You'll have to refresh my memory about MAME. I've heard of it but cannot remember a thing about it at the moment.

The Coldfire-to-68030 idea has actually been stealing my brain's CPU cycles this past week. It's an intriguing idea. I don't think I'll ever do anything with it, as there are too many hurdles which must be solved in software and I'm just not a software guy. Now, if I could get a team of five or six dedicated 68K/assembly software folks to work with me on hardware I produce, then I could see it happening, and maybe kick in some of the prototype costs. :)

Here's what I'm thinking. If one builds a Coldfire (hereafter CF) board, one should put the system memory on the upgrade. Also, some of the CF chips have DDR controllers built in, USB2 built in and 10/100 ethernet built in.

So, what I propose is that we build (or fantasize about) a Coldfire board with an SODIMM socket for a single laptop-style DDR DIMM, a USB2 port and a 10/100 ethernet port on board. One or two MB of fast static RAM L2 cache would be nice, but it probably isn't manageable, because the DDR controller is built into the chip. It's worth looking into, though. There may be a way to do it.

At boot time, the CF should interface with the host motherboard (hereafter the 'host') and copy the ROM contents to local memory so that it will not be slowed down by toolbox calls which need access to the ROM. The only time the CF should need to wait around for the host is when it's accessing the peripheral systems. The worst performance hit is going to be the video systems/cards for the various hosts. They can't be ignored or bypassed and they need frequent attention.
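
A minimal Python model of that boot-time copy, as a sketch rather than firmware: it reads the ROM once over the (slow) host bus into local memory and checks it before anything runs from the copy. It assumes, and this should be verified against real ROM dumps, that the first four bytes of a 68k Mac ROM hold a checksum equal to the 32-bit sum of the remaining big-endian 16-bit words. `read_host_rom_word` is a hypothetical callback standing in for one host-bus read.

```python
# Conceptual model (Python, not firmware) of the boot-time ROM shadow copy.
# ASSUMPTION: bytes 0-3 of a 68k Mac ROM hold a big-endian 32-bit checksum
# equal to the sum of every following big-endian 16-bit word, mod 2**32.
# read_host_rom_word is a hypothetical callback standing in for one slow
# 16-bit read across the host bus.
from typing import Callable


def rom_checksum(image: bytes) -> int:
    """Sum the big-endian 16-bit words after the 4-byte checksum field."""
    total = 0
    for i in range(4, len(image) - 1, 2):
        total += (image[i] << 8) | image[i + 1]
    return total & 0xFFFF_FFFF


def shadow_rom(read_host_rom_word: Callable[[int], int], rom_size: int) -> bytearray:
    """Copy the ROM into local memory once, then verify it before booting from it."""
    shadow = bytearray(rom_size)
    for offset in range(0, rom_size, 2):
        word = read_host_rom_word(offset)  # the only time ROM is read over the host bus
        shadow[offset] = word >> 8
        shadow[offset + 1] = word & 0xFF
    stored = int.from_bytes(shadow[0:4], "big")
    if stored != rom_checksum(shadow):
        raise RuntimeError("ROM shadow copy failed its checksum")
    return shadow
```

After a successful copy, toolbox calls that read the ROM can be served from local memory. A checksum match could also serve as a cheap early sign, during bring-up, that the host-bus interface is reading correctly.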

The main hardware problem I see is how to avoid contention between the built-in memory controller and the motherboard memory controller. It should be possible to map the CF memory space into the proper place in the Mac Address Map so that it will be compatible with the scheme built into the Mac ROMs. But something needs to prevent the host memory controller from responding to memory operations.

For example, if the code being executed issues a memory read on the bus, we want the DDR memory to respond, but not the host's memory controller. Otherwise the CF will get the read data from DDR memory and go speeding along, and twenty cycles later the host memory controller will either deliver other data or throw an error, clogging up the host bus with unexpected data either way.

One way to deal with this is circuitry on the CF board which would simply block any transactions to the host board which take place in RAM's address space. This is fairly easy and cheap, provided that the CPU is the only device which originates writes and reads to RAM. Are there any DMA devices which write directly to or read directly from RAM in the Mac II family? The built-in video on the IIci and IIsi?
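
The blocking idea amounts to address decode logic, modeled here in Python as a sketch. It assumes the Mac II-family 32-bit address map (RAM space $0000_0000 to $3FFF_FFFF, ROM space $4000_0000 to $4FFF_FFFF, everything above handled by the host) and, following the ROM-copy idea above, serves ROM space from local memory. It ignores 24-bit mode and host-originated DMA, which is exactly the open question in the post.

```python
# Conceptual model of the decode logic that decides which side answers a CPU cycle.
# ASSUMPTION: Mac II-family 32-bit address map; 24-bit mode is ignored.
# Host-originated DMA (e.g. built-in video on the IIci/IIsi) is NOT modeled.
from enum import Enum


class Target(Enum):
    LOCAL_DDR = "local DDR on the card"
    LOCAL_ROM_COPY = "shadowed ROM in local memory"
    LOCAL_BUS_ERROR = "bus error raised by the card"
    HOST_BUS = "forwarded to the host motherboard"


RAM_SPACE_END = 0x4000_0000  # $0000_0000-$3FFF_FFFF: RAM space (assumed)
ROM_SPACE_END = 0x5000_0000  # $4000_0000-$4FFF_FFFF: ROM space (assumed)


def decode(addr: int, installed_ram: int) -> Target:
    """Return which device should answer a cycle at addr."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address out of 32-bit range")
    if addr < RAM_SPACE_END:
        # RAM space never reaches the host, so its memory controller stays idle.
        return Target.LOCAL_DDR if addr < installed_ram else Target.LOCAL_BUS_ERROR
    if addr < ROM_SPACE_END:
        return Target.LOCAL_ROM_COPY
    return Target.HOST_BUS  # I/O and NuBus/slot space: only the host can answer


def host_cycle_enabled(addr: int, installed_ram: int) -> bool:
    """True when the card should start a cycle on the host bus."""
    return decode(addr, installed_ram) is Target.HOST_BUS
```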

Is there a signal the CPU can issue to disable the motherboard RAM controller?

So the first thing to do would be to get the 68030 datasheet and User's Guide (got those) and compare them to the CF documents to confirm that they really are signal compatible. One wouldn't want to find that some obscure but essential signal, e.g. a bus arbitration signal, is present on the 68030 and missing on the CF, or vice versa.

Then an instruction-by-instruction comparison of the instruction sets would be a good idea. We know that a bunch of 68030 instructions and addressing modes are missing from the CF, but it is good to nail down which ones. Then compare the missing instructions/modes with the documentation for that library (mentioned on Freescale's site), which is supposed to emulate the missing instructions, and make sure everything is there.
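
Once both lists are transcribed, the comparison itself is just set arithmetic. In this sketch the three sets are illustrative placeholders, not data taken from the manuals or from the CF68KLib documentation. Keying on (mnemonic, size) pairs is one assumed way to catch instructions the ColdFire supports only at some operand sizes; addressing modes would need the same treatment.

```python
# Instruction-set gap check. Entries are (mnemonic, operand size) pairs.
# The three sets below are ILLUSTRATIVE PLACEHOLDERS, not transcriptions of
# the manuals; EMULATED is hypothetical. Fill all three in from the 68030 and
# ColdFire references and the CF68KLib documentation before concluding anything.

M68030 = {("ADD", "B"), ("ADD", "W"), ("ADD", "L"), ("MOVE", "L"),
          ("ABCD", "B"), ("ROL", "L"), ("MOVEP", "W")}
COLDFIRE = {("ADD", "L"), ("MOVE", "L")}
EMULATED = {("ADD", "B"), ("ADD", "W"), ("ABCD", "B"), ("ROL", "L")}


def isa_gaps(m68k: set, coldfire: set, emulated: set) -> tuple[list, list]:
    """Return (missing on ColdFire, missing AND not covered by the library)."""
    missing = m68k - coldfire
    uncovered = missing - emulated
    return sorted(missing), sorted(uncovered)


if __name__ == "__main__":
    missing, uncovered = isa_gaps(M68030, COLDFIRE, EMULATED)
    print("missing on ColdFire:", missing)
    print("not emulated:", uncovered)
```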

If it is, then one can probably produce boot code for the CF card which will cause it to power up, do a memory test, load a set of exception vectors that point at emulation code for the missing 68K instructions, copy the Mac ROM from the host bus to the local memory and then begin executing the Mac ROM to boot as a Macintosh.
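
The memory-test step in that sequence might look like the following two-pass test, modeled in Python. `FakeRam` is a hypothetical stand-in for the DDR array, with an optional stuck-at-zero fault for testing; real POST code would be ColdFire assembly in the card's flash, run before the exception vectors are loaded and the ROM is copied.

```python
# Model of a simple power-on RAM test the card's boot flash might run first.
# FakeRam is a hypothetical stand-in for the DDR array; real POST code would be
# ColdFire assembly operating on physical addresses.
from typing import Optional


class FakeRam:
    def __init__(self, size: int, stuck_addr: Optional[int] = None, stuck_low_mask: int = 0):
        self.size = size
        self.cells = {}
        self.stuck_addr = stuck_addr          # one faulty longword, for testing
        self.stuck_low_mask = stuck_low_mask  # bits at that longword stuck at 0

    def write32(self, addr: int, value: int) -> None:
        if addr == self.stuck_addr:
            value &= ~self.stuck_low_mask
        self.cells[addr] = value & 0xFFFF_FFFF

    def read32(self, addr: int) -> int:
        return self.cells.get(addr, 0)


def memory_test(ram: FakeRam) -> Optional[int]:
    """Return the first failing longword address, or None if RAM passes."""
    # Pass 1: alternating bit patterns, aimed at stuck or shorted data bits.
    for pattern in (0x5555_5555, 0xAAAA_AAAA):
        for addr in range(0, ram.size, 4):
            ram.write32(addr, pattern)
        for addr in range(0, ram.size, 4):
            if ram.read32(addr) != pattern:
                return addr
    # Pass 2: address-in-address, aimed at stuck or shorted address lines.
    for addr in range(0, ram.size, 4):
        ram.write32(addr, addr)
    for addr in range(0, ram.size, 4):
        if ram.read32(addr) != addr:
            return addr
    return None
```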

Drivers would have to be written for the built-in 10/100 Enet and the USB 2.0 port as well, but that might not be too hard (for software types). Once written it would be easy enough to incorporate those into the boot code of the CF card stored on a Flash chip on the card.

So the USB port could be bootable and, conceivably, the 10/100 port could make the machine net-bootable as well.

I'd really like to get some kind of disk controller on the CF card as well, but can't think of a good way to do it. The reason is that going out to the host for ssslllloooowww SCSI access is punishing. One could connect storage to the USB 2.0 port, but it's kind of clunky to go from USB 2.0 to IDE inside the Mac. That works fine for external drives, but the user should be able to use internal drives without taking a performance hit.

Oh, while we're writing a special boot ROM for the CF, it should also patch the SCSI Manager, or rather replace it with a version which is SCSI Manager 4.3 compatible. This is discussed in some (but is it enough?) detail in the Inside Macintosh books. This would allow the USB 2.0 bus to appear as a second SCSI bus to the Mac Device Manager and/or Slot Manager. Then the USB 2.0 driver would be written to appear as a SCSI SIM.
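
To make the XPT/SIM split concrete, here is a conceptual Python model in which one transport layer routes each request to a SIM registered per bus, so a USB SIM simply appears as an additional bus. Every class and method name is hypothetical; this is not the SCSI Manager 4.3 programming interface described in Inside Macintosh, only the shape of the idea.

```python
# Conceptual model of one XPT routing requests to several SIMs, one per bus.
# All class and method names are hypothetical; this is NOT the SCSI Manager 4.3 API.
from dataclasses import dataclass


@dataclass
class ScsiRequest:
    bus: int      # which SCSI bus (SIM) the request is for
    target: int   # SCSI ID on that bus
    cdb: bytes    # SCSI command descriptor block


class Sim:
    name = "abstract"

    def execute(self, req: ScsiRequest) -> str:
        raise NotImplementedError


class MotherboardScsiSim(Sim):
    name = "motherboard SCSI"

    def execute(self, req: ScsiRequest) -> str:
        return f"{self.name}: target {req.target}, opcode 0x{req.cdb[0]:02X}"


class UsbMassStorageSim(Sim):
    name = "USB mass storage"

    def execute(self, req: ScsiRequest) -> str:
        # A real SIM would wrap the CDB for the USB transport here
        # (e.g. in a Bulk-Only Transport command block wrapper).
        return f"{self.name}: target {req.target}, opcode 0x{req.cdb[0]:02X}"


class Xpt:
    def __init__(self) -> None:
        self.sims: dict[int, Sim] = {}

    def register_bus(self, sim: Sim) -> int:
        """Give the SIM the next free bus number."""
        bus = len(self.sims)
        self.sims[bus] = sim
        return bus

    def action(self, req: ScsiRequest) -> str:
        """Route a request to the SIM that owns its bus."""
        sim = self.sims.get(req.bus)
        if sim is None:
            raise LookupError(f"no SIM registered for bus {req.bus}")
        return sim.execute(req)
```

The design point is that the Device Manager side only ever talks to the XPT; adding USB storage means registering one more SIM rather than changing anything above it.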

So, we need POST and exception-handling code in the boot ROM, a SCSI Manager 4.3 patch, USB 2.0 and 10/100 drivers, and some code to copy the ROM to local memory. We may also need a software solution to possible contention with the motherboard memory controller, but that can probably be handled in hardware.

I would like to reverse engineer a PowerCache030 some time to see what it is doing to interface the faster CPU to the slower host machine. Daystar had some smart people, and I'm sure they dealt with issues in that interface that I would never think about and might never find in a million years of looking. That would mainly consist of removing the GAL16V8 chips and trying to determine their programming.

After that, the place to start is with a Coldfire development board. Most CPUs have development boards available for them. The chip is mounted on a board which provides many of the support functions and interfaces one needs in order to get started working with the chip. There is also usually a programming interface and debug interface to a host PC so that one can write programs to run on the chip, and then debug them.

Then an interface board between the CF development board (there are usually connectors which bring out most of the CPU I/O pins on development boards) and the first target Mac would be needed.

After that I'm a little hazy about how to proceed. How do you develop the thing incrementally? In other words, how can you tell whether any part of the emulation is working until you get the whole thing (or most of it) working properly? Perhaps pull the Mac ROMs, substitute simple ROMs with just a few instructions to execute, and put test code in the faux ROMs during development, adding progressively more complex code until one is ready to try operating with the actual Mac ROMs... Hmmmm.

 

trag

Well-known member
Hah. I forgot the conclusion.

So if this all works we basically end up with a 300 MHz CPU connected to DDR memory running at close to the same speed as the CPU (latency is high, but bursts are good) and USB and ethernet ports close enough to the CPU to run at full speed. The Mac motherboard becomes a peripheral interface, which happens to provide the ROM code as well.

Even with the instructions which will throw exceptions, I bet this would run at least four times faster than an IIfx, and maybe better.

Video performance would still be abysmal though.

 

kreats

Well-known member
I don't understand the obsession people have with the Coldfire... It's just incompatible enough to screw everything up, and it offers little to no benefit since nobody will develop for it.

trag - I know you've done a few nice projects, but an accelerator is a big job. Maybe start with something simpler... like a NuBus USB board perhaps? A flash SIMM board that could hold all 3 IIfx/IIsi/SE/30 ROMs would be a welcome development too.

 

Quadraman

Well-known member
Even better would be CompactFlash drives. Given how little power the PSUs in those things can supply, adding fast hard drives can be taxing.

 

kreats

Well-known member
I dunno about IDE, as there are a few existing solutions that are fine:

1) SCSI-IDE converter

2) Ultra Wide drive with a UW NuBus card

3) Ultra Wide drive with a 65-50 pin converter

and if you wanted CF you could stick a CF-IDE converter on the SCSI-IDE converter.

 

Charlieman

Well-known member
Is there anything that can be learned about 68000/68030 implementation on an FPGA/PIC from projects like MAME?
Aren't PICs something like small 8 bit processors?
You're right. There are 8- and 16-bit PICs. They are fast enough to emulate a slow 32-bit device (e.g. a USB 1.1 client), but not a *fast* multi-function computer.

MAME is an arcade game emulator which runs on many platforms. Install MAME; find the ROM image for the arcade game; play. The emulator includes libraries or modules that translate processor instructions from the arcade game to the host machine. If you can find a way to use the libraries, you don't need to reinvent. Enthusiasts have done that already to emulate computers (see MESS).

 

Bunsen

Admin-Witchfinder-General
Coldfire on PPC machines
Why? When it would be even harder / or perhaps impossible.
Why? So you can run your 68k apps on a machine with a faster bus, more memory, bigger hard drive, better video, etc.
Every single other thing on the machine, including the ROM, is expecting a PPC, not a 68k or a Coldfire. Won't work, IMHO.

I could maybe see it working as a PCI card with a set of INITs that patch the ROM to trap all 68k code and redirect it to the Coldfire with some re-interpretation along the way for incompatible opcodes. Maybe.

But then one really does have to wonder why you'd bother when it would be a whoooole load easier and cheaper to stick a 1 GHz G4 in there and run 68k apps on it.

 

Bunsen

Admin-Witchfinder-General
68K instructions which are not supported on the Coldfire should generate an exception, which can then be handled by executing the proper code on the Coldfire to emulate the excepted 68K instruction. Very neat. It's the same mechanism which the Mac Toolbox uses to execute its routines.
I wonder ... if this is a PDS card, could we redirect the excepted instructions back over to the original 680x0 for execution? Or build the card with a 68k co-processor onboard?
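
The trap-and-emulate mechanism in the quote can be modeled in a few lines. This Python sketch assumes the ColdFire traps ROL.L #n,Dn as an illegal instruction (something the instruction-set comparison would need to confirm); it decodes the opcode, performs the rotate, sets the condition codes, and resumes after the instruction. It is not CF68KLib code, and a real handler would be ColdFire assembly that takes the faulting PC from the exception stack frame.

```python
# Conceptual model of trap-and-emulate for one missing instruction, ROL.L #n,Dn.
# ASSUMPTION: the ColdFire lacks ROL and raises an illegal-instruction exception
# on it. This is not CF68KLib; a real handler is ColdFire assembly that reads the
# faulting PC from the exception stack frame.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CpuState:
    d: list = field(default_factory=lambda: [0] * 8)  # D0-D7
    pc: int = 0
    ccr: dict = field(default_factory=lambda: dict(X=0, N=0, Z=0, V=0, C=0))


def is_rol_l_imm(opcode: int) -> bool:
    # 1110 ccc 1 10 0 11 rrr : ROL, long size, immediate count, data register
    return (opcode & 0xF1F8) == 0xE198


def emulate_rol_l_imm(opcode: int, cpu: CpuState) -> None:
    count = ((opcode >> 9) & 7) or 8  # a count field of 0 encodes 8
    reg = opcode & 7
    value = cpu.d[reg]
    result = ((value << count) | (value >> (32 - count))) & 0xFFFF_FFFF
    cpu.d[reg] = result
    # N and Z from the result, V cleared, C = last bit rotated out (now bit 0), X unchanged
    cpu.ccr.update(N=result >> 31, Z=int(result == 0), V=0, C=result & 1)


EMULATORS = [(is_rol_l_imm, emulate_rol_l_imm)]


def illegal_instruction_handler(cpu: CpuState, fetch_word: Callable[[int], int]) -> bool:
    """Emulate the faulting opcode if known; return False to chain to the old handler."""
    opcode = fetch_word(cpu.pc)
    for matches, emulate in EMULATORS:
        if matches(opcode):
            emulate(opcode, cpu)
            cpu.pc += 2  # skip the one-word instruction and resume after it
            return True
    return False
```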

What is the performance like?
There is quite a significant overhead incurred when the ColdFire processor encounters an unimplemented instruction which has to be handled by CF68KLib. As a result, the effective performance which you will see will depend very much on whether your application triggers lots of exceptions within performance-critical loops. For best performance of production code, we recommend either recompiling for ColdFire (if the code was written in C or another high-level language), or translating using PortAsm/68K for ColdFire (if the code was written in assembly language).
In short, one wonders if a fast 68k copro would help ...
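
The FAQ's point, that performance depends on how often code traps, is a weighted average. The sketch below uses placeholder timings, not measurements; once real numbers exist, the same arithmetic could sanity-check estimates such as trag's four-times-an-IIfx guess.

```python
# Back-of-envelope model of trap overhead. All timings are HYPOTHETICAL
# placeholders, not measurements; plug in real numbers once a board exists.


def effective_ns_per_insn(trap_fraction: float, native_ns: float, trap_ns: float) -> float:
    """Average time per 68K instruction when a fraction of them trap to emulation."""
    return (1 - trap_fraction) * native_ns + trap_fraction * trap_ns


def speedup(baseline_ns: float, trap_fraction: float, native_ns: float, trap_ns: float) -> float:
    """Speed relative to the original CPU (baseline_ns per instruction)."""
    return baseline_ns / effective_ns_per_insn(trap_fraction, native_ns, trap_ns)


def break_even_fraction(baseline_ns: float, native_ns: float, trap_ns: float) -> float:
    """Trap fraction at which the accelerator is no faster than the original CPU."""
    return (baseline_ns - native_ns) / (trap_ns - native_ns)


if __name__ == "__main__":
    # Hypothetical: old CPU 100 ns/insn, ColdFire 10 ns native, 500 ns per trap.
    for f in (0.0, 0.01, 0.05):
        print(f"{f:.0%} trapped -> {speedup(100, f, 10, 500):.1f}x")
    print(f"break-even at {break_even_fraction(100, 10, 500):.1%} trapped")
```

With those placeholder numbers it prints 10.0x, 6.7x and 2.9x for 0%, 1% and 5% trapped, and a break-even point of 18.4%: a few percent of trapping instructions inside a hot loop can eat most of the gain.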

Does CF68KLib implement floating-point instructions?
No.
... and an FPU.

Another approach would be to design a 68030 chip into an FPGA. ...Beyond properly emulating each instruction, one must make sure that all the control registers, interrupt functionality and supervisory systems operate as expected. That's a lot of stuff.
As well, if we're going to use this as an accelerator, it has to be able to interface to the Mac at its original CPU bus speed, and cache instructions and data between accesses.

Again:

You might also be interested in this PPCMLA discussion
and the follow-on, more technical discussion at Applefritter.
 

Bunsen

Admin-Witchfinder-General
8.1 runs well enough on a 040 for me. That said I *own* an Amiga 060 machine / so I can get an 060 fix anytime I need one.
Have you ever run a Mac emu on the 060 Amiga and compared? Just curious.

Excuse me guys: I'm just catching up with this thread.

 

Bunsen

Admin-Witchfinder-General
It occurs to me that my idea of using 68k copros is just making everything more than twice as complicated.

 

Bunsen

Admin-Witchfinder-General
Freescale / produce a chip called Coldfire
one could always read the thread before posting
A small number of apps don't run well on PPC but they are a really small number
Is there any 680x0 code which cannot be run under CF68KLib?
/ a few programming tricks which are legal but deprecated on the 68000 family will prevent successful operation under the library. /
So they are probably the same apps which would break under Coldfire too :-/

 

SiliconValleyPirate

Well-known member
I didn't notice Coldfire had already been mentioned - I guess I'll just blame bad eyesight or the fact it was getting late or something lame. I think more coffee is in order...!

There are code differences between the 68000, 68020/030 and 68040 strains of the 680x0 family that stop code written for one from running on the others, but this hasn't caused any issues in Mac OS because the OS was programmed or amended to run on the later 68k chips.

I see no reason why the same would not apply to CF68k chips, given the correct intermediary between the computer and the CF68k chip, possibly in the form of an FPGA. It'd not be ideal, but the commands mentioned in the FAQ are well documented, so they could be trapped and redirected or re-coded to legal routines if you had an experienced programmer on the case. Either that, or you need a pre-boot patch ROM that loads the necessary patches to 'patch' the Mac OS ROM. Daystar's Turbo cards do this, so it's possible.

IMHO that would be required to use either a 68060 or a Coldfire, as both vary slightly from earlier models, and short of loading the patches from ROM at power-on there's no way to patch the OS at a low enough level to sort things out before code execution begins.

Of course this is all pretty moot as it's never going to happen, but it makes for interesting speculation :)

 

trag

Well-known member
I don't understand the obsession people have with the Coldfire... It's just incompatible enough to screw everything up, and it offers little to no benefit since nobody will develop for it.
The nice thing about Coldfire is that they are the only chips around which are (almost) compatible with the 68K instruction set and are available for $30 and less in speeds an order of magnitude faster than are in the original machines. I don't think any other chips come that close to providing an affordable speedup for 68K based systems.

As I mentioned above, the problem with 68060 upgrades is that the 68060 chip costs about $300 each from Freescale. You could build an accelerator for them, but at those prices three people will be able to afford them.

Heck, I had an idea about building a copy of the PowerCache030 for SE/30s until I priced 68030 chips. The original 68030 chip is priced around $100 each depending on the speed. Building new accelerators around those chips isn't practical either, at those prices. That's not even counting the FPU chip to go with it.

You're correct. It is incompatible or, at least, not fully compatible. But, if one wrapped the hardware and software around it to make it compatible it doesn't matter if anyone develops for it. It would look like a fast 68K to the host Macintosh. Old and new 68K software would run on it. That's the dream. More about dreaming below.

The big attraction of the Coldfire is that presumably, most of the code would not be emulated. That depends on the actual instructions used in 68K programs, but it would be nice to have an accelerator which is mostly running object code natively without emulation.

trag - I know you've done a few nice projects, but an accelerator is a big job.
Yeah, my previous projects are not in the same ballpark, neighborhood, nor municipality as an accelerator. But I did mention in my post (see the parenthetical comment) that it was more of a fantasy, and that for there to be any realistic chance of realizing such a dream I'd need a bunch of software guys to help.

On the other hand, in my experience, the trick to doing large complex projects is to rob them of their complexity by breaking them into smaller, doable chunks. That's part of what I was trying to do in my previous post. We don't know how to make an accelerator, but we could compare the list of signals in two datasheets (68030 and target Coldfire). We can also compare instruction sets from two different datasheets. Those things are simple, though the latter would be tedious.

Once we had documented the differences, I'm pretty sure we could check the documentation of that 68K emulation library to see if it makes up the difference.

I know how (in theory) to map out the functions of the GAL chips on an old Daystar accelerator card. Translating it to functions it's performing on an active IIci or SE/30 bus would be more daunting, but mainly because one would need a strong understanding of how the 68030 runs its bus and communicates.

Once we knew all those things, we would (roughly) know how to connect the host Macintosh bus to GLUE logic (probably in an FPGA) which would perform the functions that the old GALs did, plus any new Coldfire specific logic, and connect that to the pins of the Coldfire chip. I've done FPGA programming so I'm pretty sure I could program a chip to perform as the GLUE.

I certainly know or can find from documentation how to connect a boot ROM (programmable Flash) to a Coldfire chip. I'm not as certain that all the unsupported 68K instructions can be made to generate exceptions. But that will be apparent from the chip's documentation and the comparison of the instruction sets.

Hooking up the USB and 10/100 ports is trivially easy. Connecting the DDR memory is more difficult because all the traces need to be the same length and mostly insulated from noise, which probably means routing them on interior layers with power or ground layers between them and the outside world. This may create a need for a six-layer board, blech. Four-layer prototype boards are cheap (under $200 for three); prototypes with six or more layers are much more expensive.
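
A quick estimate of why the matching matters: on inner (stripline) layers the signal delay depends only on the dielectric, so every millimetre of length mismatch turns into a fixed amount of skew. This sketch assumes FR-4 with a dielectric constant of about 4.2, which varies by laminate; the skew actually allowed has to come from the DDR controller and memory datasheets.

```python
# Rough skew estimate for DDR traces routed as stripline on inner layers.
# ASSUMPTIONS: FR-4 with a dielectric constant of about 4.2 (varies by
# laminate and frequency); stripline delay = sqrt(er) / c.
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, mm per ns


def stripline_delay_ps_per_mm(er: float = 4.2) -> float:
    """Propagation delay of a stripline trace, in picoseconds per millimetre."""
    return math.sqrt(er) / C_MM_PER_NS * 1000.0


def skew_ps(length_mismatch_mm: float, er: float = 4.2) -> float:
    """Arrival-time difference caused by a trace-length mismatch."""
    return length_mismatch_mm * stripline_delay_ps_per_mm(er)


if __name__ == "__main__":
    for mm in (1, 5, 10):
        print(f"{mm:>2} mm mismatch -> {skew_ps(mm):.0f} ps")
```

Under those assumptions it works out to roughly 7 ps per millimetre: about 7, 34 and 68 ps for 1, 5 and 10 mm of mismatch.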

So, from my hardware point of view, I think it is doable. It would take a lot of time.

It is also possible that the initial documentation studies would cause one to conclude that it isn't practical. Perhaps the Coldfire I/O busses are just too different from the 68K. I'm assuming at this point, that they're fairly similar with 32 address and 32 data lines, plus similar or identical bus arbitration signals.

Maybe start with something simpler... like a NuBus USB board perhaps?
That is mainly a software problem.

There are USB chipsets which interface directly with a CPU bus rather than to PCI. So it would be fairly trivial to get one of those chipsets onto a NuBus card. It would need an FPGA (or at least a CPLD) to provide the GLUE between the NuBus and the USB chipset. In many ways it'd be easier to just build a PDS to USB board. There's a lot less translation from bus to USB chip that way but some kind of hardware interface between the USB chipset and the PDS slot would probably still be needed.

How do you write a USB driver after that though? Ideally, one would write it as a SCSI SIM so that the USB bus would be bootable. Know any 68K Mac developers who are bored and need a large software project? Writing USB drivers would probably be pretty large, I think. But if you can come up with some serious 68K/USB programmers, I'm willing to team up and support the hardware side. They're also going to need to write a SCSI Manager 4.3 type XPT so that the host machine can handle having more than one SCSI bus.

Discussion of XPT and SIM can be found fairly early in Chapter 4 of "Inside Macintosh: Devices", which is downloadable from Apple as several PDFs (one per chapter; chapter 4 is titled "SCSI Manager 4.3").

What I'd really like to do is build an IDE board for 68K Macs. That's been kicking around in my head too. It'd be cool to have an interface for laptop IDE drives in the old 68K models. I just haven't made the time for it and now I've started a new job and have even less time.

I've been telling myself that when I finish assembling the last few IIfx SIMMs I have lying around here I'll start on the next new project, but the blank PCBs are still sitting on my bench. Sigh.

A flash SIMM board that could hold all 3 IIfx/IIsi/SE/30 ROMs would be a welcome development too.
I already have a (non-writable) board design drawn for the ROM SIMM. I've never had it fabricated because it doesn't appear to make economic sense. I also think that I may need to revise it to put two chips on each side, instead of four chips on the same side. If one is hand soldering the chips onto the SIMM, one needs space between the chips for the soldering pencil.

It would cost about $600 just to have 200 SIMM circuit boards made. That's not counting the cost of chips. After that the SIMMs can be built one at a time as needed at a marginal cost of about $4 for the chips. It seems unlikely to me that there are enough folks wanting ROM SIMMs to make up the $600 plus $4 per SIMM even at $20 or $30 per SIMM, assuming they would pay that much. And that's not even considering the time involved. While fewer than 200 boards could be made, the total cost doesn't actually drop much or at all, the unit price just goes up.
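
The break-even arithmetic from that paragraph, as a small function. Like the paragraph's figures, it ignores labor, shipping and unsold boards.

```python
import math


def break_even_units(fixed_cost: float, unit_cost: float, price: float) -> int:
    """SIMMs that must sell before the up-front board cost is recovered (labor ignored)."""
    if price <= unit_cost:
        raise ValueError("price must exceed the per-unit cost")
    return math.ceil(fixed_cost / (price - unit_cost))


if __name__ == "__main__":
    for price in (20, 30):
        print(f"${price}/SIMM -> {break_even_units(600, 4, price)} SIMMs to break even")
```

At $20 per SIMM it takes 38 sales to recover the $600 in boards; at $30 it takes 24.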

And there's the issue that such a board would violate Apple's copyrights, though why any sane person would care at this point... Still, makes it kind of hard to advertise and sell.

If you want the board flashable as well, that would require hooking up the Write_Enable signal (assuming it's even present in the ROM socket) and then coding up a software routine which can massage the Flash chips into writing their contents. That's software again: I'm only marginally competent at it and, more importantly, just not that interested in it. I can connect the wires. Writing the routine which will properly massage the Flash chips into being written does not interest me.
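
For whoever does write it, the core of such a routine is usually a short command sequence. This sketch models the AMD/JEDEC-style byte-program sequence in Python, assuming AMD-compatible flash parts. The unlock addresses are part-specific, `FakeAmdFlash` is a hypothetical chip model, and erase, status error bits and the four byte-wide chips on a 32-bit SIMM are left out. On a real Mac the routine would have to run from RAM, because a flash chip can't supply instructions while it is being programmed.

```python
# Model of the AMD/JEDEC-style byte-program command sequence.
# ASSUMPTIONS: AMD-compatible flash; unlock addresses 0x555/0x2AA are typical
# but part-specific (some parts use 0x5555/0x2AAA); erase and per-lane handling
# for the SIMM's four byte-wide chips are omitted.


class FakeAmdFlash:
    """Hypothetical chip model: programs a byte only after the 3-write unlock."""

    def __init__(self, size: int, unlock1: int = 0x555, unlock2: int = 0x2AA):
        self.mem = bytearray(b"\xFF" * size)  # erased flash reads as all ones
        self.sequence = [(unlock1, 0xAA), (unlock2, 0x55), (unlock1, 0xA0)]
        self.state = 0  # how many command-sequence writes have matched so far

    def write(self, addr: int, value: int) -> None:
        if self.state < len(self.sequence):
            matched = (addr, value) == self.sequence[self.state]
            self.state = self.state + 1 if matched else 0
        else:
            self.mem[addr] &= value  # programming can only clear bits (1 -> 0)
            self.state = 0

    def read(self, addr: int) -> int:
        return self.mem[addr]


def program_byte(chip, addr: int, value: int,
                 unlock1: int = 0x555, unlock2: int = 0x2AA, max_polls: int = 1000) -> bool:
    """Issue the program command, then poll until the byte reads back (or give up)."""
    chip.write(unlock1, 0xAA)
    chip.write(unlock2, 0x55)
    chip.write(unlock1, 0xA0)
    chip.write(addr, value)
    for _ in range(max_polls):
        if chip.read(addr) == value:  # simplified stand-in for DQ7 data polling
            return True
    return False  # a bit already at 0 can't return to 1 without a sector erase
```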

 

trag

Well-known member
It occurs to me that my idea of using 68k copros is just making everything more than twice as complicated.
More importantly, it makes it too expensive.

The very first thing we need to look at in any concept is the cost per unit, I think. 68K CPUs cost about $100 each depending on the model. If you can live without the MMU and at low speeds, they get cheaper.

68060 and 68040 are also too expensive.

Coldfire chips are cheap enough (~$30) but I see your post about them lacking an FPU. So, we either live without an FPU in accelerated systems, or....

If we're going to try to implement an FPU in an FPGA, then I'd say we're better off going all the way back to the idea of emulating the 68030 in an FPGA. Besides, the CF doesn't have a built-in interface to an FPU the way the 68030 does, does it? That would make providing an external FPU much more daunting.

FPGAs with about twice as much logic as found in an actual 68030 cost under $20 and run at 200 MHz.

The 68030 contains about 273,000 transistors. The 68040 contains just under 1,200,000 transistors.

If we assume that there are about 4 transistors per gate (this is very conservative from the point of view of this estimate) then the 68030 contains about 70,000 gates and the 68040 contains about 300,000 gates.

The Xilinx XC3S500E contains about 500,000 gates and costs $20.75 when purchased individually from Digi-Key. I'm not sure how FPGA gates translate to CPU transistors or even CPU gates, though.

There are at least 4 transistors in each real world gate, except inverters which only have two. If we assume an average of four (which should be low) we get the numbers above for total gates in the target CPUs. I would assume that one needs many more gates in an FPGA to get the same functionality found in a CPU, because we're not custom designing the logic. But how much more?

The XC3S500E has 7 times as many gates as a 68030. Is that enough? Is it too much?
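
The arithmetic above, spelled out so the assumptions are easy to change. The 4-transistors-per-gate figure comes from the post. Xilinx's 500,000 "system gates" is a vendor equivalence rating rather than a count of ASIC-style gates, so the final ratio compares unlike units, which is the open question raised here.

```python
# Transistor-to-gate arithmetic from the post.
# ASSUMPTION (from the post): about 4 transistors per logic gate.
# The FPGA's "system gates" figure is a vendor rating, not comparable 1:1.
TRANSISTORS = {"68030": 273_000, "68040": 1_200_000}
XC3S500E_SYSTEM_GATES = 500_000


def gate_estimate(transistors: int, transistors_per_gate: int = 4) -> float:
    return transistors / transistors_per_gate


if __name__ == "__main__":
    for cpu, count in TRANSISTORS.items():
        gates = gate_estimate(count)
        ratio = XC3S500E_SYSTEM_GATES / gates
        print(f"{cpu}: ~{gates:,.0f} gates; FPGA rating is {ratio:.1f}x that")
```

It gives about 68,250 gates for the 68030 (7.3x headroom by the FPGA's rating) and 300,000 for the 68040 (1.7x).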

Does a 200 MHz FPGA run non-optimized logic faster than a 40 MHz 68030 runs its custom logic?

 

trag

Well-known member
I dunno about IDE, as there are a few existing solutions that are fine:
1) SCSI-IDE converter

2) Ultra Wide drive with a UW NuBus card

3) Ultra Wide drive with a 65-50 pin converter

and if you wanted CF you could stick a CF-IDE converter on the SCSI-IDE converter.
I'm not sure if I'm interpreting your comment correctly, but there is no NuBus UW SCSI card. The fastest NuBus SCSI card made (or shipped, anyway) was Fast & Wide, with a theoretical maximum transfer rate of 20 MB/s. UW is 40 MB/s theoretical maximum; it was the next step up and the last one before LVD hit the scene.
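
Those theoretical peaks follow from bus width times transfer rate, using the standard parallel SCSI figures (Fast = 10 million transfers per second, Ultra = 20 million; Wide = 16 bits). Sustained throughput in a real NuBus machine will be lower.

```python
def scsi_peak_mb_per_s(bus_width_bits: int, transfers_mhz: int) -> int:
    """Theoretical peak: bytes per transfer times millions of transfers per second."""
    return bus_width_bits // 8 * transfers_mhz


VARIANTS = {
    "Fast SCSI-2 (narrow)": (8, 10),
    "Fast Wide": (16, 10),
    "Ultra Wide": (16, 20),
}

if __name__ == "__main__":
    for name, (width, mhz) in VARIANTS.items():
        print(f"{name}: {scsi_peak_mb_per_s(width, mhz)} MB/s")
```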

 