Serious proposal: accelerator and peripheral expansion system

Trash80toHP_Mini · Dec 12, 2016

ZaneKaminski said:
I planned to use the ROM from the machine as-is, and the software in the ROM only works under the assumption of the presence of some particular chipset.

That just reminded me, dunno about other accelerators, but the Radius had a license for the Rocket to copy the crown jewels into its onboard memory to avoid the bottleneck of ROM access over NuBus. You'd mentioned maybe doing so in your project, dunno if you've made a decision yet, but I'd say it would speed things up considerably to have the Toolbox on board.

Gorgonops · Dec 12, 2016

ZaneKaminski said:
As it is, the performance of a Macintosh SE with the accelerator will be faster than an SE/30, hopefully faster than a IIci, but probably not as fast as a system with a 40 MHz 68040.

Just tossing this out there since I vaguely recall a target price for this thing in the one hundred dollar ballpark: the 68000 softcore in the MIST reconfigurable computer, which sells for about $200, is capable of speeds up to about 48mhz. (And seems to also have at least partial 68020 compatibility.) I have no idea how much just the FPGA in the MIST costs or if there might be a smaller/cheaper one that has sufficient capacity to run the softcore and some bus glue while dispensing with the capacity to emulate the rest of computer, but... can't help but make me wonder if the performance bar is this low and you're using FPGAs for bus glue anyway if an all-FPGA approach might be simpler and end up costing around the same.

ZaneKaminski · Dec 12, 2016

Trash80toHP_Mini said:
have the Toolbox on board

Haha I'm way ahead of you on caching the ROM.
The STM32H7 has a "scattered memory architecture," with a bit over 1 Mbyte of SRAM but split into different banks and sizes to optimize throughput. And then there's the 32 Mbyte external SDRAM.

Firstly, the emulator software has to be stored in the 64 kbytes of "instruction-tightly-coupled-memory" (ITCM).

Any data structures, like trees, linked-lists, tables of pointers, etc. need to be stored in the first 64 kbyte bank of "data-tightly-coupled-memory" (DTCM). There is a second 64 kbyte bank of DTCM too.

The -H7 has a main SRAM of 512 kbytes, in which I planned to store the 171 kbyte (512x342x8bit) screen buffer for 8-bit grayscale on the compacts.

Other memories on the -H7 include two 128 kbyte banks and another 32 kbyte bank of SRAM in the "connectivity domain," another 64 kbytes in the "batch acquisition domain," and 4K of battery backup SRAM, like the Mac's clock chip has.

Then in the external memory will be stored the up to 9 Mbytes of RAM, up to 256 kbyte ROM, and then the full decoded cache for each, so 300% of 9.25 Mbytes, which is 27.75 Mbytes. The ROM will be loaded from the machine and fully decoded before boot. RAM is never written back to the Macintosh except VRAM, the cache of which would be in a write-through configuration.
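As a quick check of the figures above, here is a small sketch of the memory budget. It assumes that "300%" means the raw RAM/ROM image plus its decoded form together take three times the raw size; the constant names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# Figures quoted in the post.
FRAMEBUFFER_BYTES = 512 * 342 * 1   # 8 bits (1 byte) per pixel
MAC_RAM_BYTES = 9 * MIB
MAC_ROM_BYTES = 256 * KIB
SDRAM_BYTES = 32 * MIB

# Assumption: "300%" covers the raw image plus its decoded cache.
FOOTPRINT_FACTOR = 3


def sdram_budget():
    """Return (raw, used, free) byte counts for the external SDRAM."""
    raw = MAC_RAM_BYTES + MAC_ROM_BYTES
    used = raw * FOOTPRINT_FACTOR
    return raw, used, SDRAM_BYTES - used


raw, used, free = sdram_budget()
print(FRAMEBUFFER_BYTES / KIB)  # 171.0 kbytes for the screen buffer
print(raw / MIB)                # 9.25
print(used / MIB)               # 27.75
print(free / MIB)               # 4.25 left over in the 32 Mbyte SDRAM
```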

There is a lot of internal memory left, some of which will have to be devoted to a USB buffer. What's left, however, can be used to move select areas of the RAM, ROM, and their instruction word decodings into on-chip SRAM. This is much faster than external SDRAM.

So for a given ROM, some "hints" should be supplied that tell what to store in internal SRAM and what to store in external SDRAM.

Memory accesses are going to go through a doubly indirect tree-type structure. What I mean is that there are 256 64kbyte chunks in the 24-bit address space of the 68000, and within each 64 kbyte chunk, another 256 chunks of 256 bytes. So you can sorta make a tree (actually it's not technically a tree) out of this, where first you look at A[23:16] and take that offset into a table of pointers, and then use A[15:8] to do it again from there. That can lead to a routine to access memory at that location.

If I didn't explain it well enough, the gist is that each 256-byte region of memory can be put in a different place or accessed with a different method. So that's how addresses accessed by the virtual 68000 will be resolved into an actual 68000 bus access, an SRAM access, SDRAM access, etc.
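The lookup described above can be sketched roughly like this. This is only an illustration in Python, not the emulator's actual code: the class, the handler functions, and the 0x400000 ROM address used in the example are all assumptions made for the sketch.

```python
PAGE_BITS = 8           # 256-byte pages
ADDR_MASK = 0xFFFFFF    # 24-bit 68000 address space


class AddressSpace:
    """Two-level table: A[23:16] picks a leaf table, A[15:8] picks a handler."""

    def __init__(self, default_handler):
        # Every 64 kbyte chunk starts out sharing one leaf of default handlers.
        self._default_leaf = [default_handler] * 256
        self._top = [self._default_leaf] * 256

    def map_page(self, addr, handler):
        """Route the 256-byte page containing addr to handler."""
        hi = (addr >> 16) & 0xFF
        mid = (addr >> PAGE_BITS) & 0xFF
        if self._top[hi] is self._default_leaf:
            # Give this chunk its own leaf before modifying it.
            self._top[hi] = list(self._default_leaf)
        self._top[hi][mid] = handler

    def read8(self, addr):
        addr &= ADDR_MASK
        handler = self._top[addr >> 16][(addr >> PAGE_BITS) & 0xFF]
        return handler(addr)


def make_sram_reader(base, data):
    """Hypothetical handler: serve reads from a local copy (cached ROM/RAM)."""
    def read(addr):
        return data[addr - base]
    return read


def bus_read(addr):
    """Hypothetical handler: stand-in for a real 68000 bus cycle via the FPGA."""
    return 0xFF


space = AddressSpace(default_handler=bus_read)
rom_page = bytes(range(256))
space.map_page(0x400000, make_sram_reader(0x400000, rom_page))
print(hex(space.read8(0x400012)))  # 0x12, served from the local copy
print(hex(space.read8(0x400100)))  # 0xff, next page still goes over the bus
```

Sharing one default leaf keeps the table small until a chunk actually needs per-page routing, which is one possible way to keep the pointer tables inside a 64 kbyte DTCM bank.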

The hints supplied to the emulator would basically break the ROM down into 256-byte chunks and tell which should be in SRAM and which should be in SDRAM.

Tuning the emulator will be a problem in and of itself though lol, lots of tweaking can be done to improve performance, especially for a specific application.

Trash80toHP_Mini · Dec 12, 2016

ZaneKaminski said:
Haha I'm way ahead of you on caching the ROM.

Excellent, ISTR you mentioning it, but not what you'd decided.

ZaneKaminski · Dec 12, 2016

Gorgonops said:
all-FPGA approach

I'm just not a CPU designer is the thing. I designed a toy CPU this semester for my computer architecture class. Not something I really wanna do again. My strength is mainly as a programmer.
The key problem in making this kind of low-cost, high-performance, eccentric thing is choosing which parts to be hardened and which parts to be softer, i.e. programmed, microcoded, indirect, etc. The STM32H7 has a hardened cache, for example. So even though memory accesses are gonna be doubly-indirect, the fact that it's done on a hard processor is advantageous.

The easy case is that processors are always hard and glue logic is implemented in CPLDs or something. The hard case is when you're trying to process this instruction stream for an outdated processor faster than it ever ran. How to balance the hard and soft aspects is a very complicated problem, dependent not just on the MC68000 ISA, but also the very specific details of the products available.

Just one tiny difference, for example, whether the qSPI interface supports transmission of the command code over all four data lines or just one, makes or breaks a certain design.

So this device you mention, if it supports, in theory, any CPU architecture (that can fit), then the trade-offs are different and so the fully-programmable FPGA approach is better for them. But again, they don't get hardened execution units and caches and all that.

So I think that this architecture is now adequately cheap ($100) and performs faster than a similarly priced accelerator with a better FPGA.

ZaneKaminski · Dec 12, 2016

There could be a cheaper product with only a (more capacious) FPGA, but like I said, I wouldn't be great at making one. I'm a programmer.

Gorgonops · Dec 13, 2016

Just for larfs I did a little digging regarding that "Vampire" accelerator for the Amiga 600, and for whatever it's worth the schematics/BOM/whatnot of the "prototype" version of it are on the web. (The website is pretty horribly organized and all the latest content is about the "V2" version of the accelerator, but the "About/Schematics/etc" entries on the sidebar are about said prototype.) Even if you stick with this using-a-microcontroller-for-the-actual-CPU idea I wonder if there might be something in there that could be useful to you. This early version didn't use the proprietary "Apollo" core (IE, the one that claims 3x the performance of the best 68060, which I honestly suspect is the sort of performance it would be difficult to match with that Snapdragon module you were looking at), it used the TG68 core from OpenCores running on a Cyclone II FPGA that seems to sell for about $25 on Mouser.

The best performance he claimed from this combination was only about two and a half times the native speed of the 68000 (versus the, what, 150+ times faster they're claiming from the Apollo core version), so I'm willing to grant that if you can work out the bus interface issues your HW+emulation hybrid might be capable of beating that. Nonetheless, well, I do wonder if there might be something there you can reuse; after all, he *did* successfully work out the bus logic to integrate an "alien" core and a 64MB RAM expansion into an accelerator in a 68000 socket using a relatively small and cheap FPGA. He has his home-grown work open-sourced here, perhaps there's something in there that might save you some work designing your own bus logic.

Of course, maybe you noticed that yourself already and are already leveraging it and I failed to see while scanning the thread, in which case... never mind.

It is a shame that the Apollo Core people seem to be so difficult to work with. I do gather from some of the chatter on the forums for their product it's in part because they have "investors" they're working with that might be trying to turn their work into a commercial ASIC, which means all sorts of disclosure/need-to-know issues crop up.

techknight · Dec 13, 2016

ZaneKaminski said:
Nooo, unfortunately I never planned for that. Emulating a different CPU complicates things. I planned to use the ROM from the machine as-is, and the software in the ROM only works under the assumption of the presence of some particular chipset.

68030 systems all have, for example, the Apple Sound Chip, which is absent from 68000 systems. So that would need to be emulated, and accurate peripheral emulation is a project in and of itself. My aim is to emulate the exact same processor as comes in the machine, but seemingly executing many more instructions per clock.

What machine in particular did you want these more advanced features (web browser and 68020+) for?

For Plus and SE, if you want 68030, the solution is to upgrade to SE/30 and get the accelerator for that. For your Portable... there is no solution in the exact same form-factor. PB140/170 is the closest.

Darnit.... Well.. that blows my plans to bits.

I need a 68020+ accelerator/CPU for the Portable which is a 16Mhz base 68000. You need a 68020 or higher to support the CFM68K runtime.

Reason I need a CFM68K runtime is so I can run RealBASIC applications which compile using that runtime. No, I cannot program in C or any other language very well except BASIC. I was going to write an MP3 player software that runs on my portable which controls a decoder card that I designed.

I built a decoder card with an MP3 DSP and an atmel MCU that I was going to interface with the system using the PDS or even serial port. The one I have right now is serial based, and the storage is actually an SD card on the decoder itself.

Problem is, I can't write the software for the Mac to control the card because, well, it won't run CFM68K apps.

So I need 68020 emulation on the 68000 hardware. With a real CPU, you need to do some SIZ0, SIZ1, A0, A1 decoding to LDS/UDS, and of course the E clock and VPA/VMA logic has to be created so a 68020+ CPU can run on 68000 hardware. Those two things are all it takes to make a 68020 run in a 68000 board.
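For illustration, the strobe decoding mentioned above might look something like the sketch below. It follows the usual 16-bit-port byte-enable equations for the 68020 as I understand them (only A0 matters for a 16-bit port; A1 only comes into play for 32-bit ports), and it models write cycles only. The function name is made up, and real glue logic would also need the E clock and VPA/VMA handling.

```python
# 68020 SIZ1:SIZ0 encoding for the number of bytes left to transfer.
SIZE_BYTE = (0, 1)
SIZE_WORD = (1, 0)
SIZE_3BYTE = (1, 1)
SIZE_LONG = (0, 0)


def strobes_16bit_port(siz1, siz0, a0):
    """Return (uds, lds) as booleans for a 68020 write to a 16-bit port.

    UDS selects the even (upper) byte, LDS the odd (lower) byte.
    """
    uds = a0 == 0
    lds = a0 == 1 or siz0 == 0 or siz1 == 1
    return uds, lds


print(strobes_16bit_port(*SIZE_BYTE, a0=0))  # (True, False): even byte only
print(strobes_16bit_port(*SIZE_BYTE, a0=1))  # (False, True): odd byte only
print(strobes_16bit_port(*SIZE_WORD, a0=0))  # (True, True): full word
print(strobes_16bit_port(*SIZE_LONG, a0=0))  # (True, True): first word of a long
```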

ZaneKaminski · Dec 14, 2016

techknight said:
68020+ accelerator/CPU for the Portable

Well maybe it wouldn't be too hard. There are still unsolved problems though. Let me explain.
The differences between the 68000 and 68020 in terms of the programmer's model are an extra 8 address bits, longword ALU operations, and the cache control registers. So all that needs done to add 68020 support is:

Either do another tree lookup or ignore top 8 address bits (as the Mac II's "Apple HMMU chip" does)
Longword ALU operations should be trivial once word-wide operations are implemented
Cache is irrelevant, just have to implement dummy cache control registers.

So it's not too hard to add 68020 support. 68030 is harder since it always comes with the MMU.

Now, what 68020 system to emulate? Either Mac II or LC. Mac II sounds better since it's older.

Anyway, the relevant differences between an SE and Mac II, in terms of chipset, are:

Different memory map, even in 24-bit mode
NuBus
No onboard video
Has 68881
Has Apple Sound Chip

These need to be translated into stuff on the Mac SE in order to emulate the Mac II with its hardware peripherals. Memory map, NuBus, and no onboard video can be resolved just by setting up the address space tree (in terms of what to cache and what to go over the bus to get). 68881 emulation shouldn't be too hard, since most operations can just be translated into equivalent ARMv7-M FPU operations. Apple Sound Chip emulation may be difficult, since the emulated output of the ASC has to be mapped to sounds that the 68000 Macs can produce through their more limited setup. Maybe it’s impossible and the accelerator should have a DAC and a little amplifier.

Sound aside, the conclusion is that it shouldn't be too hard to emulate a Mac II on top of SE hardware.

But for Portable? You need to cobble together a ROM that runs on a Macintosh II with a 68020, but supports the Power Manager bus communication through the VIA. Then it can work.

The other option is, if the PB140+ 68030 models use the same Power Manager chip as the Portable, to use a PB140/170 ROM and do a dummy 68030 MMU implementation.

Gorgonops said:
[Vampire accelerator] open-sourced here, perhaps there's something in there that might save you some work designing your own bus logic.

I'll check it out as I progress further. The TG68 core's bus interface implementation may be particularly helpful to me, since I have to implement basically the same thing. But I'm curious to see how they've implemented the Amiga peripherals. The Amiga has such a great design, with its blitter and the bitplanes and the two banks of RAM.
I've still gotta finish the schematic for this latest revision (with the microcontroller), and then I want to write the framework for the emulator and implement an instruction or two to see how good the performance can be. I'm also behind on my "real" product, this thing that adds Bluetooth A2DP audio to various Honda and Acura models with non-standard sized stereo head units. My finals are this week, otherwise there would be more progress.

techknight · Dec 14, 2016

Umm why do we need to emulate anything? Only thing that needs done IMHO is emulate the CPU. Maybe I missed something, but the machine is already there. The bus and motherboard does everything that needs to be done. the CPU just executes code the ROM feeds it.

Oh, and the Portable uses an ASC on-board with a pair of Sony chips. Just like the Mac II and the SE/30.

ZaneKaminski · Dec 14, 2016

techknight said:
Umm why do we need to emulate anything?

Well the problem I'm seeing is with the ROM. If you emulate the same CPU as in the machine, just faster, then you can use its ROM and everything works since all of the chipset hardware expected to exist by the software in ROM is present in the machine.
You could run the Portable from its stock ROM using an accelerated 68020+ CPU (real 68020 or emulator), but then it's unclear if 68020 software can be used, since the ROM and OS are unaware that the CPU has 68020 capability. So then you'd have to probably do some ROM patching, and that might be a neverending battle, trudging through tens of kbytes of disassembly to fix a neverending stream of error messages or something.

So the other solution is to run a Mac II ROM and re-map the address space in the emulator (or in the case of a physical processor, in the glue logic) to line up with the Mac II's address space. If the Portable has the ASC (I didn't realize it did), then that aspect should be easy, but then the Mac II's ROM is missing the software for the Power Manager chip, and the VIA GPIO bits will certainly be different, too, so those will need remapped or else the software in ROM, when using the VIA, will be manipulating the wrong signals.

ZaneKaminski · Dec 14, 2016

I've gotta figure out how the CFM works, just as an exercise in better understanding the OS. Are you sure it doesn't require an MMU though? Its functionality seems like it would benefit from the existence of an MMU.

ZaneKaminski · Dec 14, 2016

The qSPI interface of the STM32F7 supports SDR operation up to 100 MHz and DDR operation up to 80 MHz. Specs for the -H7 are not available yet but they'll surely be the same or better. The -H7 is built on a new 40nm process. I think the -F7 is built at 65nm or something.

Now the qSPI clock should be synchronous to the Mac's clock, just several times faster. For all but the Portable and PB100, these machines run at 7.8336 MHz. So SDR operation maxes out at 94.0032 MHz (x12) and DDR operation at 78.336 MHz (x10).
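The multipliers above fall out of a simple calculation: take the largest integer multiple of the Mac's clock that stays under the qSPI limit. A minimal sketch, using integer hertz to avoid rounding noise (the function name is just for illustration):

```python
MAC_CLOCK_HZ = 7_833_600        # 7.8336 MHz
QSPI_SDR_LIMIT_HZ = 100_000_000  # STM32F7 figure quoted above
QSPI_DDR_LIMIT_HZ = 80_000_000   # STM32F7 figure quoted above


def best_multiple(base_hz, limit_hz):
    """Largest integer multiplier of base_hz that does not exceed limit_hz."""
    mult = limit_hz // base_hz
    return mult, mult * base_hz


print(best_multiple(MAC_CLOCK_HZ, QSPI_SDR_LIMIT_HZ))  # (12, 94003200)
print(best_multiple(MAC_CLOCK_HZ, QSPI_DDR_LIMIT_HZ))  # (10, 78336000)
```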

Now, obviously 78.336 MHz DDR is the fastest option, but that would possibly require a 156.672 MHz internal clock in the iCE40 FPGA. Here's what the iCE40's datasheet says about its performance and latency:

So hopefully it's possible, since the qSPI interface shouldn't be very "deep" in terms of the longest path through the piece of work that has to be done on every clock edge. Also the iCE40 series has DDR registers in all of its I/O pins, so it can sort of take double the data and clock out in an alternating fashion. Hopefully that will halve the clock speed requirement.

The iCE40HX4K, which I plan to use, has two internal phase-locked loops, which can be used to multiply the input clock from the Mac. Hopefully I'll only use one, and so we can downgrade to the iCE40HX1K, which has only one PLL and a logic capacity of about 3x less.

The other benefit in multiplying the Macintosh's clock is that we can run the two completely in-phase. The BBU in the Macintosh SE, for example, is purposefully run out of phase with the 68000. The 68000's clock is generated from the BBU's clock, but with a 30ns delay. Therefore if the BBU places data on the bus according to a certain clock edge, the 68000 will always be delayed enough for the signals to stabilize before it samples them. With the fast qSPI clock synchronous to the Mac's clock, we don't need to do any of this delay business and instead can time an event to occur at one of 12 (at the most) points in the 68000's cycle.

So now that I'm doing it this way, I've gotta go back to designing the qSPI command set, which is what I started doing when I began this thread hahah.

ZaneKaminski · Dec 14, 2016

I'd rather do it with DDR since that's less demanding in terms of clock signal bandwidth. Y'know, in SDR, the clock oscillates at double the frequency of alternating (1010...) data bits. Whereas in DDR, the frequency of alternating data bits is the same as the clock frequency. But then again, I've never designed a state machine so complex, let alone with DDR.

About qSPI, it's mainly for interfacing with flash memory, but the qSPI command sequence is fully programmable on most chips supporting it, so it can be used as a simple, pretty fast FPGA interface.

The STM32F7 supports dual qSPI, where you hook both qSPI interfaces up to separate flash memory chips, and then they're accessed in parallel to double the flash memory throughput. I've determined it's not necessary for 68000 systems at 7.8336 MHz. The Portable, at twice that speed, may need both interfaces. Surely the -H7 supports something similar.

Now, the Snapdragon doesn't have qSPI, so if that's ever used as a processor for 68020+ systems, none of this applies. But the maximum qSPI bandwidth (in dual configuration) on a IIfx is actually equal to the maximum throughput of its bus (80MHz DDR x 8 qSPI bits in dual configuration = 1280 Mbit/sec), but the qSPI interface has more overhead and has to send the command bits twice, once to each chip. So it's not quite up to full-speed operation on a IIfx. But on a IIci, I think it can manage 80% bus utilization or so. So that's adequate. But again, I dunno what processor I would use for those systems anymore.
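The 1280 Mbit/sec figure is just clock rate times edges per clock times data lines. A small sketch of that arithmetic (peak rate only, ignoring the command overhead mentioned above; the function name is made up):

```python
def qspi_peak_mbit(clock_mhz, ddr, data_lines):
    """Peak transfer rate in Mbit/sec, ignoring command and address overhead."""
    edges_per_clock = 2 if ddr else 1
    return clock_mhz * edges_per_clock * data_lines


# Dual qSPI: two chips with four data lines each, so 8 bits per clock edge.
print(qspi_peak_mbit(80, ddr=True, data_lines=8))    # 1280
# Single qSPI, for comparison.
print(qspi_peak_mbit(80, ddr=True, data_lines=4))    # 640
print(qspi_peak_mbit(100, ddr=False, data_lines=4))  # 400
```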

ZaneKaminski · Dec 14, 2016

I think that we can avoid needing an INIT or special boot floppy or anything else like that by constantly hogging the bus, and never letting the 68000 get a chance to fetch an instruction. The Bus Glue FPGA, if it has nothing to do, will have to constantly issue meaningless reads to addresses known to be in the RAM or ROM.

Older hardware must have used latches and control signal translation logic (as thetechknight has described), so it wasn't really capable of performing such dummy bus operations. Maybe that's why early accelerators required the ROM board or a special floppy or something.

In addition to the bus operations, the FPGA could also be programmed to decode instructions (in the class-and-parameters scheme I described a few days ago), delivering the result in as little as 0.1 microseconds. I don't think it will be useful though. 0.1 microseconds is 40 clocks of the STM32H7, and I'm hoping to perform the instruction decoding faster than that. (It will be cached though, so it only has to be decoded once. This is the key enabler of high performance.)

Edit: also ST is apparently announcing higher-end members of the STM32H7 family, in Q1 of 2017. So I'll stay tuned to see if there's anything that can offer more performance. More internal memory and more L1 cache would be particularly helpful, but another 100MHz would be great too.

Edit again: The Bus Controller should have a queue of up to, say, 7 bus operations which have not yet completed. For 68000 systems, it should take less than one 7.8336 MHz cycle (1/4 of a bus cycle) to transmit the write byte/word command (including address, data) to the FPGA.

The way it would work is you would enqueue a read or write command and then check later (with a status command) to see if the operation was successful or not (and also read the data read, for read operations).

Each command can be assigned a sequence number from 0 to 7. The first command enqueued is numbered 0 and it goes from there, wrapping around to 0 after 7.

An asynchronous output of the FPGA should give the sequence number (as a gray code) of the command currently in progress (i.e. one after the number of the most recently finished.) That way, the STM32 can just check that output to know how many queued operations have been completed.

The idea is kind of like sliding-window flow control, if anyone is familiar.
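Here is a rough software model of that queue, just to show how the sequence numbers and the Gray-coded status output would fit together. Everything here (class and method names, the split between "MCU side" and "FPGA side") is an assumption for the sketch; the real thing would be qSPI commands plus FPGA logic.

```python
SEQ_MOD = 8          # sequence numbers run 0..7
MAX_OUTSTANDING = 7  # one fewer than SEQ_MOD, so "full" never looks like "empty"


def to_gray(n):
    return n ^ (n >> 1)


def from_gray(g):
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class BusQueueModel:
    def __init__(self):
        self.next_seq = 0     # MCU side: number for the next enqueued command
        self.in_progress = 0  # FPGA side: number of the command being worked on
        self.acked = 0        # MCU side: oldest command not yet seen as finished

    def outstanding(self):
        return (self.next_seq - self.acked) % SEQ_MOD

    def enqueue(self):
        """MCU enqueues a read/write; returns its sequence number."""
        if self.outstanding() >= MAX_OUTSTANDING:
            raise RuntimeError("queue full")
        seq = self.next_seq
        self.next_seq = (seq + 1) % SEQ_MOD
        return seq

    def fpga_complete_one(self):
        """FPGA finishes the command in progress and moves to the next."""
        if self.in_progress == self.next_seq:
            raise RuntimeError("nothing queued")
        self.in_progress = (self.in_progress + 1) % SEQ_MOD

    def status_pins(self):
        """Gray-coded sequence number, as it would appear on the FPGA outputs."""
        return to_gray(self.in_progress)

    def poll(self):
        """MCU samples the pins; returns how many commands finished since last poll."""
        current = from_gray(self.status_pins())
        done = (current - self.acked) % SEQ_MOD
        self.acked = current
        return done
```

The Gray code matters because the output is asynchronous: only one bit changes per step, so the STM32 can never sample a half-updated value that decodes to an unrelated sequence number.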

techknight · Dec 15, 2016

most accelerators needed a ROM as a DeclROM to the OS, to tell the machine that it has X-Y-Z features, including the processor type.

Also, I have an accelerator board without a ROM for the plus, and the Plus sees it as a base 68020 without RAM expansion, or FPU. But it has an onboard FPU, and it is actually a 68030.

So the OS has its own way to tell whether the CPU is a 68020 or not. and its up to the DeclROM to tell the OS that its "really" an 030, and it has x-y-z features, and contains drivers for those x-y-z features, IF needed.

cb88 · Dec 15, 2016

The Problem with the ice40... is DDR probably wouldn't even fit in a 1k part, and would use up nearly 50% of a 4k part...that's an educated guess based on the LUT figures for other lattice parts. SDR SDRAM fits in under 150LUTS for ice40 and they give you the code for that ... I'm not sure ice40 can meet the timing requirements for DDR.

Gorgonops said:
Just tossing this out there since I vaguely recall a target price for this thing in the one hundred dollar ballpark: the 68000 softcore in the MIST reconfigurable computer, which sells for about $200, is capable of speeds up to about 48mhz. (And seems to also have at least partial 68020 compatibility.) I have no idea how much just the FPGA in the MIST costs or if there might be a smaller/cheaper one that has sufficient capacity to run the softcore and some bus glue while dispensing with the capacity to emulate the rest of computer, but... can't help but make me wonder if the performance bar is this low and you're using FPGAs for bus glue anyway if an all-FPGA approach might be simpler and end up costing around the same.

I agree, if you are going to bother with an FPGA at all... you may as well go full bore. For instance, a 15k LUT Artix-7 is $25 and is very fast... especially for a 68000. http://dcd.pl/ipcore/101/d68000/ <- that runs at 107 MHz on a Kintex-7, and the Artix-7 is just a bit slower; with a Spartan 6 hitting 79 MHz, you can expect the Artix-7 to hit 90-100 MHz without much ado for a similar design. If you are going to beat an FPGA, you need to execute around a single instruction every 15 cycles on the real CPU at 1.5 GHz. Basically the best you'll ever do without putting a ton of work into the dynarec is parity with the FPGA... and the FPGA pulls further ahead if you put more work into it to increase the IPC beyond 1 (the proprietary Apollo 060+ core does at least 2-4 IPC depending on the application). A dynarec on a 1.5 GHz ARM CPU is probably never going to beat a halfway decent FPGA design.

The Picorv32 runs at nearly 250 MHz in the slowest Artix-7 speed grade.... note that it only processes one instruction about every 3 cycles though. But that gives you an idea of the performance you could eke out of an Artix-7 based design.

As long as you stay with the same pinout you can upgrade to more LUTs as well... within the Xilinx Families.

The main drawback to modern FPGAs is you have to deal with BGA parts... but it is almost certainly well worth it. So, another thought is to build a nice big fast core for the main Mac hardware to be accelerated with, and a slower compact slave core (2-3000 LUTs like the J68K core... and actually there may be 1000 LUTs to be saved on the J68K, I think it has a bunch of probably unneeded endianness-swapping garbage in it) that will run 68k Linux... to do all the fiddly bits with WiFi/USB/VGA. While anything you come up with will undoubtedly be cool... that would twiddle all the bits just right, I think, for most enthusiasts.

ZaneKaminski · Dec 17, 2016

cb88 said:
The Problem with the ice40... is DDR probably wouldn't even fit in a 1k part, and would use up nearly 50% of a 4k part...that's an educated guess based on the LUT figures for other lattice parts. SDR SDRAM fits in under 150LUTS for ice40 and they give you the code for that ... I'm not sure ice40 can meet the timing requirements for DDR.

What do you mean? I meant DDR qSPI (to talk to the MCU), not DDR SDRAM. I didn't plan to connect any external RAM to the iCE40. Or do you really think I can't fit a DDR qSPI interface in 1000 LUT4s? I have to defer to others' expertise on this FPGA state machine stuff, but I think 1000 should be plenty.

cb88 said:
I agree, if you are going to bother with an FPGA at all... you may as well go full bore. ... 15k LUT Artix-7 is 25$ and is very fast ... http://dcd.pl/ipcore/101/d68000/ ... can expect the Artix-7 to hit 90-100Mhz.

It isn't as cheap as it seems though, and I think the performance possible with such a 100 MHz-capable 68000 implementation is not necessarily better than what can be obtained under emulation on a very fast microcontroller (like the STM32H7) or especially not a GHz-speed "application processor."

Firstly, even if this core can run at 100 MHz in a cheap FPGA, their site advertises bus cycle timing identical to the 68000, so it can execute at best one instruction every four cycles, so 25 MIPS tops. TG68, the other popular core, advertises faster execution for some instructions (how?), but it's still constrained by the 16-bit bus.

To match 25 MIPS for register-register ops, the STM32H7 at 400 MHz would have to complete one instruction in 16 of its cycles. With the decoding of the instruction word cached, I think executing a register-register instruction in 16 cycles is slightly out of reach, but not too far. I think it's certainly doable in 32 cycles.
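The cycle budget here is simple division, sketched below with the numbers from this thread (helper names are made up for illustration):

```python
def host_cycles_per_instruction(host_mhz, target_mips):
    """Host clock cycles available per emulated instruction at a target MIPS."""
    return host_mhz / target_mips


def mips_at(host_mhz, cycles_per_instruction):
    """Emulated MIPS if each instruction costs the given number of host cycles."""
    return host_mhz / cycles_per_instruction


print(mips_at(100, 4))                       # 25.0: 100 MHz core, 4 clocks per instruction
print(host_cycles_per_instruction(400, 25))  # 16.0: STM32H7 budget to match that
print(mips_at(400, 32))                      # 12.5: the "certainly doable" case
print(mips_at(1500, 15))                     # 100.0: cb88's 1.5 GHz, 15-cycle figure
```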

However, for instructions with multiple extension words, I think that we can handily beat a 68000 at 100 MHz. The time-consuming part of executing the instructions is decoding them, but that can be cached, and then the second longest part is jumping into the routine to service that type of instruction (misprediction is likely, and that imposes a penalty equal to the pipeline length). Instructions with more extension words will naturally be executed faster under emulation, since the STM32H7's SDRAM is much faster than the 68000's memory interface.

And then there's the cost. If I have just 25 units produced, it's another $5ish per unit for BGA assembly. The PCB needs to be a lot denser, too, and then the PCB may need to have 6 layers, which is not nearly as cheap as a 4-layer PCB in these kinds of quantities. Then there's the question of external RAM. I was gonna use SDR SDRAM connected to the STM32H7, since it's pretty easy to route and it doesn't require extensive simulation work to get right. But these FPGAs only have hard controllers for DDR2/3, so either get DDR2/3, which is quite a bit more routing effort, or do a soft SDRAM controller, but that would introduce a huge bottleneck unless I do a 64- or 128-bit-wide bus.

There was a Cyclone IV E in QFP-144 package for $12 that interested me, but I understand the Cyclone IV to be a bit slower than the Artix-7, right?

cb88 said:
halfway decent FPGA design ... The Picorv32 runs at nearly 250 MHz in the slowest Artix-7 speed grade.... note that it only processes one instruction about every 3 cycles though. But that gives you an idea of the performance you could eke out of an Artix-7 based design.

I could try and do an MC68000 implementation with data and instruction caches, forwarding, maybe even branch prediction, etc. to try and get close to single-cycle execution (or even do multiple-issue as you have suggested), but my interest right now is really in emulation... my aim in my career is to work in software development, not hardware stuff, so I want to get experience in that area. Regardless, for this cost and performance target, I think that's the right way to structure the system, as I have said above.

ZaneKaminski · Dec 17, 2016

techknight said:
most accelerators needed a ROM as a DeclROM to the OS, to tell the machine that it has X-Y-Z features, including the processor type.

Also, I have an accelerator board without a ROM for the plus, and the Plus sees it as a base 68020 without RAM expansion, or FPU. But it has an onboard FPU, and it is actually a 68030.

So the OS has its own way to tell whether the CPU is a 68020 or not. and its up to the DeclROM to tell the OS that its "really" an 030, and it has x-y-z features, and contains drivers for those x-y-z features, IF needed.

That's helpful. I'll look into it, but I've gotta get some official Apple references on this stuff... I still can't find the Volumes IV-VI of Inside Macintosh. Maybe it's in there.

ZaneKaminski · Dec 17, 2016

techknight said:
Plus sees it as a base 68020 without RAM expansion, or FPU. But it has an onboard FPU, and it is actually a 68030.

Does CFM68k work on this machine?
