I have a working board with the RAM expansion. If you decide to base something on this design and run into a roadblock where having access to a working board would help, and you are in the US, let me know and I would be willing to loan it out for nondestructive reverse engineering.
This 32-bit RAM expansion module provides up to 4MB of 80ns RAM with zero wait states. Interestingly, the Total Systems Accelerator family brochure claims that the Mercury 030, when paired with this RAM expansion board, can match the performance of a stock SE/30. Adding the RAM expansion module could roughly double the performance of the Performer board.
I've spent some time analyzing the logic of the MicroMac Performer and would like to share some key takeaways from my findings. Assuming the MicroMac Performer uses the same logic as the Mercury 030, the RAM expansion module replaces the onboard RAM with faster, zero-wait-state RAM on a dedicated 32-bit data path for the 68030.
All my analysis is based on the Macintosh Plus architecture and ROMs. I presume that this should also be valid for the SE and the Classic.
The CIIN Signal
On the MC68030 CPU, CIIN is used to selectively inhibit the on-chip cache on a per-bus-cycle basis. This provides dynamic, cycle-by-cycle control over caching.
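The PLD logic for the CIIN signal is key. As I read it, CIIN becomes active when the address on the bus is $400000 or higher. One caveat: taken literally, the equation below only goes true from $500000 upward, because A22 on its own satisfies none of its terms. So either the ROM block at $400000-$4FFFFF is left cacheable (harmless for read-only memory) and only the I/O space above it is inhibited, or a term is missing from the transcription.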
o16 = (A22 AND A20) OR (A22 AND A21) OR (A23)
This logic ensures that for all addresses from $000000 to $3FFFFF, the CIIN signal remains inactive. This range represents a contiguous block of 4MB. This entire 4MB region is designated as the high-speed, cacheable memory space.
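The equation is easy to check mechanically. The following is a minimal Python sketch that evaluates it exactly as written for each 1 MB block of the 24-bit address space. The function names are made up for illustration, and it assumes that "o16 true" means "cache inhibited"; the real pin polarity is not established here.

```python
def ciin_o16(addr: int) -> bool:
    """Evaluate o16 = (A22 AND A20) OR (A22 AND A21) OR (A23) for a 24-bit address."""
    a20 = bool((addr >> 20) & 1)
    a21 = bool((addr >> 21) & 1)
    a22 = bool((addr >> 22) & 1)
    a23 = bool((addr >> 23) & 1)
    return (a22 and a20) or (a22 and a21) or a23


def inhibited_blocks() -> list:
    """Base address of every 1 MB block for which o16 evaluates true.

    Only A20..A23 appear in the equation, so testing one address per
    1 MB block covers the whole 16 MB space.
    """
    return [base for base in range(0, 1 << 24, 1 << 20) if ciin_o16(base)]


if __name__ == "__main__":
    for base in range(0, 1 << 24, 1 << 20):
        state = "true" if ciin_o16(base) else "false"
        print(f"${base:06X}-${base + 0xFFFFF:06X}: o16 {state}")
```

With the equation as transcribed, o16 is false for everything up to $4FFFFF and the first block reported true is $500000-$5FFFFF.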
The address $400000 is the exact starting address of the Macintosh Plus's built-in ROM. Everything from this address onward in the memory map is dedicated to the ROM and the memory-mapped I/O devices of the original hardware.
The Mercury's designers chose this 4MB boundary for a very simple and elegant reason:
Below 4MB: All memory is RAM. It is safe and beneficial to have this region be high-speed and cacheable.
At or above 4MB: All memory is either the ROM or a hardware device. These regions are inherently slow, and the device registers in particular must never be cached, to prevent data coherency issues.
By setting the CIIN boundary at the 4MB mark, the accelerator's hardware automatically and flawlessly switches between the two memory types, providing maximum performance for the fast RAM while maintaining perfect compatibility with the slower, original Macintosh Plus hardware.
The CDIS Signal
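CDIS is a dedicated input pin on the MC68030 CPU that provides a hardware mechanism for disabling the on-chip caches for as long as it is asserted. It does not flush them; the cache contents become usable again once CDIS is negated.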
The CDIS signal on the Performer accelerator is a dynamically controlled signal that is actively managed by the PLD state machine.
rf14.D = i6 & !rf14 # i5 & !rf14 # !i8;
where rf14 is CDIS, i6 is FPUCS (FPU Chip Select), i5 is AS_30 (68030 Address Strobe), and i8 is RESET_30.
This is a D-type flip-flop equation for a state machine, which means the state of CDIS (rf14) is determined by the previous state and the current inputs.
The active control of CDIS seems to be part of a fine-grained, dynamic cache management strategy for the 68030 CPU.
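The logic shows that the rf14 register is driven by FPUCS and AS_30: its next state goes high when it is currently low and either i6 (FPUCS) or i5 (AS_30) is high, and it is forced high whenever i8 (RESET_30) is low. In every other case it returns low. Whether "high" means CDIS asserted depends on the pin polarity.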
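To see what the registered equation does over time, here is a small Python sketch that clocks it through a sequence of inputs. It models only the logical levels of the equation terms as written. The clock source, the pin polarities, and the helper names are assumptions made for illustration.

```python
def rf14_next(rf14: bool, i5: bool, i6: bool, i8: bool) -> bool:
    """Next state of the register: rf14.D = i6 & !rf14 # i5 & !rf14 # !i8."""
    return bool((i6 and not rf14) or (i5 and not rf14) or (not i8))


def simulate(inputs, rf14: bool = False) -> list:
    """Clock the register once per (i5, i6, i8) tuple.

    Returns the register state after each clock edge.
    """
    trace = []
    for i5, i6, i8 in inputs:
        rf14 = rf14_next(rf14, i5, i6, i8)
        trace.append(rf14)
    return trace


if __name__ == "__main__":
    sequence = [
        (False, False, False),  # i8 low (reset term active)
        (False, False, True),   # idle
        (True, False, True),    # i5 (AS_30 term) high
        (True, False, True),    # i5 still high
        (False, True, True),    # i6 (FPUCS term) high
        (False, False, True),   # idle
    ]
    print(simulate(sequence))
```

Two things stand out in this model: the register is forced high while i8 is low, and it toggles on every clock for as long as i5 or i6 stays high, because both terms are gated by !rf14.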
The PLD's control over CDIS likely exists to handle very specific, real-time events that could cause cache coherency issues. For example:
FPU Access (FPUCS): The FPU is a separate chip that exchanges data with the 68030 over coprocessor bus cycles. To avoid any interaction with cached data, the accelerator could momentarily disable the cache with CDIS during an FPU access and force a fresh read from the bus. Note that CDIS only disables the caches while it is asserted; it does not flush them.
Bus Cycle Control (AS_30): The logic also ties CDIS to AS_30 (Address Strobe). This could be part of a complex sequence to ensure cache integrity during certain bus cycles, especially when transitioning between fast and slow memory regions.
The responsibility for hiding the RAM expansion during boot and for managing cache coherency falls on a combination of signals:
CDIS: The CDIS signal is a dynamic, on-the-fly control for specific, time-sensitive events (like FPU access) that could affect the cache.
CIIN: The CIIN signal is the primary mechanism for managing the memory map, telling the 68030 not to cache the slow, uncacheable regions (ROM and I/O).
The software driver is responsible for configuring the MMU's cache policy (write-through for the RAM region) and for de-asserting the CDIS signal to bring the RAM online after boot.
The DTACK_SYS+EX Signal (PLD U3)
The DTACK_SYS+EX signal is the combined DTACK signal that the 68030 CPU receives to confirm a data transfer is complete.
The equation for this output is: o15 = !i9 & i11 # i13 & !i9;
o15 is the DTACK_SYS+EX signal.
i9 is the DTACK signal coming from the Macintosh Plus logic board's TSM IC. This is a DTACK signal generated with wait states because the original Mac Plus bus is much slower than the 68030.
i11 is an input labeled P3.B10. On the MicroMac Performer, this is pulled high, but on the Mercury 030, I presume this signal represents the zero-wait-state DTACK from the Mercury 030's RAM expansion board.
i13 is an input labeled EXT_DTK. It is pulled up and not used by the Performer board. This suggests it's an external DTACK input for an optional expansion not included in the standard design.
The equation can be read as a gate that selects the correct DTACK source:
DTACK_SYS+EX = (!DTACK_MAC_BUS AND DTACK_HIGH_SPEED) OR (DTACK_EXTERNAL AND !DTACK_MAC_BUS)
Essentially, the Performer board uses this logic to choose between a fast DTACK and a slow DTACK based on which memory region is being accessed.
High-Speed, Zero-Wait-State Access: When the 68030 accesses the "FAST RAM" (the first 4MB) on the expansion board, the CIIN signal is inactive. The logic on the expansion board recognizes this and responds with a fast DTACK. This signal is then fed into the Performer board as input i11 (P3.B10). The logic on U3 sees this and immediately passes it to the 68030 as DTACK_SYS+EX, completing the bus cycle with zero wait states.
Legacy, Wait-State Access: When the 68030 accesses a memory region on the original Mac Plus logic board (like the ROM or I/O ports), the CIIN signal is active. The DTACK_SYS+EX logic on the Performer board will ignore the high-speed DTACK (i11) and wait for the slower DTACK signal from the Mac Plus bus itself (i9). The slower Mac Plus bus provides a delayed DTACK, which the accelerator passes to the 68030, adding the necessary wait states.
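The o15 equation is small enough to tabulate in full. The Python sketch below builds the truth table from the equation as written and checks it against the factored form !i9 & (i11 # i13). The function names are illustrative, and no assumption is made about which level of each signal counts as asserted.

```python
from itertools import product


def o15(i9: bool, i11: bool, i13: bool) -> bool:
    """o15 = !i9 & i11 # i13 & !i9, as transcribed from PLD U3."""
    return bool((not i9 and i11) or (i13 and not i9))


def o15_factored(i9: bool, i11: bool, i13: bool) -> bool:
    """Algebraically equivalent form: !i9 & (i11 # i13)."""
    return bool((not i9) and (i11 or i13))


def truth_table() -> list:
    """All eight input combinations as (i9, i11, i13, o15) tuples."""
    return [
        (i9, i11, i13, o15(i9, i11, i13))
        for i9, i11, i13 in product((False, True), repeat=3)
    ]


if __name__ == "__main__":
    print("i9 i11 i13 | o15")
    for i9, i11, i13, out in truth_table():
        print(f"{int(i9)}  {int(i11)}   {int(i13)}   | {int(out)}")
```

With i11 and i13 both pulled high, as on the Performer, the expression reduces to !i9. The table also shows that o15 is never true while i9 is high, so how a fast DTACK on i11 ends a cycle before the Mac's own DTACK arrives depends on the signal polarities, which would need to be confirmed on the board.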
The MMU
The MMU is left enabled by hardware (MMUDIS is not asserted). Its configuration, specifically the setup of the cache policy for the RAM region, is handled by the Performer's software driver.
Here's a step-by-step model of how the system most likely works:
Boot-up (Hardware Only): The Macintosh Plus powers on. The Performer accelerator takes control and starts the 68030. Since the MMUDIS pin is tied high (negated), the MMU is not disabled by hardware, but reset clears the enable bits in the TC, TT0 and TT1 registers, so no translation takes place until software sets it up. Because CDIS is active during boot, the RAM expansion is not accessed, and this lack of MMU configuration doesn't cause any issues.
The system boots using the native Mac bus (onboard RAM).
Once the OS is running, the Performer's INIT loads into memory.
The driver writes to the 68030's MMU registers. It will define the 4 MB of RAM (which contains the video and sound buffers) as a transparently translated, cacheable region by writing specific values to the MMU's Transparent Translation registers (TT0 and TT1). Note that the 68030's data cache is always write-through, so there is no separate write policy to select; the driver only decides what is cacheable.
After the driver has configured the MMU, the 68030 can access the Mac's memory. When it writes to the video or sound buffers, the write-through data cache ensures that the data is immediately written to main memory and remains coherent with the video and sound DMA controllers.
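As a rough illustration of the register write in the steps above, here is a Python sketch that packs the fields of a 68030 transparent translation register into a 32-bit value. The field positions follow the TT0/TT1 layout documented for the MC68030 and should be checked against the User's Manual. The function name and the chosen base, mask and function-code values are purely illustrative and are not taken from the Performer driver.

```python
def tt_value(base: int, mask: int, enable: bool = True,
             cache_inhibit: bool = False, rw: bool = False,
             rwm: bool = True, fc_base: int = 0, fc_mask: int = 7) -> int:
    """Pack MC68030 TTx fields into a 32-bit register value.

    Assumed layout:
      31-24 logical address base   23-16 logical address mask
      15    E (enable)             10    CI (cache inhibit)
      9     R/W                    8     RWM (ignore R/W when set)
      6-4   function code base     2-0   function code mask
    """
    value = (base & 0xFF) << 24
    value |= (mask & 0xFF) << 16
    value |= int(enable) << 15
    value |= int(cache_inhibit) << 10
    value |= int(rw) << 9
    value |= int(rwm) << 8
    value |= (fc_base & 0x7) << 4
    value |= fc_mask & 0x7
    return value


if __name__ == "__main__":
    # Hypothetical setting: match addresses whose top byte is $00, reads and
    # writes, any function code, caching allowed (CI clear).
    print(f"TT0 = ${tt_value(0x00, 0x00):08X}")
```

Because the base and mask fields only cover A31-A24, a TT register cannot describe a 4MB region by itself; in this model the finer split between cacheable RAM and uncacheable I/O would still be left to the CIIN logic. On real hardware the value would be loaded with a PMOVE instruction.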
If all of the above is on the right track, it should be relatively straightforward to design a custom memory expansion board using SRAM. All additional logic would be just for the SRAM controller, and a PLD like the G22V10 would probably be sufficient.
Another key point is how the Performer switches between the 16-bit data bus and the 32-bit data bus (handling of DSACK0 and DSACK1 for the fast RAM module). This logic is not present on the Performer board and could also fit into the G22V10. I'm working on it now, but it's not ready to throw out into the wild yet.
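For reference while working on that logic, the sketch below encodes the standard 68030 DSACK1/DSACK0 port-size responses in Python. It shows only what the CPU expects to see on the two pins. It does not describe how the Mercury RAM module actually drives them, and the function names are made up for illustration.

```python
# Pin levels are given as True = high (negated), False = low (asserted),
# since DSACK0 and DSACK1 are active-low signals.
_DSACK_TO_PORT = {
    (True, True): None,    # neither asserted: CPU inserts wait states
    (True, False): 8,      # DSACK0 only: cycle complete, 8-bit port
    (False, True): 16,     # DSACK1 only: cycle complete, 16-bit port
    (False, False): 32,    # both asserted: cycle complete, 32-bit port
}


def port_size(dsack1_high: bool, dsack0_high: bool):
    """Return the port width in bits, or None while the cycle is still waiting."""
    return _DSACK_TO_PORT[(dsack1_high, dsack0_high)]


def dsack_levels(bits: int) -> tuple:
    """Return the (DSACK1, DSACK0) pin levels that signal the given port width."""
    for levels, width in _DSACK_TO_PORT.items():
        if width == bits:
            return levels
    raise ValueError(f"unsupported port width: {bits}")


if __name__ == "__main__":
    for width in (8, 16, 32):
        d1, d0 = dsack_levels(width)
        print(f"{width:2d}-bit port: DSACK1={'H' if d1 else 'L'} DSACK0={'H' if d0 else 'L'}")
```

In these terms, a 16-bit access to the Mac logic board would end with DSACK1 low alone, while a cycle to a 32-bit fast RAM module would need both pins low.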