
SE/30 DiiMO accelerator cloning

Phipli

Well-known member
260MB in a 68K mac? I'd believe that would take forever!

So, it doesn't do a test on reset or clean restart, but the specific Special->Shutdown then clicking restart in the "It is now safe to turn off..." dialog seems to do one. Very much an edge case...

This leads into a follow up question for someone that knows the Mac OS boot process in more detail - is there something about Mode32 or the SE/30 specifically that causes it to do a memory test at the happy mac screen? I always thought it happened earlier in the boot process.
Mode32 sort of double boots on a cold start, but not warm... I believe. The RAM test doesn’t do all available RAM if you boot in 24bit, so you might have a RAM issue higher up the address space, meaning the initial RAM test passes (8MB only), but if there are ROM patches loaded for 32bit, it fails???

The above is 90% guesswork, and I'm going to childishly follow up with my answer to all MODE32 questions...

@cheesestraws!
 

zigzagjoe

Well-known member
My issue is an edge case related to this accelerator - RAM is fine without it or fine with accel on a cold boot. Which is good, because I'd be really sad if one of these vintage modules dies on me, they came in my favorite SE/30 when I got it nearly 20 years ago.

I'm A-OK attributing that to the clock shenanigans I've got going on here. This thing *should not work* right now. So, idle curiosity on the boot process. Your guesses line up exactly with my thoughts too. It only tests a little RAM, and MODE32 does something that causes it to test the full amount at the happy Mac.
 

Phipli

Well-known member
If you can get it booted in the right circumstance (try System 6 if you haven't, sometimes it is... less picky to a fault) Mac Test Pro might help. It will run a lot of the hardware tests with written feedback.

Download #1 here :

 

Phipli

Well-known member
Although you will need Optima instead of Mode32

 

zigzagjoe

Well-known member
Although you will need Optima instead of Mode32

I didn't realize there was even an option to go 32 bit in system 6. Neat, I wonder what you could use that'd make use of it though.

I'm going to wait for those GALs to show up, if this continues to be an issue once I get the rest of the board to a functional baseline then I'll dig further. It would be interesting to understand the native RAM test however since it's the only one with any issues (and again, just on warm reboots w/ full RAM test and the known weird experimental accelerator).
 

Phipli

Well-known member
I didn't realize there was even an option to go 32 bit in system 6. Neat, I wonder what you could use that'd make use of it though.
A fair few things, software that was designed with 7 in mind. It's handy if you need the RAM. Things like MacroMind Director I bet, any audio recording software.

I think I remember that it doesn’t work with MultiFinder, so you get almost all of the RAM too!
 

Crutch

Well-known member
Yeah, Optima doesn’t work with MultiFinder. So it’s really only useful if you need to run one single giant application - or want a huge RAM disk.
 

max1zzz

Well-known member
Regarding the clock issue, have you got termination resistors at the end of the clock trace coming out of U2? The stock design doesn't have them (it only has them on the main 50mhz clock out of the oscillator), but it was one of the recommendations Bolle gave me when I was working on the LCIII version.

Should be a 330 ohm to 5v and a 220 ohm to gnd
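
For reference, a minimal sketch of what that resistor pair works out to, assuming ideal resistors and an exact 5 V rail. The function name is only for illustration. Seen from the clock trace, the two resistors behave like a single resistor tied to an intermediate voltage (the Thevenin equivalent).

```python
def thevenin(v_supply, r_pullup, r_pulldown):
    """Return (voltage, resistance) of the Thevenin equivalent of a split terminator."""
    total = r_pullup + r_pulldown
    v_th = v_supply * r_pulldown / total      # unloaded voltage at the junction
    r_th = r_pullup * r_pulldown / total      # the two resistors in parallel
    return v_th, r_th


v_th, r_th = thevenin(5.0, 330.0, 220.0)
print(f"{v_th:.2f} V behind {r_th:.0f} ohm")  # 2.00 V behind 132 ohm
```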
 

zigzagjoe

Well-known member
That was an early theory of mine too, but the issue is looking to be a problem with the ATFs. I bodged a 1K pot onto the furthest GAL to act as termination. It doesn't make a functional difference to the issue, but it does make it easier to keep it in the "sweet spot" where the phase shift ends up about right combined with wherever in the undefined region (between logic low and logic high) that a high registers.

I'm using a meter to measure average voltage, as well as the adjustable threshold on my logic analyzer to confirm and measure peak.

Taking the board out of the equation, just using an ATF by itself with power, bypass cap, clock in - the output signal is still wonky. Unfortunately, I don't have access to a scope with resolution sufficient to fully investigate what's actually happening here, so I'm going to just assume it's an ATF quirk as a second ATF had the same behavior. The measured voltages improve if I lower the frequencies. It's also weird that the average voltage of the signal through the ATF with one inversion is different from a "direct" pass through the ATF's array.
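
A small sketch of why an inverted and a non-inverted copy of a clock can read differently on a meter. It assumes an idealized rectangular wave with instant edges, and the voltage levels and duty cycles are made-up illustration values, not measurements from this board. For such a wave the two averages only agree when the duty cycle is exactly 50%.

```python
def average_voltage(duty, v_high, v_low=0.0):
    """Average of an ideal rectangular wave that is high for `duty` of each period."""
    return duty * v_high + (1.0 - duty) * v_low


def inverted_average(duty, v_high, v_low=0.0):
    """Same wave after one logic inversion: high and low times swap."""
    return average_voltage(1.0 - duty, v_high, v_low)


for duty in (0.5, 0.6):
    direct = average_voltage(duty, v_high=4.0)
    flipped = inverted_average(duty, v_high=4.0)
    print(f"duty {duty:.0%}: direct {direct:.2f} V, inverted {flipped:.2f} V")
```

Real outputs also differ in rise and fall times and in output levels, so this only covers the simplest possible cause.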

I'll do the same tests with the GALs when they get here tomorrow and see what I get.

It does make me wonder about the design history of these boards, between the decision to halve the FPU clock and the set of termination resistors (exactly as you wrote) seemingly not connected to anything. Did the LCIII board do the same thing with the FPU clock? I implemented the FPU clock as in the original, but with a jumper to run the FPU at full speed too. I've been running it at full speed and see the expected performance improvement that matches Daystar Powercaches at 50mhz. I haven't been able to come up with a good reason they'd want to halve it.
 

zigzagjoe

Well-known member
Good news! As I thought, it was a problem with the ATFs. Replaced U1 and U2 with Lattice GAL16V8B-7LJ and we are in business!! Clock signal out from the U2 GAL measures a healthy average of 2.4v. No change in perf, though my RAM test issue remains.

Ignore the weird duty cycle below, the threshold is a little bit low, so the duty is off.
1686603168094.png

To summarize, as Bolle said, ATFs are weird....

For troubleshooting the RAM test, I replaced U5 and U10 also as they use the U2 clock. No change in perf, RAM issue remains. Popped the original CPU in - same issue. So I'm eating my words @Phipli, it's a LB or RAM issue, but that's great news that it isn't a problem with the accelerator. RAM does the same thing in two boards, and set of 4 individually test fine... I think the SE/30 is just slightly allergic to these particular vintage modules. GURU's ram test inside of Mac OS is fine too. I'll have to keep fiddling with it, or just ignore it, but I'm not gonna drag this thread further about it.

So the logic.... The state machine(s) are terrifying. As they say, when you look into the abyss, the abyss stares back. I'm going through and attempting to figure out what does what, but at every turn there's a new twist.

That being said the logic doesn't seem particularly sensitive, other than to U1 and U2. I'm thinking the phase shift clock is so as to get a lead on some of the 68030 signals that change mid-clock. That clock should probably stay external in the interest of simplicity. For the rest of it, if one wanted to try to avoid untangling the black magic voodoo stuff, I think it could probably just be stuffed into a suitable CPLD or small FPGA as discussed earlier in the thread.

The XC9536XL could work as it's 5v tolerant, though I think the 7.5ns speed grade would be better as there's a definite trend towards all 7ns GALs. Some math should be done to see how many GALs can consolidate into one CPLD with savings due to lack of duplicate pins - the XC9536XL had 34/36 IO pins dependent on package and I think it'd be best to reduce the # of GALs by as much as possible. The equations that just check the status of many address bits like N$28 N$11 N$34 should probably stay in a GAL (or ATF) to save pins.
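
A rough sketch of the pin math, assuming each GAL is described as a set of input signal names and a set of output signal names. The two example GALs and their signal names are hypothetical and only there to show the counting. A signal driven by one GAL and consumed by another stops being an input pin once both live in the same device, and inputs shared by several GALs are only counted once.

```python
def pins_before_merge(gals):
    """Total signal pins when every GAL is a separate chip."""
    return sum(len(inputs) + len(outputs) for inputs, outputs in gals)


def pins_after_merge(gals):
    """Signal pins if all GALs are folded into one device.

    Every output keeps a pin (conservative: the board may still need it),
    but inputs that are driven by another merged GAL become internal.
    """
    all_inputs = set().union(*(inputs for inputs, _ in gals))
    all_outputs = set().union(*(outputs for _, outputs in gals))
    external_inputs = all_inputs - all_outputs
    return len(external_inputs) + len(all_outputs)


# Hypothetical example data, not taken from the Diimo equations.
gal_a = ({"A0", "A1", "CLK"}, {"SEL"})
gal_b = ({"SEL", "A1", "CLK"}, {"ACK"})

print(pins_before_merge([gal_a, gal_b]))  # 8
print(pins_after_merge([gal_a, gal_b]))   # 5
```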
 

Phipli

Well-known member
GURU's ram test inside of Mac OS is fine too. I'll have to keep fiddling with it, or just ignore it, but I'm not gonna drag this thread further about it.
Grab MTP I linked before, it is an Apple tool and lets you cycle the test.

It also tests a load of other sub systems, so is worth having to hand.
 

zigzagjoe

Well-known member
I think I got to the bottom of the U7 mystery. The original file @Bolle dumped has bit 2124 set, which is the AC1 bit for Macrocell associated with Pin 15. This signal is /DS.LOCAL. U7 doesn't actually have any code to do anything with this output. So my assumption is with this bit set and the others unprogrammed, the output ends up always tristated with no signals selected in its matrix. JED2EQN doesn't know what to do with this, doesn't generate anything for it, and when running EQN2JED on the output from JED2EQN, the pin ends up as an output (not-tristated).

This is easy to fix by either adding some code to turn it into a feedback, just turning it to a NC in the schematic, or just using the original dump.

Still fiddling around with the overall logic. I think trying to fully understand it is probably going to just give me a headache, but it is pretty straightforward to do some permutations to make it easier and give some info for sizing purposes. I've attached a file with all equations compiled in a file and the assorted chaff from the decompile process cleaned up. Not an actual file that can be loaded on something, but an easier way to work through the logic since it's not split across 12 files.

Analysis gives the following stats (detail attached). The logic needs a bit of further cleanup like guaranteeing uniqueness of outputs (I_HALT_LOCAL was possibly driven by two different GALs) but this is useful for sizing CPLD/FPGAs.

input only (40)
Feedbacks (50) - used both internally and externally
internal feedbacks (13) - only used internally
Output only (11) - only used externally
OEs in use (11) - requires tristate support
registered outputs/feedback (39)
combinatorial outputs/feedback (34)
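
A quick sketch of how these counts translate into sizing numbers, assuming that every input-only, feedback, and output-only signal needs a device pin while internal feedbacks do not. The dictionary keys and function names are just labels for this example.

```python
counts = {
    "input_only": 40,
    "feedbacks": 50,           # used both internally and externally
    "internal_feedbacks": 13,  # only used internally, no pin needed
    "output_only": 11,
}


def external_pins(c):
    """Signals that have to reach the outside world if everything is in one device."""
    return c["input_only"] + c["feedbacks"] + c["output_only"]


def logic_outputs(c):
    """Signals that need a macrocell (or equivalent) to generate them."""
    return c["feedbacks"] + c["internal_feedbacks"] + c["output_only"]


print(external_pins(counts))  # 101
print(logic_outputs(counts))  # 74
```

The second figure is one more than the 39 + 34 = 73 registered and combinatorial outputs listed, so these are best read as ballpark numbers rather than an exact inventory.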

Some of these outputs - the 3 phase shift/invert clocks, and /STERM_PDS - should probably be left in a GAL for simplicity's sake since they exploit the propagation delays of GALs. The rest I don't think are as tightly timed (or use registered outputs, if they are) and I'd think a propagation delay in the 7-10ns window would be just fine. Bigger question would be if a CPLD/FPGA could support groups of registered outputs with different clock signals, and tristates... I think so, but I've only ever worked with GALs.

An interesting tidbit, the logic watches for access to SCC, SWIM, or Sound addresses with signal N__11 and then.... does something scary with that in U4. I'm assuming it's slowing things down, somehow, for compatibility, but it touches many of the state machine signals so hard to say quickly.
 

Attachments

  • Diimo030.txt
    11 KB · Views: 1
  • analysis.txt
    2.9 KB · Views: 1

zigzagjoe

Well-known member
Forgot that combinatorial outputs can coexist with registered outputs. This simplifies the clocked section a bit and changes counts to below.

combinatorial = 49 (7 of these should probably be in GAL)
registered = 24

Unfortunately, I just realized I introduced a fairly big error that is going to be annoying to fix..... Sets of equations from one GAL with an active low output will have an incorrect input polarity where that signal is used as an input to equations that came from another GAL (as it would not be aware of the output polarity). Output polarities will only be consistent in equations that originated from the same GAL, and equations from other GALs will be written as if the signal is active high.

Easy enough to fix - invert all references to that signal used in equations from other GALs - but it will be very annoying to manually fix now as I've changed the order of the signals to some degree. Or maybe it's a nonissue depending on how the order of operations works out for feedbacks...
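
If that correction does turn out to be needed, the "invert all references" step could be automated instead of done by hand. The sketch below is only an illustration: it assumes equations are single lines of the form `OUT = TERM * /TERM + ...` with `/` as the inversion prefix, and the function name is made up for this example. It is not part of JED2EQN or EQN2JED.

```python
import re


def invert_references(equation, signal):
    """Flip the polarity of every use of `signal` on the right-hand side."""
    lhs, rhs = equation.split("=", 1)
    # Optional leading "/" followed by the exact name. The lookarounds keep
    # PIN1 from matching inside PIN10, and LOCAL from matching in DS.LOCAL.
    pattern = re.compile(r"(?<![\w$./])(/?)" + re.escape(signal) + r"(?![\w$.])")
    flipped = pattern.sub(lambda m: signal if m.group(1) else "/" + signal, rhs)
    return lhs + "=" + flipped


print(invert_references("/PIN2 = /PIN1", "PIN1"))          # /PIN2 = PIN1
print(invert_references("X = PIN1 * /PIN10", "PIN1"))      # X = /PIN1 * /PIN10
```

It would only be applied to equations that came from a different GAL than the one driving the signal, since equations from the same GAL already agree on polarity.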

I can't quite figure out the order of operations on the inversions, so I'm probably going to have to test a case like the one below.

Code:
 /Pin1 = GND
 ; result: PIN1 at output will be a logic high, unless inverted in pin definitions? (never the case for these JED2EQN files except for 11)
 /Pin2 = /Pin1
 ; not sure. logically we can ditch the inversions, but I'm unclear on if it's already inverted when it is fed back and we are inverting it again

I think I've just confused myself by focusing too hard on this while also writing code...Inversions/active lows on the outputs are absolutely killing me.
 

Attachments

  • Diimo030.txt
    11.1 KB · Views: 1

zigzagjoe

Well-known member
Regarding the above... feedbacks seem to occur prior to active low inversion. So I don't think any correction is needed.

Reinforcing the idea that the logic is quite tolerant to propagation delays, I played around with overclocking my Diimo clone to see where the limits were. Cache gets flaky around 59 mhz and fails at 60mhz. Without the cache, the rest of the board can manage 63mhz before it starts getting flaky. Incidentally, it also boots with cache down to 20mhz, possibly lower, but I didn't investigate that closely.

Speed ratings...
Data SRAM 12ns
TAG SRAM 15ns
GALs 7ns GAL16V8B
U24 10ns ATF16V8B
MC68030RC50C
XC68882RC50A
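
To put those clock speeds next to the nanosecond ratings, a small sketch that converts frequency to cycle time. It only does the unit conversion. How many clock cycles the cache logic actually allows for an SRAM or tag access is not stated here, so no timing margin is implied.

```python
def period_ns(freq_mhz):
    """Length of one clock cycle in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


for mhz in (20, 50, 59, 60, 63):
    print(f"{mhz} mhz -> {period_ns(mhz):.2f} ns per cycle")
```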

An interesting tidbit: 12 ATFs use about 800ma @ 5v while powered outside of the mac, 11 GALS + 1 ATF is up to about 1.1A.
 

zigzagjoe

Well-known member
In case anyone is still tuned in, I've been able to pretty conclusively verify that operation is not terribly sensitive to the speed rating of any FPGA/CPLD (or even GAL) substitute, as the bulk of the logic is flexible*. I've tried just about every possible combination of 5, 7, and 10ns GALs/ATFs and have not succeeded in disrupting operation at 50mhz. I did install 10 ns data and tag SRAMs, though I am not sure that was actually required.

* phase shift clocks and STERM delay logic should remain in a GAL. These ones are sensitive.

I've managed to achieve 60.5mhz with functional cache, and so far it is passing stability testing @ 60. A tiny heatsink was required on the CPU when testing in open air, though I think it'd be fine actually in the SE/30. With one further replacement I'm hoping to increase the headroom a smidge more so I can feel happy with 60mhz long term.

Misc notes:
  • Right now 189 (non-unique) signal pins are required across GALs. Fully consolidated to one FPGA, this reduces down to around 100 I/Os required. Should be possible to find wider SRAM, wider buffers, and more available wider tag SRAM to reduce complexity further.
  • Got some (legit) GAL16V8D-7JU that programmed and verified successfully at 16v programming voltage, but did not actually work. 17v was required for reliable operation.
  • I have a lot of respect for whoever designed this thing. They have dealt with all the edge cases in hardware, including some likely to never be seen on a SE/30, where other accelerators use code and/or ROMs to address issues such as floppy and audio timing. This is probably why A/UX works on it.
  • All 5ns GALs can operate cachelessly at 65mhz, but cache limit stays at 59 mhz.
  • Still not sure why Micromac halved FPU speed in the original. My 50mhz FPU has been fine at full CPU speed, and does not seem to top out at any point. Also seems to benefit from cache (*may be the app code benefiting).
  • Performance vs frequency data
  • PLCC sockets are not my friends.
 

zigzagjoe

Well-known member
I ended up leaving it at 55mhz in the interest of not wanting to have to tinker with it long-term.

Attached are two quick extensions I wrote (with a pointer from @cheesestraws - thanks!) to activate the cache earlier in the boot process. Technically, the control panel is not even needed with these, unless you want SANE patches too. Note these extensions have no way to disable the cache other than disabling extensions, though the Diimo control panel can still be used to turn cache back off either during boot process or after the machine has booted.

EarlyDiimo turns on the cache as the earliest INIT to load, and EarliestDiimo turns it on even sooner using the 'scri' file type. Use one or the other, no need for both.

No early cache: 57 seconds to desktop
Early cache by INIT: 53 seconds
Early cache by scri: 47 seconds
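
Worked out as savings relative to the no-early-cache boot, using only the three timings above (the helper name is just for this example):

```python
BASELINE_S = 57  # no early cache, seconds to desktop


def saving(seconds, baseline=BASELINE_S):
    """Return (seconds saved, percent saved) versus the baseline boot time."""
    saved = baseline - seconds
    return saved, 100.0 * saved / baseline


for label, seconds in (("INIT", 53), ("scri", 47)):
    saved, percent = saving(seconds)
    print(f"Early cache by {label}: {saved} s faster ({percent:.1f}%)")
# Early cache by INIT: 4 s faster (7.0%)
# Early cache by scri: 10 s faster (17.5%)
```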
 

Attachments

  • EarlyDiimo.sit.hqx
    2.3 KB · Views: 0

Phipli

Well-known member
Are you making it enable early by switching it on with an extension instead of the control panel?

Does this give any improvement over moving the Control Panel to the Extensions folder (and putting an Alias of the control panel in the Control Panels folder for convenience)?
 

zigzagjoe

Well-known member
The EarlyDiimo extension is functionally identical to what changing the control panel to an extension should do. I say should, as that approach did not work in my testing. The control panel stores the preferences in the control panel itself; I suspect that might have been a problem. That said, I might have made a stupid mistake.... I didn't spend a lot of time testing or playing with the control panel.

EarliestDiimo lives in the extensions folder but is a 'scri' file instead. This loads much earlier in the boot process - thanks cheesy - as it is intended to load localization stuff, but it works just fine for this too. Enabling the diimo is a simple matter of setting a bit; doing it 110% correctly adds a little handling of MMU mode and flushing caches.

Source is in the .sit, though the change to the 'scri' was done manually. Below is the operative piece, this is the combination of two functions from the control panel.

Code:
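        moveq    #1,d0                ; HWPriv selector 1: clear (flush) inst cache
        _HWPriv
        moveq    #3,d0                ; HWPriv selector 3: clear (flush) data cache
        _HWPriv

        moveq    #1,d0                ; 1 = request 32-bit addressing
        _SwapMMUMode                ; go 32bit, previous mode comes back in d0
        move.b    ($50F01F03).l,d1    ; read from via
        ori.b    #1,d1                ; set bit for diimo cache, keep the other bits
        move.b    d1,($50F01F03).l    ; write back
        _SwapMMUMode                ; reset mmu to the previous mode still in d0
        RTS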
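
The order of operations in that routine can be modelled in a few lines. This is only a toy simulation: `FakeMachine` and its methods are hypothetical stand-ins rather than Toolbox calls, and the initial register value is made up. It shows the two points that matter, namely that only one bit is changed and that the MMU mode is put back the way it was found.

```python
CACHE_REG = 0x50F01F03  # address used in the assembly above


class FakeMachine:
    """Toy stand-in for the hardware; nothing here touches a real Mac."""

    def __init__(self):
        self.mmu_mode = 0                       # 0 = 24-bit, 1 = 32-bit
        self.registers = {CACHE_REG: 0b1010_0000}  # arbitrary starting value
        self.log = []

    def flush_caches(self):
        self.log.append("flush")

    def swap_mmu_mode(self, new_mode):
        """Switch mode and hand back the old one, like the trap does in d0."""
        old_mode, self.mmu_mode = self.mmu_mode, new_mode
        self.log.append(f"mmu={new_mode}")
        return old_mode


def enable_cache(machine, bit=0):
    machine.flush_caches()
    previous = machine.swap_mmu_mode(1)         # go 32-bit
    machine.registers[CACHE_REG] |= 1 << bit    # read, set one bit, write back
    machine.swap_mmu_mode(previous)             # restore the caller's mode


mac = FakeMachine()
enable_cache(mac)
print(bin(mac.registers[CACHE_REG]), mac.mmu_mode, mac.log)
```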
 

Phipli

Well-known member
I didn't mean changing it into an extension, just literally putting the Control Panel, unmodified, in the Extensions folder.

It's what I do to get my accelerator to enable earlier in my SE
 