So how did I get this to work after all?
As mentioned earlier, we found that the Carrera loads successfully when the 16MHz bus clock signal is shifted by 180° before being sent to the accelerator. Everything we could test worked fine that way, except onboard video.
When looking for the cause of the onboard video issues we poked at a lot of different things without much success.
@GeekDot disassembled the SE/30 video ROM, dug through the Turbo 040 ROM in search of hints that they patched anything SE/30-related in there, and spent countless hours staring at MacsBug :tongue:
Being useless at this kind of stuff myself, I kept poking around with the logic analyser, hoping to find clues about what was going on.
I hooked up some probes to the VRAM to easily find video write cycles and looked at the data bus on the logicboard (starting at D24-31, which connect to the VRAM). My hope was that we were just seeing bad timing on the output drivers of the C040, so that no valid data was on the bus when it should have been. That was not the case though. Hold times for the data lines and all VRAM-related write signals were good. Instead, the data outputs were simply turned off for 1 out of every 4 bytes written to the VRAM.
I went ahead and soldered probe sockets to data pins D24-31 right on the 68040 on the C040 to see what was going on there. No big surprise: the data seemed to be there. So for some reason it does not make its way through the output buffers to the logicboard bus, but always only for the same one of the four bytes written with each access. The driver is just dumping video data to the screen base address in longwords. Those longwords get split up into 4 separate bytes when written to the 8-bit-wide VRAM.
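To make the splitting a bit more concrete, here is a minimal Python sketch of how an aligned longword write to an 8-bit port plays out on a 68030-style bus, as I understand it from the 68030 manual: four byte cycles, with SIZ1/SIZ0 counting down the bytes still to go and A1/A0 counting up. The function name and the dictionary layout are made up for illustration; nothing here was captured from the logic analyser.

```python
# SIZ1:SIZ0 on the 68030 encodes how many bytes of the operand are still
# to be transferred: 00 = long (4), 11 = 3 bytes, 10 = word (2), 01 = byte.
SIZ_FOR_REMAINING = {4: (0, 0), 3: (1, 1), 2: (1, 0), 1: (0, 1)}


def split_longword_for_8bit_port(address, longword):
    """Return the four byte cycles of an aligned longword write to an 8-bit port.

    Illustrative helper (hypothetical name). An 8-bit port sits on D24-31,
    so every cycle carries its byte on that lane.
    """
    assert address % 4 == 0, "sketch only covers aligned longwords"
    cycles = []
    for i, byte in enumerate(longword.to_bytes(4, "big")):
        siz1, siz0 = SIZ_FOR_REMAINING[4 - i]
        addr = address + i
        cycles.append({
            "addr": addr,
            "A1": (addr >> 1) & 1,
            "A0": addr & 1,
            "SIZ1": siz1,
            "SIZ0": siz0,
            "D24_31": byte,
        })
    return cycles


for cycle in split_longword_for_8bit_port(0x1000, 0x11223344):
    print(cycle)
```

With one of those four cycles never being driven onto D24-31, every fourth byte of the frame buffer goes missing, which matches what we saw on screen.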
We took a step back to think about this. The 68040 itself does not do dynamic bus sizing the way the 68030 does, so the accelerator has to emulate it, and the bytelanes used differ between the two depending on the state of SIZ0-1 and A0-1.
The Motorola design handbook also mentions this in the section about running a 040 on a 030 bus. The circuit proposed there uses muxes to shift data around between the bytelanes. It makes most combinations work, but the text hints that a full mesh of all bytelanes was considered too complex in terms of part count.
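A small Python sketch of why lanes have to be moved at all. It assumes the lane assignments as I read them in the Motorola manuals: the 68040 puts the byte at offset n of a longword on the lane selected by A1:A0, while an 8-bit port on a 68030 bus always expects its data on D24-31. The function names are invented for this example, and it says nothing about how the Carrera FPGAs actually do it.

```python
LANES = ["D24-31", "D16-23", "D8-15", "D0-7"]


def lane_68040(a1, a0):
    """Lane carrying the byte addressed by A1:A0 on a 68040 (assumed mapping)."""
    return LANES[(a1 << 1) | a0]


def lane_8bit_port(a1, a0):
    """An 8-bit port on a 68030-style bus is wired to D24-31 only."""
    return LANES[0]


def required_routes():
    """List (A1, A0, source lane, target lane, needs_swap) for all four offsets."""
    routes = []
    for offset in range(4):
        a1, a0 = offset >> 1, offset & 1
        src = lane_68040(a1, a0)
        dst = lane_8bit_port(a1, a0)
        routes.append((a1, a0, src, dst, src != dst))
    return routes


for a1, a0, src, dst, swap in required_routes():
    action = "route " + src + " -> " + dst if swap else "pass through"
    print(f"A1={a1} A0={a0}: {action}")
```

Three of the four byte offsets need a lane-to-lane route for an 8-bit port alone, which gives an idea of how quickly the mux count grows once 16-bit ports and misaligned accesses are added.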
On the Carrera the data buffering and bytelane swapping is all done in the two FPGAs so we couldn't easily observe what exactly the logic is doing.
What we could see though is that on VRAM writes all signals named above (SIZ0-1 and A0-1) are cycled through all possible combinations that make sense for writing a longword byte by byte (just as you would expect).
With the bytelanes being spread over both FPGAs, my guess is that, just like the circuit in the Motorola design handbook, they did not implement the possibility to swap all bytelanes around between each other.
For one combination of the bus sizing signals the bytelane that's actually connected to the VRAM stays silent on the outputs of the FPGA.
I went on to check what's going on on the other bytelanes. On the LA I could see that there is data present on all other data lines as well, even though those are not connected to the VRAM.
To investigate whether the data on the other lanes is actually video data that might be useful, I desoldered the VRAM from one of my logicboards and installed it again using a small interposer board. That way I could route the other bytelanes to the VRAM to see what is going on on them and whether any of them produces a useful picture.
Turned out the other lanes contain image data but it is mostly useless:
However, when looking at the output of one bytelane, something interesting caught my attention:
Compare that to the image containing the missing bytes:
As you can see the missing data on the VRAM bytelane (D24-31) is present on another bytelane that otherwise has data that is useless.
To get this straight I cut through some data lines on my adapter and wired in two buffers. One just switches D24-31 back and forth on reads and writes, the other one puts data from D16-23 to D24-31 whenever there is an access to the VRAM location that contains the white stripes.
The address decoding and control logic to do this is implemented in a single GAL. On all other bus accesses the second buffer stays turned off and data is pushed around normally on D24-31.
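For anyone who prefers to see the idea as code, here is a rough Python model of the two-buffer fix on the write path. It is only a sketch under loud assumptions: `VRAM_BASE` and `VRAM_SIZE` are placeholders and not the real decode, `FAULTY_OFFSET` is an arbitrary pick for which byte of each longword goes missing, and `JUNK` stands in for the useless data on D16-23. The real GAL equations are not reproduced here, and the read direction handled by the first buffer is left out.

```python
VRAM_BASE = 0xFE000000   # placeholder value, NOT the real decode
VRAM_SIZE = 0x10000      # placeholder value
FAULTY_OFFSET = 1        # assumption: which byte of each longword is missing
JUNK = 0xAA              # stand-in for the otherwise useless data on D16-23


def in_vram(addr):
    return VRAM_BASE <= addr < VRAM_BASE + VRAM_SIZE


def accelerator_output(addr, byte):
    """Model of the observed fault: returns (D24-31, D16-23).

    None means the lane is not driven. For the faulty cycle the byte only
    shows up on D16-23.
    """
    if in_vram(addr) and addr % 4 == FAULTY_OFFSET:
        return None, byte
    return byte, JUNK


def second_buffer_enabled(addr, is_write):
    """Rough equivalent of the decoder: only the faulty VRAM write cycle."""
    return is_write and in_vram(addr) and addr % 4 == FAULTY_OFFSET


def vram_sees(addr, byte, patched=True):
    """Data arriving at the VRAM pins for one byte write."""
    d24_31, d16_23 = accelerator_output(addr, byte)
    if patched and second_buffer_enabled(addr, is_write=True):
        return d16_23          # second buffer: D16-23 -> D24-31
    return d24_31              # first buffer: D24-31 passed straight through


def write_longword(addr, value, patched=True):
    data = value.to_bytes(4, "big")
    return [vram_sees(addr + i, b, patched) for i, b in enumerate(data)]


print(write_longword(VRAM_BASE, 0x11223344, patched=False))
print(write_longword(VRAM_BASE, 0x11223344, patched=True))
```

The point of the model is just that the second buffer is enabled for a single, narrowly decoded case, so every other bus access still goes through D24-31 untouched.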
I modified the decoder to support PDS cards, and testing looks good so far: all the cards I have here seem to work just fine.