SuperMac Spectrum 24 PDQ+ Artifacts on Display

jmacz

Well-known member
I can definitely trigger it from normal execution now. The wrong bits were just the least significant ones, so I missed them earlier on visual inspection. Now that my program is actually testing the full value at every pixel, I'm seeing that it's easy to reproduce and has nothing to do with XOR, the cursor, etc. It seems to happen if the lower bits (bits 2, 3, and 4 in the blue byte) change quickly on every 4th pixel column. Pulling out the logic probe now.
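
For anyone following along, here is a rough sketch of what "testing the full value at every pixel" can look like. It is written in Python purely for illustration; the actual test program is not shown in this thread. The function name `find_bad_pixels`, the flat-list framebuffer, and the 32-bit pixel layout with blue in the low byte are all assumptions.

```python
def find_bad_pixels(expected, actual, width):
    """Compare two flat lists of 32-bit pixel values and report every mismatch.

    Returns a list of (x, y, wrong_bits), where wrong_bits is the XOR of the
    expected and actual values, so each set bit marks one flipped bit.
    """
    bad = []
    for i, (want, got) in enumerate(zip(expected, actual)):
        diff = want ^ got
        if diff:
            bad.append((i % width, i // width, diff))
    return bad


# Hypothetical 8x2 frame: every pixel should be white in the low 24 bits.
width = 8
expected = [0x00FFFFFF] * (width * 2)
actual = list(expected)
actual[3] ^= 0x04   # bit 2 of the (assumed) blue byte flipped at x=3, y=0
actual[11] ^= 0x18  # bits 3 and 4 flipped at x=3, y=1

for x, y, wrong_bits in find_bad_pixels(expected, actual, width):
    print(f"x={x} y={y} wrong bits=${wrong_bits:08X}")
```

Reporting the XOR mask rather than a pass/fail flag is what makes low-order bit errors visible even when the pixel looks fine on screen.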
 

MacOSMonkey

Well-known member
Great job on the detailed debugging! Lots of information!

Some suggestions/ideas:
1. You can try writing all $a or $5 as the data (max transitions) - that might help when scoping. Alternate the pattern on each successive write. (non-random worst-case data pattern)
2. It could be noise -- where you are having an edge transition problem. Maybe a bad RAM or discrete (bad termination or bypassing). Easy to spot on the scope. Transitions might be injecting noise (bad bypass, etc.).
3. There could be a short - data-data or data-address, but not address-address (from your testing). Depends on the layout of the parts...and it could also be anywhere the traces go. It could be a debris or crud short, solder hair from rework...whatever.
4. Something could be coupling somewhere - maybe secondary to a short...but the alternating data is the best clue so far.
5. Maybe a power/ground issue - (re)check all the RAM to make sure they are good. Always the best starting point and sanity check.
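
A minimal sketch of the alternating $A/$5 data pattern from suggestion 1, in Python for illustration only. The function name `worst_case_pattern` and the 32-bit word width are assumptions, not something taken from the card or the test program.

```python
def worst_case_pattern(count, width_bits=32):
    """Return `count` words alternating between all-$A and all-$5 nibbles.

    Every bit changes state between successive words, which gives the
    maximum number of transitions on the data lines.
    """
    mask = (1 << width_bits) - 1
    a_word = int("A" * (width_bits // 4), 16) & mask
    five_word = a_word ^ mask  # bitwise complement within the word width
    return [a_word if i % 2 == 0 else five_word for i in range(count)]


words = worst_case_pattern(4)
print([f"${w:08X}" for w in words])
# ['$AAAAAAAA', '$55555555', '$AAAAAAAA', '$55555555']
```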

I know you have previously checked discretes, trace routes, nubus, etc. After checking all the grounds, definitely look at the address and data lines again...but systematically on a scope. In your program, use a known failure address and go through and toggle each address and data line repeatedly while watching on the scope for failures or adjacent glitching. If you can cause the problem at a single address just by flipping data bits, then that will save time. And once you are triggering on the 100% hardware case (now that you can mostly isolate it in software), it should hopefully be solvable.
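
One way to sketch the "toggle each address line from a known failure address" idea, in Python for illustration. The function name `single_bit_neighbors`, the example base address, and the number of address bits are made up for the example.

```python
def single_bit_neighbors(base_addr, addr_bits):
    """Yield (bit, address) pairs that differ from base_addr in exactly one bit."""
    for bit in range(addr_bits):
        yield bit, base_addr ^ (1 << bit)


# Hypothetical failing address; only the low 4 address bits are exercised here.
for bit, addr in single_bit_neighbors(0x1234, 4):
    print(f"A{bit}: ${addr:04X}")
```

Alternating accesses between the base address and one neighbor at a time exercises a single address line, which gives a clean thing to trigger on with the scope.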

Logic would seem to dictate that all the RAM mapped to blue in direct mode can't be bad. And if you can cause the problem anywhere, then it may be in the path to the RAM, rather than the RAM itself (unless there is some internal device issue on a multiplexed address or data line).

Also, since you know which bits tend to fail, you can focus on one of those bits as a debug source/trigger. For example, if you are checking bit 2 failures, write the data as $0000000a and $00000000 and see if bit 2 flips. Or, write the bit directly with an alternating pattern $0 and $4 and look for failures.
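
A small sketch of the two approaches above for a single suspect bit, in Python for illustration. The helper names `patterns_for_bit` and `victim_bit_flipped` are hypothetical. For bit 2, the neighbor pattern works out to $A (bits 1 and 3 toggle while bit 2 stays 0) and the direct pattern to $4.

```python
def patterns_for_bit(bit):
    """Return ((neighbor_value, 0), (direct_value, 0)) for one suspect bit.

    neighbor_value toggles the adjacent bits while the suspect bit is held at 0;
    direct_value toggles only the suspect bit itself.
    """
    neighbors = 1 << (bit + 1)
    if bit > 0:
        neighbors |= 1 << (bit - 1)
    return (neighbors, 0), (1 << bit, 0)


def victim_bit_flipped(written, read_back, bit):
    """True if the given bit differs between the written and read-back values."""
    return ((written ^ read_back) >> bit) & 1 == 1


neighbor_pair, direct_pair = patterns_for_bit(2)
print([f"${v:08X}" for v in neighbor_pair])  # ['$0000000A', '$00000000']
print([f"${v:08X}" for v in direct_pair])    # ['$00000004', '$00000000']
```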

There are a lot of ways to approach this problem, but you have clues that should help narrow it down. Try to minimize brute force, double check the devices quickly, then zoom in on a bad bit at a specific address or address range.
 

jmacz

Well-known member
Will do.

In the meantime, some initial probe results..

I put up an all white screen on the PDQ+ display and then used a multimeter to take a look at the SDQ output pins (which go to the DAC). Went through all 8 "blue" chips. One of them had a weird voltage reading on its four SDQ output pins. Hmm...

I then moved to an all black screen but painted just the problematic columns in white (the ones that had some bad pixels in them -- i.e. every 4th column). The thought was that the chips representing those problematic white columns would have a differentiated output on the SDQ output pins. 6 of the chips had 0V on all four SDQ output pins. But 2 of the chips had non-zero voltage on the SDQ output pins:
  • Blue Chip 1 - 4.5V on all 4 SDQ output pins.
  • Blue Chip 2 - 2.5V on all 4 SDQ output pins.
Given the issue every 4 columns, I expected to find two VRAM chips with high output and found the above. Blue Chip 2 was the one I saw had weird output with the all white screen as well. Something clearly not right with that Blue Chip 2 as a pure white screen should have been a steady 4V+. Why half?

I then put probes on 5 of the blue chips, 2 of which were Blue Chip 1 and Blue Chip 2 above. I also added a probe on one red chip just for comparison. For each chip, I put a probe (the top one) on an SDQ output pin and a second probe (the bottom one) on the corresponding DQ pin.

Results:

screenshot.png

Blue Chip 3/4/5 looked normal. They were low on SDQ as they were black columns. Red Chip 1 probably corresponds to a white column (similar to the Blue Chip 1) and thus the output was high, and that also looked reasonable. Blue Chip 1 looks reasonable. It's got a high value (since it's a white column) and it holds steady after the activity on the DQ line ceases.

But look at Blue Chip 2 (the one with the weird voltage on the SDQ output pins) which now makes sense since it's flipping between high and low about 8 microseconds after it goes high. That does not look normal.
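
The arithmetic behind why a toggling output would read roughly "half" on a meter can be sketched as below. This assumes the multimeter reports something close to the time average of a fast square wave, and that the 4.5V seen on Blue Chip 1 is the normal high level. Both are assumptions, and the function names are made up for the example.

```python
def average_voltage(v_high, v_low, duty):
    """Average of a square wave that sits at v_high for `duty` of the time."""
    return v_low + (v_high - v_low) * duty


def duty_from_reading(reading, v_high, v_low=0.0):
    """Fraction of time spent high that would produce a given average reading."""
    return (reading - v_low) / (v_high - v_low)


print(average_voltage(4.5, 0.0, 0.5))         # 2.25
print(round(duty_from_reading(2.5, 4.5), 2))  # 0.56
```

So a 2.5V reading is in the right range for an output spending a bit over half its time high, which fits the flipping seen on the probe.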

Given this Blue Chip 2 is on the white column which had the problematic pixels, my guess is something's wrong with this one. The period on that flip flopping looks regular so I'm guessing it's getting some interference from a clock signal?

Going to concentrate on this chip and check the passives around it as well as see if I can find/probe a clock signal with similar period (well that is after I confirm that back and forth is actually regular after I zoom in).
 

jmacz

Well-known member
The serial clock signal (which is pin 1 to the chip) is twice as fast as the flip flopping portion of the SDQ outputs on Blue Chip 2 and lines up perfectly in a timeline (ie. when the serial clock signal is being generated, I see the wackiness on the SDQ outputs). Looks like that might be where it's coming from. But given all four SDQ outputs (pins 2, 3, 26, 27) have the exact same problem, I am doubtful the issue is on a trace outside of the chip. My hunch would be it's happening inside? But it could be happening inside the BSR chip too. To prove whether it's the VRAM chip or the BSR, easiest method would be to pull the VRAM chip and see if the corresponding pins on the BSR side continue to see the issue. Sure hope it's not the BSR chip as that's impossible to replace (without a donor board) whereas I'm sure I can hunt down the VRAM chip.
 

jmacz

Well-known member
YES! Fixed!

I was getting ready to pull the VRAM chip (U31 in this case) but before I put the flux on, I decided to just check each pin/leg one more time. So I went around the chip and used some tweezers to nudge each pin outward. I made my way around the chip nudging each one when I finally got to one pin and it moved! It had broken not at the solder joint where it meets the PCB, but where it goes into the chip! I had done this previously across all the chips (VRAM, SQD, SMT, BSRs, basically every chip on the PCB) but clearly either missed this one or didn't nudge enough.

The pin in question was A0 (pin 19), one of the address inputs.
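
As a toy illustration of why an open address pin gives intermittent, address-dependent errors, here is a sketch in Python. It is a simplified model and not a description of how this particular VRAM behaves: the class `RamWithOpenA0` is hypothetical, and the level the floating pin "sees" on each access is supplied by hand.

```python
class RamWithOpenA0:
    """Toy RAM whose A0 input ignores the bus and takes a supplied level instead."""

    def __init__(self, size, a0_levels):
        self.cells = [0] * size
        self._levels = iter(a0_levels)  # level the floating pin sees per access

    def _effective(self, addr):
        # Replace the requested A0 with whatever the floating pin picked up.
        return (addr & ~1) | next(self._levels)

    def write(self, addr, value):
        self.cells[self._effective(addr)] = value

    def read(self, addr):
        return self.cells[self._effective(addr)]


# The pin floats low during the write and high during the read.
ram = RamWithOpenA0(8, a0_levels=[0, 1])
ram.write(4, 0xFF)
print(hex(ram.read(4)))  # 0x0: the read landed on address 5, not 4
```

When the floating level happens to match on write and read, the same access passes, which is one reason a fault like this can look data- or timing-dependent.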

Fixing it was a bit annoying as it had cleanly broken off right where it enters the chip. I thought perhaps I needed to cut into the outer casing a bit but fortunately, after a good number of attempts, I was able to get a small magnet wire bonded to what was remaining and then routed that back down to the PCB. Unfortunately it looks ugly, as once I got one end bonded, I decided ok, that's it, moving on. One day I will just pick up a replacement VRAM chip.

IMG_8177.JPG

Then I went back and tried it out, and it works! No more artifacts during hilite mode. No artifacts on mouse cursor movement. Ran my test program and it passed all the tests. Put it back on my logic analyzer and no more strange waffling of the signal on those SDQ output pins on U31 (blue chip 2).

Now that it's resolved and I know the issue... I'm contemplating if I could have done better. I had definitely nudged each pin on each chip but clearly didn't nudge enough or missed this one --- on second thought, I think I missed it as I was nudging at the solder joint with the PCB, not where it meets the chip body. Could have been more thorough. Should I have probed each VRAM chip from the get go? That would have been super annoying. In the end, although calendar-wise it took a while (given lack of spare time to really dig in), I'm glad this July 4th weekend gave me an opportunity to spend more time with this. I think the suggestion @MacOSMonkey gave me to write a test program months ago was the right call. Finally with time this past weekend, writing code helped reproduce the issue, let me walk through the assembly in MacsBug, narrow down the problem to one chip, and then figure it out.

Learned A LOT on this one.

Next up: return to debugging my last problematic SuperMac video card.. the SuperMac Spectrum 24 Series V which looks to have been stepped on or something.

🍻
 

MacOSMonkey

Well-known member
Great job! Huge congrats!!! Glad it's finally fixed! It was a great board to certify - run it with a Thunder ROM in it and ditch that PDQ+ one. :)

Don't second-guess yourself too much. It was a very educational process, as you say! And, you did a great job that also benefits others.

The main debug challenge is always to find the minimal/atomic 100% reproducible case -- in software and/or hardware. The work you did got you there. Maybe you could have gotten to the answer faster by wiring the bus, but you still would have needed some of the clues you found...or hindsight. ;)

Cheers!
 

dougg3

Well-known member
It has been fun silently following along with this saga. Excellent work! Congratulations on fixing it.
 

MacOSMonkey

Well-known member
Oh - I see you have the Thunder 3.1 ROM in your picture. Perfect! We should compare benchmarks with my 1.6.0.1 version at some point to see if there were any accelerator speed-ups.
 