
SuperMac Spectrum/24 Series III Display Artifact Issue

jmacz

Well-known member
So again, it could be I took a faulty measurement. But it's the following:

[Attached image: suspect.png]
(Sorry for the numbering -- please note that the numbers are just arbitrary and do NOT correspond to chip pin numbers).

While the card was in an error state previously, I had recorded in my notes that pin 1, pin 3 and via 6 had continuity with <1 Ω resistance. What I did not have in my notes is that all three of those also had continuity to pin 2 in the diagram above. I had dragged my probe down that column, but either the continuity wasn't there or I dragged the probe too fast. So I'm going to test the continuity between those pins again once the card is back in an error state. All of those pins are data/IO pins.

The other new one, at least in that area, is the connection from pin 4 to via 5, which I did not have in my notes either. Pin 4 happens to be one that is labeled as a CLK pin in the GAL spec. But again, as you pointed out, it's not clear this is being used as a clock, since only the green pins in the diagram above are all connected and the remaining 8 (of 18 chips) aren't tied to each other.
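To keep readings like these straight between the good state and the error state, it can help to merge the point-to-point continuity readings into nets and diff the two snapshots. Below is a minimal Python sketch using union-find. The point names follow the arbitrary diagram numbering, and the 1 Ω threshold and ohm values are placeholders, not real measurements.

```python
# Sketch: merge point-to-point continuity readings into nets (union-find)
# and diff two snapshots. Point names follow the arbitrary diagram numbering;
# the ohm values are placeholders, not real measurements.

THRESHOLD_OHMS = 1.0  # assumption: under ~1 ohm counts as "connected"


def find(parent, x):
    # Walk up to the root, halving the path as we go
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def nets(readings):
    """readings: iterable of (point_a, point_b, ohms) -> set of frozenset nets."""
    parent = {}
    for a, b, ohms in readings:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if ohms < THRESHOLD_OHMS:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[ra] = rb
    groups = {}
    for point in parent:
        groups.setdefault(find(parent, point), set()).add(point)
    return {frozenset(g) for g in groups.values()}


# Earlier notes: pin 1, pin 3 and via 6 tied together
notes = [("pin1", "pin3", 0.5), ("pin3", "via6", 0.5)]
# Re-measurement: pin 2 is also on that net, plus pin 4 to via 5
remeasured = notes + [("pin2", "pin1", 0.5), ("pin4", "via5", 0.5)]

old, new = nets(notes), nets(remeasured)
print("only in re-measurement:", [sorted(n) for n in new - old])
print("only in notes:", [sorted(n) for n in old - new])
```

Readings above the threshold still register both points, so a pin that was probed but found open shows up as its own single-point net rather than silently disappearing.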
 

MacOSMonkey

Well-known member
OK. Well... *jeopardy waiting music* :D

Edit: Note that if you find a bad source via on a trace that also passes through a secondary routing via, the problem could be at either the pin via or the secondary via (especially if it is within the heating zone).

General Info About Wiring Around Bad Traces (maybe not completely relevant in this case):

Very generally speaking (and with greater importance in high-frequency designs), when wiring around a bad trace, you should try to eliminate any trace stubbing, if possible. For example, if the trace goes to another device pin, just lift the destination pin so that the rework wire is the only signal path. Otherwise, if it is an outer-layer trace, you could cut it at the source, the destination, or an outlet via (leaving enough for a pseudo pad or repair spot for a solder bridge), depending on where you need to route. Or, if there is a short hop to a secondary inlet via, you could cut the trace and wire to the secondary via that has a good connection. The point is to prevent or minimize dead copper and signal reflections. And, when necessary, you can also terminate traces with resistors where impedance matching matters (such as when you have 50 ohm trace requirements). Wherever there are significant impedance mismatches, junctions, etc., there can be signal reflections that affect signal integrity. But again, it probably doesn't matter much for the "Rider" accelerator board for the Spectrum/24 Series III. I am just adding the above for informational purposes.
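To put a rough number on the reflection point: on an ideal lossless line, the fraction of an incident voltage step reflected where a line of impedance Z0 meets a load or junction of impedance ZL is Γ = (ZL - Z0) / (ZL + Z0). A short Python sketch follows. The 50 ohm figure comes from the post. The stub case and the 15 ohm driver impedance are illustrative assumptions. A T-junction that splits into two 50 ohm branches looks like 25 ohms to the incoming edge, so about a third of the step reflects back inverted.

```python
import math


def reflection_coefficient(z_load, z0):
    """Fraction of an incident voltage step reflected where a line of
    impedance z0 meets a load (or junction) of impedance z_load.
    Ideal lossless-line approximation."""
    if math.isinf(z_load):
        return 1.0  # open circuit: full positive reflection
    return (z_load - z0) / (z_load + z0)


def series_termination(z0, r_driver):
    """Source (series) resistor so driver output impedance + resistor ~= z0."""
    return max(z0 - r_driver, 0.0)


Z0 = 50.0  # trace impedance from the 50 ohm example

print("matched load :", reflection_coefficient(50.0, Z0))
print("T into stub  :", round(reflection_coefficient(Z0 / 2, Z0), 3))  # two 50 ohm branches in parallel
print("open end     :", reflection_coefficient(float("inf"), Z0))
print("series R for a 15 ohm driver:", series_termination(Z0, 15.0), "ohms")  # hypothetical driver
```

Real traces have loss and finite rise times, so treat these as first-order numbers for deciding whether a stub is worth cutting, not as a simulation.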

Caveat: It may go without saying, but...never cut traces unless you are absolutely sure you have the right ones and you are sure of the rework. ;)
 

jmacz

Well-known member
Nice, appreciate that tip. I figure if I am eventually able to identify the pins involved, I will be spending more time under a scope tracing the paths (or attempting to).

Now if I can't find it and there's nothing left but the actual ICs... 😬😱
 

MacOSMonkey

Well-known member
True...and if it's a device via, depending on the clearances and whether or not you have a good drill press, you could also very carefully drill it out and force-fill it with silver conductive epoxy. Then you could shorten the device pin so that it touches the fill instead of going through the hole, and it would be an invisible rework. Or, if the via damage extends too far beyond the annular ring and the drill-and-fill doesn't work, you could still fall back to rework wire.

Also, another tip: with rework, if you have to lift a pin (for whatever reason), try not to bend it at the package/shoulder. It is usually better to just shorten the through-hole lead slightly and carefully j-bend a foot that is above the surface so that you never strain or fatigue the shoulder joint at the package.
 

jmacz

Well-known member
Been a little over a month and thought I'd provide an update.

I could not find any traces with abnormal resistance. Although I had other projects, I kept coming back to this from time to time to test some more, and made zero progress in determining whether any traces were bad. Yet heating still temporarily resolved the issue, albeit for a shorter time than it had a week before.

I then did something really dumb that I regret, driven by a growing lack of patience with this card and a mind consumed with other projects. Instead of making the following changes one at a time in a controlled manner, I did them all at once:
  • Desoldered six chips in that area (I think previously I had done three) and resoldered them to the board with clean solder.
  • Desoldered the two board connectors and resoldered them to the board with clean solder.
  • Removed 3 capacitors in the area where I have been heating and replaced them with new capacitors.
The three capacitors (ceramic) were marked "104", which means they should be 100nF. One of them measured around 110nF and the other two were closer to 140nF. I would think that should be fine. But as I was holding one of them with tweezers, it started falling apart (a chunk broke off), even though I didn't think I was holding it that tightly. Go figure. I replaced all three with new 100nF ceramic capacitors. These capacitors sit between Vcc and Gnd above each and every chip on the board.
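For reference, the three-digit EIA code is two significant digits followed by a power-of-ten multiplier in picofarads, so "104" is 10 × 10^4 pF = 100nF. Whether +40% is acceptable depends on the part's tolerance class, which the three digits alone don't tell you. A Python sketch follows. The tolerance letters shown are common EIA codes, and nothing here assumes which one these particular caps had.

```python
# EIA three-digit code: two significant digits, then a multiplier, in pF.
# Multiplier digit 8 means x0.01 and 9 means x0.1 (used for small values).
MULTIPLIER = {d: 10 ** d for d in range(7)}
MULTIPLIER.update({8: 0.01, 9: 0.1})

# Common EIA tolerance letters, as (min %, max %).
TOLERANCE = {"K": (-10, 10), "M": (-20, 20), "Z": (-20, 80)}


def decode_nf(code):
    """'104' -> 100.0 (nF)."""
    significant, mult = int(code[:2]), int(code[2])
    return significant * MULTIPLIER[mult] / 1000.0  # pF -> nF


def deviation_pct(measured_nf, nominal_nf):
    return (measured_nf - nominal_nf) / nominal_nf * 100.0


def within(measured_nf, nominal_nf, letter):
    lo, hi = TOLERANCE[letter]
    return lo <= deviation_pct(measured_nf, nominal_nf) <= hi


nominal = decode_nf("104")
for measured in (110.0, 140.0):
    dev = deviation_pct(measured, nominal)
    print(f"{measured} nF: {dev:+.0f}%  M(+/-20%): {within(measured, nominal, 'M')}  "
          f"Z(-20/+80%): {within(measured, nominal, 'Z')}")
```

A 140nF reading fails a ±20% window but passes a -20/+80% one. That is why the same measurement can be a red flag for one part and normal for another.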

I then gave the board yet another IPA bath, brushed and cleaned it, and reinstalled it on the main video card.

It's been a few days now and so far I haven't seen the issue. I'm going to keep an eye on it and see if it comes back. If it doesn't, that's where I regret making all three changes at once: if the problem stays solved, I won't know which change fixed it.

If the issue comes back... well, after pulling my hair out, I might try replacing all of the remaining 15 caps. And if that doesn't work, I will probably desolder every chip and take another look at the bare board.
 

jmacz

Well-known member
FYI.

I think I might have figured out what was wrong. And it was not the card. That one capacitor that crumbled was probably damaged but I don’t think that was the issue.

After all of that work, nothing really made sense as being wrong on the card.

Recently I was debugging an issue on another computer and found that the ATX PSU I used for the conversion did not regulate its voltage rails independently. That meant the 5V rail sat slightly below 5V. You won't always get exactly 5V, but I was seeing around 4.76V on a fully loaded IIci, and that was causing a couple of problems. The machine with this Spectrum/24 III had the same brand of ATX PSU but wasn't fully loaded, and I was seeing around 4.85V there; it seems that was enough to cause this card to have issues. By adding load on the 12V rail, I was able to get the 5V rail up a bit to 4.9V, and that resolved it. All my other Spectrums were fine down to 4.75V or so, but for this one, it wasn't enough.
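For context on those numbers, the ATX specification allows ±5% on the +5V rail (4.75V to 5.25V). Every reading quoted above is nominally inside that window. A card that misbehaves at 4.85V would therefore have less margin than the spec suggests. The Python sketch below tabulates headroom above the lower limit. Treating the ATX ±5% window as the relevant limit for a NuBus card is an assumption.

```python
NOMINAL_V = 5.0
TOLERANCE = 0.05  # assumption: +/-5%, the ATX figure for the +5 V rail


def rail_report(volts, nominal=NOMINAL_V, tol=TOLERANCE):
    """Return (within spec window, headroom above the lower limit in mV)."""
    low, high = nominal * (1 - tol), nominal * (1 + tol)
    in_spec = low <= volts <= high
    margin_mv = round((volts - low) * 1000)
    return in_spec, margin_mv


# Readings quoted in this thread
readings = {
    "fully loaded IIci": 4.76,
    "Spectrum machine (artifacts)": 4.85,
    "after extra 12 V load": 4.90,
    "stock PSU": 5.05,
}
for label, v in readings.items():
    ok, margin = rail_report(v)
    print(f"{label:30s} {v:.2f} V  in spec: {ok}  margin: {margin} mV")
```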

So in the end, it was a power issue.

I think the Seasonic that some folks are using also only provides around 4.8V on the 5V rail, so that's something to check if you are using a voltage-sensitive card.
 

jmacz

Well-known member
Oh, and I have since swapped in a stock PSU, which delivers around 5.05V, and the card is good with it.

I have also procured an ATX/SFX PSU with DC-DC conversion, which properly regulates all rails; it delivers 4.96 to 5.1V and the card works well with it too.
 

jmacz

Well-known member
2+ months later, the dreaded artifact issue is back. *sigh*

Reproduced on a stock PSU providing 5.1V and 12.1V, as well as a converted PSU providing 4.94V and 12.05V.

Seems less pronounced on the stock PSU but still there. Tried with another ATX PSU at 4.86V and it’s even worse there.

From tinkering, the issue is worse when the card is cold, and also worse as the rail voltages drop slightly. Those PSUs are providing enough power, so something must still be wrong with the card, but I have no idea what. It doesn't seem to be a broken trace.

The artifacts occur all over the screen, not at known offsets or locations. They mostly happen on high-contrast edges like window borders, menu borders, etc. They only happen in accelerated 24-bit mode (not in unaccelerated 24-bit or in accelerated/unaccelerated 8-bit).

There are clearly times when it's unable to erase things on screen. For example, after the Mac boots and reaches the Finder, the Welcome to Macintosh screen and the cdev/INIT icons are still on the screen even though the Finder has loaded. That suggests some kind of blit didn't happen. Maybe instructions are getting skipped in accelerated mode?
 

MacOSMonkey

Well-known member
Blind debugging is hard work. The failures to erase and the blit artifacts could be bad address translations on the accelerator, so I think it may still point to the GALs. Given that you see the board come up bad, that the problem goes away and doesn't come back once it's gone, and that you have seen issues that may be related to rail stability, maybe the next place to look is POR (Power-On Reset) for the GALs on the accelerator, to see what is happening with reset and clock during power-up (and during any secondary resets that might occur at video mode switches, if any).

There should be a clean reset waveform and no clocking prior to reset. POR timing might be bad for some reason (a bad RC network or reset chip), or maybe the parts are not clocking properly (or are clocking when they shouldn't be, like during reset?). Working that out would be a datasheet guessing exercise. If the accelerator is not coming out of reset properly, the GALs would not work as expected and would translate garbage.
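The RC case is easy to reason about numerically. Consider a reset node charged through R from a supply that steps to Vcc. Its voltage follows Vcc(1 - e^(-t/RC)), so it crosses an input threshold Vth after t = RC·ln(Vcc / (Vcc - Vth)). The Python sketch below uses hypothetical component values and a fixed 2.5V threshold, none of them taken from the actual board. It shows that a capacitor that has lost capacitance shortens the reset pulse proportionally. It also assumes the supply rises instantly. With a slowly ramping supply, an RC reset can track the rail and provide little or no effective reset at all.

```python
import math


def rc_release_time(r_ohms, c_farads, vcc, v_threshold):
    """Seconds for an RC reset node charging from 0 V toward vcc to cross
    v_threshold: t = R*C*ln(Vcc / (Vcc - Vth)).
    Assumes vcc steps up instantly; a slow supply ramp changes the picture."""
    if not 0 < v_threshold < vcc:
        raise ValueError("threshold must lie between 0 V and Vcc")
    return r_ohms * c_farads * math.log(vcc / (vcc - v_threshold))


# Hypothetical values: 10 kohm, 10 uF, fixed 2.5 V input threshold.
for c_uf in (10.0, 6.0):  # e.g. an aged capacitor that has lost capacitance
    for vcc in (5.0, 4.85):
        t = rc_release_time(10e3, c_uf * 1e-6, vcc, 2.5)
        print(f"C={c_uf} uF, Vcc={vcc} V -> reset released after {t * 1e3:.1f} ms")
```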

Assuming you identify the reset line, scope it at power-on (put a jumper on the back side of the accelerator), check reset timing on your scope at each device, and trace any primary reset line back to the main board as far as you can. There should be an active reset to the accelerator to ensure its state, but I don't know for sure. Anyway, it's just something else to check that might be helpful, since you've tried everything else. A clean reset is essential for the devices to operate as expected. Speculatively, it could be a bad RC circuit that makes reset too short, etc. In that case, as an intermittent problem, it might be right on the edge of the required reset timing, which would not be out of the realm of possibility given the age of the board.
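If the scope or logic analyzer can export a power-on capture, the checks described above can be scripted. These checks are the reset width, clocking while reset is still asserted, and reset re-asserting after release. The Python sketch below is hypothetical. It assumes an active-low reset, that both channels have already been thresholded into 0/1 samples at a fixed sample period, and a minimum reset width that you would take from the relevant datasheet.

```python
def analyze_por(reset_n, clk, sample_ns, min_reset_ns):
    """Check a power-up capture of an active-low reset and a clock.

    reset_n, clk: equal-length lists of 0/1 samples starting at power-on.
    Returns the reset width, clock rising edges seen while reset was still
    asserted, and any re-assertions (glitches) after release.
    """
    if len(reset_n) != len(clk):
        raise ValueError("captures must be the same length")
    # First sample where reset goes high (released); len() if never released.
    release = next((i for i, v in enumerate(reset_n) if v), len(reset_n))
    edges_in_reset = sum(
        1 for i in range(1, release) if clk[i - 1] == 0 and clk[i] == 1
    )
    glitches = sum(
        1 for i in range(release + 1, len(reset_n))
        if reset_n[i - 1] == 1 and reset_n[i] == 0
    )
    width_ns = release * sample_ns
    return {
        "reset_width_ns": width_ns,
        "width_ok": width_ns >= min_reset_ns,
        "clock_edges_during_reset": edges_in_reset,
        "reset_glitches": glitches,
    }


# Synthetic example: 10 ns/sample, reset low for 50 samples, clock free-running.
reset_n = [0] * 50 + [1] * 50
clk = [i % 2 for i in range(100)]
print(analyze_por(reset_n, clk, sample_ns=10, min_reset_ns=1000))
```

Whether clock edges during reset actually matter depends on the parts involved. The script only counts them so they can be compared between a good power-up and a bad one.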

Good luck!
 

jmacz

Well-known member
Thanks @MacOSMonkey.

Do you by chance know how these GALs are logically organized? There are 19 of them. Are their responsibilities split up by location on screen, by operation type, or something else? Or perhaps some handle 8-bit acceleration vs. 24-bit acceleration?

I'm asking because the artifacts can occur anywhere on the screen, and I can't imagine every chip having its own localized fault, which suggests a more common fault. For example, a timing problem that exists before the signal reaches each individual chip.

Also, I'm trying to think through why this happens only at 24-bit and not 8-bit. A timing issue could definitely be why. I would think it's less likely to be the video memory, since there are no issues in unaccelerated 24-bit. So I do agree it's related to the GALs. But why no issue in accelerated 8-bit, then? Hmm.
 

MacOSMonkey

Well-known member
I don't know the organization. I'll think about it and/or try to find out. Spec/24 Series III was 24-bit acceleration. There were different transfer modes and programmable registers, etc., so there might be some splits. But, I mostly think it has to do with address space for the onboard RAM transfers. Anyway, I will look into it. There might be some software speed-ups for 8-bit...but 8-bit started in the original PDQ days, minimally. So, don't worry about 8-bit mode. There's no GAL split for 8 vs. 24-bit. In the first version, all the value and wow factor was in 24-bit acceleration.

It's impossible to know the exact, reproducible failure until you know it. But if the parts aren't resetting correctly, that could be an issue, depending on individual GAL sensitivity to the reset timing vs. spec. And, from a debug standpoint, it's generally good practice to look at power, ground, and reset. When all of that works, then you move on to address/data, etc.
 

jmacz

Well-known member
This is helpful, thanks!
 

jmacz

Well-known member
One other thing I need to check is the delay component on that daughterboard (Kappa ST08CB500). It looks like it provides 5 taps ranging from 10ns to 50ns of delay for a given input signal. Since it's a timing-related component, I will need to put a scope on those pins and check for any deviations. It's the component that had the factory bodge wire on it, and curiously, it's in the general area where I was doing my heat tests months ago. The mainboard has a second ST08CB500 (U24), so I will probably compare the two in terms of behavior.
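One way to organize the scope readings is to measure the input-to-tap delay for each tap (e.g. at the 50% crossing points). You can then check each tap against its nominal value and difference the two parts tap by tap. The Python sketch below assumes 5 evenly spaced taps from 10ns to 50ns, based on the description above, and a hypothetical ±2ns tolerance in place of the real datasheet figure. The readings in the usage lines are placeholders meant only to show the output shape, not real measurements.

```python
# Assumption: 5 evenly spaced taps, 10 ns to 50 ns.
NOMINAL_TAPS_NS = [10.0, 20.0, 30.0, 40.0, 50.0]
TOLERANCE_NS = 2.0  # hypothetical; use the datasheet figure if one turns up


def check_taps(measured_ns, nominal=NOMINAL_TAPS_NS, tol=TOLERANCE_NS):
    """Return (tap number, nominal, measured, within tolerance) per tap."""
    if len(measured_ns) != len(nominal):
        raise ValueError("expected one reading per tap")
    return [
        (i + 1, n, m, abs(m - n) <= tol)
        for i, (n, m) in enumerate(zip(nominal, measured_ns))
    ]


def compare_parts(a_ns, b_ns):
    """Per-tap difference between two parts, e.g. daughterboard vs U24."""
    return [round(a - b, 2) for a, b in zip(a_ns, b_ns)]


# Placeholder readings only, to show the output shape (not real measurements).
daughterboard = [10.3, 20.6, 30.8, 43.1, 51.0]
mainboard_u24 = [10.1, 20.2, 30.4, 40.5, 50.6]
for tap in check_taps(daughterboard):
    print(tap)
print("daughterboard - U24:", compare_parts(daughterboard, mainboard_u24))
```

Comparing against the second part on the mainboard helps separate a drifting delay line from scope or probing error, since both parts are measured the same way.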
 

jmacz

Well-known member
I was able to get the card working again. Hopefully this time it sticks.

Recap of Problem: various artifacts appearing all over the screen in accelerated 24-bit mode only. Any other bit depth, or turning acceleration off, eliminates the issue.

The artifacts had gotten progressively worse (both in number and in frequency). I tried replacing the Kappa ST08CB500 delay with a new old stock part, and that did not help.

With the NuBus interposer I built, I was able to run the card outside of the machine, lying flat on my workbench. At that point, I tried a bunch of things:
  • Probed resistance of various traces, pins, etc. while the card was running (easy to do with it on the bench), but did not find anything of consequence.
  • Added supplementary power to bring the 5V rail up from 4.86V to 4.95V, and that did not help either. So that rules out the voltage theory.
  • Used a logic analyzer to monitor various signals, but did not find anything of consequence (at least nothing I noticed).
  • Went back to what I knew helped, which was heat. With the card out of the machine, I was able to apply heat very selectively. Heat did remove the artifacts, but whereas previously applying heat would correct the issue for a few minutes, now it corrected the issue for only 10-20 seconds.
  • By applying heat selectively all over the daughterboard, I narrowed it down to one of the GAL chips with about 80% confidence.
At this point I figured it had to be either this GAL chip or the PCB underneath it. I removed the chip again (the third time) and cleaned up the PCB and the legs of the GAL chip. I didn't want to desolder this chip again, so I put a 24-pin DIP socket on the board instead (I should have done this a while back).

I was about to rig a 24-conductor ribbon cable to the DIP socket so that I could run the chip off the daughterboard (I figured that would let me heat the PCB and the GAL chip separately). But I decided to make sure the socket was working first. I put the GAL chip into the socket and tested the card again. It worked!

Hmm... so it's been working for a few hours now, whereas for the last few months I could only get the artifacts to go away for 10-20 seconds with heat. My only guess is that there was a bad solder joint on this chip, though I'm not sure how, since I had removed and resoldered this chip twice before.

Anyhow keeping my fingers crossed that the card stays fixed this time. 🤞
 