
SuperMac Spectrum/24 Series III Display Artifact Issue

MacOSMonkey

Well-known member
Pull-up or pull-down -- a 10k is fine - it doesn't matter in this case. The pins appear to be no-connects, so you don't really need to do this test.

It's true that the chips face inward when the daughtercard is assembled. It's also true that the problem is near the center, which might be in an area where there could be some flexing (assembly pressure, etc.). So, if you do happen to find a bad/resistive via (indicating that it is fractured or maybe corroded -- old flux, galvanic effect, etc.), another way to potentially fix the problem would be to repair it. If the problem is with a surface trace, you can just scrape the mask and remediate it (solder or jumper). If the trace is on an internal layer, it might be possible to drill (via a controlled process -- drill press, fab driller, CNC, etc.) and manually refill/replate it, assuming you were careful about any plane and trace clearances when drilling. It looks like there might be some tight clearances, so it would require care and skill. But again -- you would attempt something like this only if you were to clearly detect/verify a problem. The ohmmeter is your friend. ;)

It looks like there are a number of internal traces on those devices. It's a 4-layer board and the stackup appears to be:

Top Side
Ground Plane with Vcc routes (I don't see any surface Vcc connections)
Internal Trace 1
Bottom Side
 

jmacz

Well-known member
10K is a common weak pull up value, it ought to work for experimentation's sake without risk.

Thanks.

Alright, sounds like fun. Have my project for this weekend. 👍
 

jmacz

Well-known member
Pull-up or pull-down -- a 10k is fine - it doesn't matter in this case. The pins appear to be no-connects, so you don't really need to do this test.

Just took a few minutes so tried it just to rule it out. And as expected, didn't change a thing. Sounds like they are no-connects. Will move onto the heat shield and looking more at the resistivity this weekend. Thanks again.
 

jmacz

Well-known member
I have been following traces and testing resistivity for the pins/vias around those chips. Don't really see anything that looks out of norm? Each pin that's connected seems to be reporting back with < 1 ohm resistance. Haven't found anything of consequence yet. It's time consuming determining all the interconnections.

While doing this, I was thinking that the artifacts might be noise? So in this diagram below, all the purple boxes are ceramic capacitors. Red circles are power (Vcc). Yellow circles are GND. The capacitors (purple boxes) have 104 markings on them so I believe they should be 100nF? I took 3 of them out (near the area where I have been heating) and all three seem fine... after discharge, they quickly shoot up to millions of ohms resistance when being tested. The capacitance measured at 150nF each. They seem ok and I believe these are some type of noise filter as they seem to be sitting next to each IC and tie power to ground.

power.jpg
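As a quick sanity check on the 104 marking mentioned above, here is a minimal sketch of the usual 3-digit capacitor code (two significant digits, then a power-of-ten multiplier, result in picofarads). The function name `decode_cap_code` is just made up for illustration, and it assumes the plain digit scheme only (no tolerance letters, no special multiplier digits).

```python
def decode_cap_code(code: str) -> float:
    """Return capacitance in farads for a plain 3-digit code such as '104'.

    Assumption: first two digits are significant figures, the third is the
    power-of-ten multiplier, and the result is in picofarads.
    """
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected a 3-digit code, got {code!r}")
    significant = int(code[:2])        # '10' -> 10
    multiplier = 10 ** int(code[2])    # '4'  -> 10_000
    picofarads = significant * multiplier
    return picofarads * 1e-12          # convert pF to F


if __name__ == "__main__":
    nanofarads = decode_cap_code("104") * 1e9
    print(f"104 -> {nanofarads:.0f} nF")
```

So 104 works out to 10 x 10^4 pF, which matches the 100nF guess.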

Then I looked at the artifacts some more and they don't look like noise... but almost as if the wrong set of pixels got rendered. Almost always one of the following is true:
  • A set of pixels got shifted up/down 1 pixel or left/right 1 pixel.
  • A single horizontal line appears where it shouldn't.
  • A single vertical line appears where it shouldn't.
  • A rectangular box appears where it shouldn't.
And for these misplaced pixels, they are always white, or black, or gray. Never another color outside of those three. Watching the pattern, it's usually the dirty region of the screen (requiring a redraw) where this happens. I'm dragging a window around hence the boxes and lines for redraw area. It also feels like maybe the timing might be off?

Thinking about timing a bit, IF these chips are GAL 20v8s, given the red circles are Vcc and yellow circles are GND, the blue circles should be the CLK signal. If that's the case, shouldn't all the CLK pins (blue circles) tie together (ie. be the same)? I guess I can check it with an oscilloscope this weekend. But as of right now, those blue pins are not tied to the same source.
 

jmacz

Well-known member
I've checked resistance on all the connections between the three chips I have been eyeballing as well as the one chip on the other row opposite the three chips. They all check out.

I used my rework heat gun with the narrowest tip and slowly reheated all the pins, vias, and the pcb itself where those 3 chips are. Then let it cool down and put the card back into my Mac. Now I can't repro the issue at all for the last couple of hours. Seems to suggest something subsurface in the board near those chips.

Without knowing where all the connections are supposed to be, it's hard to tell where the connectivity issue is. For the areas where I identified connectivity, the resistance seems fine. In order to get a better look, I'd probably have to remove all the chips, not just the three I removed earlier.
 

MacOSMonkey

Well-known member
I don't think the problem is with random noise artifacts. It sounds like an address translation issue -- and it only takes 1 upper-order bit to put a blit into hyperspace. Early on, the Spec/24 III daughtercard did have an acceleration device/timing problem with thermal pixel drop-out, but that was resolved and wouldn't cause the issue you are seeing. With the drop-out problem, over many blits, there would be cumulative pixel garbage (that worsened with heat -- e.g. during burn-in testing).

If you don't ever see the problem again and you focally reheated every pin/via (multiple times), it's possible that you inadvertently effected a repair. If the problem was a via or bond wire issue, with the many heating cycles, you may have gradually reflowed your way to a fix. And, that might even include solder flow/migration through a partly corroded/damaged/underplated via. If you don't see the problem come back on a cold system/cold board, then you have probably fixed it -- perhaps through sheer determination! I still suspect that it was an occult soldering or via issue on a single connection.

As for the device clocking, not sure how they were wired/used. The theory of acceleration was to do the required setup (addresses, type of operation, etc.) and then execute on board (eliminating as many NuBus cycles as possible). So, whatever is happening (and maybe also depending on the GAL device mode), any clocking would be running internal to the accelerator hardware and is probably OK, or nothing would work.

Curious to know if the board will work cold the next time you try it!
 

jmacz

Well-known member
Still working ok. Let's see how long it lasts. Last time I got about a week before it regressed.
 

jmacz

Well-known member
Ok the Voodoo Shaman wasn't strong enough. Artifacts came back today -- again, only lasted one week. So my guess is that the problem is in the board somewhere.

This time I'm going to try a different approach. Given the issue is happening right now, I'm going to confirm and remap out all of the pins/vias in the area I heated last time. Do a brute force n^2 walk through all of them to see what they are connected to across the entire board. Then I will use the same heat approach to heat it up and hopefully get it to work without artifacts again. Then again do a brute force walk through all the pins/vias to see where connectivity exists. Then hopefully I can spot a delta and that will be my culprit.
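The "spot a delta" step described above could be sketched roughly like this: record each group of points that beep together, once in the failed state and once in the working state, then compare the two sets of pairs. This is only an illustration; the pin labels (`D.3`, `J1.12`, etc.) and the readings are made-up placeholders, not measurements from the actual card.

```python
from itertools import combinations


def as_pairs(nets):
    """Expand groups of mutually connected points into unordered pairs."""
    pairs = set()
    for net in nets:
        # sorted() makes (a, b) and (b, a) collapse into one canonical tuple
        for a, b in combinations(sorted(net), 2):
            pairs.add((a, b))
    return pairs


def connectivity_delta(working_nets, failed_nets):
    """Return the pairs whose continuity differs between the two states."""
    working = as_pairs(working_nets)
    failed = as_pairs(failed_nets)
    return {
        "lost_when_failed": sorted(working - failed),
        "only_when_failed": sorted(failed - working),
    }


if __name__ == "__main__":
    # Hypothetical readings: "chip.pin" or "connector.pin" labels.
    working = [{"D.3", "E.7", "J1.12"}, {"E.9", "F.4"}]
    failed = [{"D.3", "J1.12"}, {"E.9", "F.4"}]
    delta = connectivity_delta(working, failed)
    for a, b in delta["lost_when_failed"]:
        print(f"continuity lost in failed state: {a} <-> {b}")
```

Anything that shows up under `lost_when_failed` would be a candidate for the intermittent connection. A beep test will miss a connection that has gone resistive rather than fully open, so this only covers hard opens.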
 

MacOSMonkey

Well-known member
If you are checking the board in a failed state, do the resistivity test again. If there is a bad trace or via, it may not be a full open.

It could also be related to power, ground or maybe a bad cap. Check resistance on all the power/ground links.

If it's a bad/intermittent device cell, then maybe it is coming up in a random state at power-on. To rule out a random state, you could try n (20?...or until it works) power-ons (with power off & fully drained between tests) and see if there is ever a case where it works when starting up cold when it seems to be failing.
 

jmacz

Well-known member
If you are checking the board in a failed state, do the resistivity test again. If there is a bad trace or via, it may not be a full open.

For this resistivity test, which two points are you suggesting? ie. pick a pin on a chip, and then what's the other end? I was assuming I had to know where the trace goes (ie. the destination) and then measure the resistance between those two points. But I'm having a hard time determining what else each of the chip pins are connected to. Or is there a simpler way to do this?

Otherwise, I spent some time mapping out where each pin is going. I focused on chips D, E, F, and N which is where I have been applying heat to temporarily correct the issue. With a narrow tip on the heat gun, I think the most impactful area is the green rectangle below (between chips E and D). I could be wrong but that's what it seems like.

heat_locations.png

So all the reds are tied together and are Vcc. All the browns are ground. The greens are also tied together and are going to pin 1 (clk) on those GALs. But note that not all chips have this pin tied together. The yellows seem to not be connected to anything. The blues are the ones I have been meticulously going through one by one and checking every other pin :( on the board to see if it has continuity or not. O(n^2) task and not fun. Taking years. The gray ones are connected to one of the blue ones.

tests.png

So I then have continuity info for each pin... (see below). I have each pin on a separate layer so I don't have to look at the mess below and can only look at one pin at a time. The picture below is just with all layers visible at the same time to give an idea of what I'm (probably stupidly) dealing with.

wiring.png

That's just for 2 and a half chips... I still have 1 and a half chips to go, just to complete four chips (D, E, F, N). Again, I'm going to pause my exhaustive exercise and focus on the two columns between D and E.

I am also able to repro the "temporary fix"... if I heat that column (between D and E), the card works for a few days without artifacts. Not sure how many more times I can do that without something else going wrong so I probably need to hurry up.

I have also ruled out the main board. I tried heating only that board up in the general area of the daughterboard and it had zero effect. I can temporarily fix the issue by heating just the daughterboard (removed from the main board) on that column between chips D and E.

There's gotta be something wrong in that area.
 

jmacz

Well-known member
Also note, yes, I realize the diagrams above are not ideal. But I needed a quick way to jot down my progress as I go through each pin. If I ever finish this, I'm sure I could at least put a partial schematic together :) although somewhat pointless without the programming in those GALs.
 

MacOSMonkey

Well-known member
Nice detail. I think you have the right idea, but maybe need to take a simpler approach to prevent the rabbit hole effect. ;) You shouldn't have to do infinite tracing.

The starting point is to figure out what data lines are on those 2 (or 3) devices, factoring out power, ground and clock (they may not be the source of the problem). And it looks like you have already done that work. Then, check resistivity at the endpoints of this handful of data lines. As before, it could be that you are looking for a single resistive/corroded/suspect connection, if that is actually the problem (which might make sense vs. heating). Resistance should be < 1 ohm.

If there is a resistive connection that is creating a problem, the change in resistance should be very perceptible (assuming it is not a full open). You are looking at relatively small trace lengths, where a standard 10 mil .5oz trace has normal resistance of about .1 ohms/inch. So, you may see some slight variation based on trace length (still < 1 ohm), but you are looking for any gross/significant resistance change that deviates from the controls (that will probably be in the KOhm or MOhm range, if present).
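For anyone who wants to check the ~.1 ohms/inch figure above, here is a rough back-of-envelope sketch using R = rho * L / (w * t). The constants are assumptions on my part: bulk copper resistivity of about 1.68e-8 ohm-m at room temperature, and 1 oz copper taken as roughly 34.8 um thick. It ignores plating, vias and temperature, so treat it as an estimate only.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8   # assumed bulk copper, ~20 C
INCH_TO_M = 0.0254
MIL_TO_M = 25.4e-6
OZ_TO_M = 34.8e-6                    # assumed thickness of 1 oz/ft^2 copper


def trace_resistance_ohms(length_in, width_mil=10.0, copper_oz=0.5):
    """Estimate DC resistance of a rectangular copper trace."""
    length_m = length_in * INCH_TO_M
    width_m = width_mil * MIL_TO_M
    thickness_m = copper_oz * OZ_TO_M
    # R = rho * L / A, with A = width * thickness
    return COPPER_RESISTIVITY_OHM_M * length_m / (width_m * thickness_m)


if __name__ == "__main__":
    for inches in (0.5, 1.0, 3.0):
        r = trace_resistance_ohms(inches)
        print(f"{inches:.1f} in of 10 mil / 0.5 oz trace: about {r:.3f} ohm")
```

With those assumptions a 1 inch run comes out just under 0.1 ohm, so even a few inches of healthy trace stays well below the 1 ohm mark, and a reading in the KOhm range stands out clearly.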

So, don't spin your wheels too much. If you can't find anything that varies on the target devices, then you could be dealing with an internal GAL issue.

After you do the data lines on the devices, you could check to see if there is any issue with power, ground or clock by just checking to a nearby power/ground pin and clock on another device.

If you find something that is clearly a problem, then just (slightly) lift the pin and wire around it to the destination to see if the problem goes away.

So - to summarize, if there is a fractured via or corrosion issue with a resistive connection in the PCB, you should be able to find it via (see my little joke there :D ) the above method. If it's internal to the device, you won't find it this way.
 

jmacz

Well-known member
Ok, good to see I'm relatively on the right track. I started checking every possible connection but I am ignoring the various vias right now because eventually they need to tie back to a chip pin or a connector pin. So all my focus is on the chips and the connectors. I already tested and eliminated the Vcc and ground lines on all the chips as culprits and those are easy to test. But outside of those, I think the rest are potential data lines? There doesn't seem to be any consistency in which pins are used across the chips I have looked at so far.

Screenshot 2023-08-07 at 10.28.02 AM.png

So that's why I'm unfortunately having to check about 22 of the 24 pins on each chip. Completed 2 only and part of one. Still have 1.5 to go. But should be a bit faster now that I'm going to ignore the vias for now. I can look at those after discovering which actual chip pins change between the working state and failed state. At least that's my plan.
 

MacOSMonkey

Well-known member
Nah - just check the cold board and show the resistance map you have for your test. I have 2 or 3 working boards I can check.

Also, the mapping should be fast (except for no connects, which are the worst-case detection scenario). Use a beep continuity tester, put one probe on the source pin and then just drag across destinations. Start by checking both connectors, then if not found, just drag across the pins on the other devices. Do the drag testing on the component side. It's a lot easier to just drag across the device legs, but YMMV. Listen for the beep, then verify. For the connectors, maybe best to drag across the solder side. Once you know the map, then check resistance.
 

jmacz

Well-known member
Yup, been dragging but what was slowing me down was individually testing the vias between chips also. I didn't realize until yesterday that that's clearly a waste of time. So what I have been doing is for every chip leg, I am dragging all the other chip legs (and connector pins) for a beep and then noting it down (all the blue lines in the diagram I shared). I've also been looking at the resistance between two end points of the connection and have not found a single one over 1 Ohm yet. They are all under that threshold so far.
 

jmacz

Well-known member
Right now, the card is in the working state after the last heat between chips D and E. I'm waiting for it to get into the bad state again. In the meantime, while it's working, I mapped out all pins in that general area (the orange box) below and where they go on the board. And all of them are showing < 1 Ohm resistance (as they should given it's working right now).

mapped.png

Now I'll wait until it returns to the failed state and check all those same pins, resistivity, etc. I also identified 2 pins that were suspicious (I didn't see a connection there previously but now I see them in the working state). I could be wrong and just didn't measure properly previously but keeping my fingers crossed that those might be potentials for a fault. Probably take a week for it to fail again.
 