Nice detail. I think you have the right idea, but you may need to take a simpler approach to avoid going down a rabbit hole.
You shouldn't have to do infinite tracing.
The starting point is to figure out which data lines are on those 2 (or 3) devices, factoring out power, ground and clock (they may not be the source of the problem). It looks like you have already done that work. Then check the resistance between the endpoints of each of this handful of data lines. As before, you could be looking for a single resistive/corroded/suspect connection, if that is actually the problem (which might make sense vs. heating). Resistance should be < 1 ohm.
If there is a resistive connection that is creating a problem, the change in resistance should be very perceptible (assuming it is not a full open). You are looking at relatively short traces, where a standard 10 mil, 0.5 oz trace has a normal resistance of about 0.1 ohms/inch. So you may see some slight variation based on trace length (still < 1 ohm), but you are looking for any gross/significant resistance change that deviates from the controls (that will probably be in the kOhm or MOhm range, if present).
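If you want a rough sanity check on those numbers, here is a minimal Python sketch. It assumes room-temperature copper (about 1.68e-8 ohm-m) and the usual 1 oz = roughly 34.8 um thickness; the function names, net names and readings are all made up for illustration, so plug in your own measurements.

```python
# Rough trace-resistance estimate plus a simple "anything >= 1 ohm is suspect" filter.
# Constants are textbook approximations; net names and readings are hypothetical.

COPPER_RESISTIVITY_OHM_M = 1.68e-8   # approx. copper resistivity at 20 C
OZ_THICKNESS_M = 34.8e-6             # approx. thickness of 1 oz/ft^2 copper
MIL_M = 25.4e-6                      # 1 mil in meters
INCH_M = 25.4e-3                     # 1 inch in meters


def trace_resistance_ohms(length_in, width_mil=10.0, copper_oz=0.5):
    """Estimate DC resistance of a rectangular copper trace: R = rho * L / (w * t)."""
    thickness = copper_oz * OZ_THICKNESS_M
    width = width_mil * MIL_M
    length = length_in * INCH_M
    return COPPER_RESISTIVITY_OHM_M * length / (width * thickness)


def suspect_nets(measured_ohms, limit_ohms=1.0):
    """Return the names of nets whose measured resistance is at or above the limit."""
    return sorted(name for name, r in measured_ohms.items() if r >= limit_ohms)


if __name__ == "__main__":
    # A 10 mil, 0.5 oz trace, 1 inch long, comes out close to 0.1 ohm.
    print(f"1 inch of trace: {trace_resistance_ohms(1.0):.3f} ohm")

    # Hypothetical meter readings in ohms; float('inf') stands in for a full open.
    readings = {"D0": 0.3, "D1": 0.4, "D2": 2200.0, "D3": float("inf")}
    print("Suspect nets:", suspect_nets(readings))
```

Keep in mind that your meter leads add a few tenths of an ohm on their own, so zero the leads (or compare against a known-good control line) before trusting small differences.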
So, don't spin your wheels too much. If you can't find anything that varies on the target devices, then you could be dealing with an internal GAL issue.
After you do the data lines on the devices, you could check whether there is any issue with power, ground or clock by measuring from each of those pins to a nearby power/ground pin, or to the clock pin on another device.
If you find something that is clearly a problem, then just (slightly) lift the pin and wire around it to the destination to see if the problem goes away.
So, to summarize: if there is a fractured via or corrosion issue with a resistive connection in the PCB, you should be able to find it via (see my little joke there) the above method. If it's internal to the device, you won't find it this way.