Macintosh Portable m5126 strange data bus / memory error

dc99

Member
I am having trouble sorting out an issue with a friend's Portable that was previously recapped by someone else and had been working for years.

On startup it's now getting a data bus error 000E FFFF. The board was in a bit of a state, so I've since ultrasonically cleaned it and repaired some damaged traces around C11.

After multiple failed attempts to narrow down the problem I've been using the serial diagnostic facility (TechStep equivalent) and have come across some strange behaviour.

Using the feature to save to a memory location, I've written 4 bytes (AA BB CC DD) starting at location 0.

Using the command to read back 4 bytes from that same starting location, I typically get the correct result, AA BB CC DD. However, if I repeat the same read multiple times, every so often I'll get the wrong result, as you can see below.

AA BB CC DD
AA BB AA BB
AA BB CC DD
AA BB CC DD

When it fails, the second 16-bit word appears to be a repeat of the previous 16 bits.
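The pattern above can be checked mechanically. This is a minimal sketch, assuming the readback has already been captured as raw bytes; `find_stale_words` is a hypothetical helper, not part of the diagnostic mode. It flags any 16-bit word that differs from what was written and instead repeats the word read just before it.

```python
def find_stale_words(expected: bytes, actual: bytes) -> list[int]:
    """Return byte offsets of 16-bit words that are wrong and repeat the previous word."""
    stale = []
    # Start at the second word; the first word has no predecessor to repeat.
    for off in range(2, min(len(expected), len(actual)) - 1, 2):
        word = actual[off:off + 2]
        if word != expected[off:off + 2] and word == actual[off - 2:off]:
            stale.append(off)
    return stale


written = bytes.fromhex("AABBCCDD")
print(find_stale_words(written, bytes.fromhex("AABBCCDD")))  # []
print(find_stale_words(written, bytes.fromhex("AABBAABB")))  # [2]
```

Running it over many repeated reads would give a rough count of how often the stale word shows up.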

I've tried various other tests. I've confirmed I have the same behaviour at different memory locations as well as when reading a larger number of bytes (32). Reading from a location in the ROM this doesn't seem to happen (which makes sense otherwise the computer probably wouldn't be running code from the ROM).

Running the data bus test or RAM test directly from the diagnostics mode typically returns FFFF, indicating an error on all 16 bits. This would make sense given what I discovered above. Running the RAM test over a smaller amount of memory sometimes passes, as the glitch is less likely to occur during a shorter run.
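To illustrate why a single stale word could plausibly show up as FFFF, here is a rough sketch. It assumes the test reports the OR of every bit difference it sees and that it uses complementary patterns; both are assumptions for illustration, not the documented behaviour of the ROM test, and `failing_bits` is a hypothetical name.

```python
def failing_bits(expected_words, actual_words):
    """OR together the XOR of each expected/actual 16-bit pair."""
    mask = 0
    for exp, act in zip(expected_words, actual_words):
        mask |= (exp ^ act) & 0xFFFF
    return mask


# Hypothetical test pattern: alternating all-zeros and all-ones words.
expected = [0x0000, 0xFFFF, 0x0000, 0xFFFF]
# One stale read: the second word repeats the first.
actual = [0x0000, 0x0000, 0x0000, 0xFFFF]
print(f"{failing_bits(expected, actual):04X}")  # FFFF
```

Under those assumptions, one skipped cycle is enough to flag every data line even though no individual line is faulty.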

Given the issue occurs on both upper (8-15) and lower (0-7) bytes it would appear the problem is something common to both.

Would anyone have any suggestions as to what type of failure might cause this kind of problem?
 

Arbee

Well-known member
The duplication that's happening looks like maybe the data bus buffers are marginal and not reliably latching new data. So I'd be suspicious of those - on the Portable they're the two 74AC245s between the modem slot and the main RAM array (and the ROMs are right below them, assuming the ports side of the board is up).

If you have one of those Saleae or similar USB logic analyzers you could monitor the input and output sides of one of the buffers and try to cause the problem to verify that's what's happening.
 

dc99

Member
Thanks for your suggestions. I initially suspected the 245s myself and purchased a cheap USB logic analyzer for that exact reason. Unfortunately, the analyzer isn't giving me valid readings even from a known source, so either the device is faulty or it's a software issue, and I'll need to sort that out first.

However, from what I know so far, given the issue happens randomly on both the upper and lower bytes, I feel it's unlikely that both 245s have failed in the same way, and the cause is more likely something common to both.

Since my previous post I've done some further write tests which may highlight what is going on. I started by writing something like FF FF FF FF FF FF FF FF directly to memory, followed by 01 02 03 04 04 05 06 07 08 over the same location, and then a read of that data. Repeating the sequence, I occasionally found data from the previous write, which suggests a write is being skipped, e.g. 01 02 FF 03 04 05 06 07 08. I say skipped because otherwise you would have expected the FF to have been written as 01, which would have been the last value in that buffer, but it appears no write occurred, so it remained FF.

From this evidence I wonder whether, in the previous read example, the read is also just being skipped and I'm just seeing what was last in the buffer. Thinking about what might cause this, I suspect it could be something around the additional RAM logic ICs specific to the M5126. I'm not familiar with exactly what all the extra logic does, other than the 74AC138, which could potentially be failing to enable the RAM, causing the read/write to be skipped?
 

Arbee

Well-known member
It's not uncommon for two chips of the same make, type, and manufacture date to fail at the same time - we see it pretty often in 30+ year old arcade PCBs these days. That said, sorting out the logic analyzer would help a great deal to determine what's happening.
 

dc99

Member
Managed to get some measurements when it fails to read correctly, and it appears that in that case no RAM chips are enabled (i.e. CE is high on all 4 pairs), so the contents of the buffer don't change.

Working backwards from this takes me back to U4G. Looking at pin 1 during a failed read the OE is HIGH (LOW when reading correctly). Frustratingly I've tried multiple times to buzz this trace out on this board, but I can only find it going to a 100k pull down resistor RP4 - pin 12.

As this is part of the replacement logic for the newer ram it's not shown on the schematics available online for the m5120. If someone has a m5126 board handy I would appreciate if they could check to see if they have better luck seeing where pin 1 from U4G (74ac374) goes.
 

SuperSVGA

Well-known member
As far as I know that /OE pin only goes to the pull-down resistor and a test point, but I can check again. U4G is mostly just for generating /DTACK, RAM card /CS, and refresh signals for the RAM.
 

dc99

Member
Thanks, that sounds right as I also found the test point. I'll check my results again in case it was a false reading.
 

dc99

Member
On further investigation the logic analyzer was causing the false reading on this pin, you are correct that it's just pulled down to ground via RP4.

I've gone back and confirmed my other readings, and it appears the problem is an invalid RAM CE signal coming out of U3J (74ac138). I am reading from the same location; however, when the read fails, Output 4 goes LOW instead of Output 0, so no RAM is enabled. I believe only Outputs 0, 1, 2 and 3 are actually used for the 4 pairs of RAM.

Checking the input pins on U3J (74ac138), on a failed read they are all correct except pin 3 (A2), which is HIGH instead of LOW. I know pin 1 (A0) goes to Address Line 18 on the GLU and pin 2 (A1) goes to Address Line 19 on the GLU. However, I can't work out the purpose of pin 3 (A2); it looks to be driven by an output from U4G (74ac374), pin 16.
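For reference, the observed behaviour matches the standard truth table of a 74x138 3-to-8 decoder, where exactly one active-low output is selected by the binary value on A2:A1:A0. A minimal sketch follows; `decode_138` is a hypothetical helper, and the single `enabled` flag stands in for the chip's three enable pins, whose wiring on this board is not covered here.

```python
def decode_138(a0: int, a1: int, a2: int, enabled: bool = True) -> list[int]:
    """Return outputs Y0..Y7 of a 74x138 (active low: 0 means selected)."""
    selected = (a2 << 2) | (a1 << 1) | a0
    return [0 if enabled and n == selected else 1 for n in range(8)]


# Good read: A0 = A1 = 0 and A2 LOW selects Output 0.
print(decode_138(0, 0, 0).index(0))  # 0
# Failed read: same address bits but A2 HIGH selects Output 4.
print(decode_138(0, 0, 1).index(0))  # 4
```

With A2 HIGH the selected output always lands on 4 to 7, so none of the four outputs believed to drive the RAM pairs can go LOW.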
 

SuperSVGA

Well-known member
It's part of the timing for things like the required refresh windows and a few other cases. It effectively disables the 4 select lines.
 

dc99

Member
Thank you, that explains why there is a condition where no RAM gets enabled.

In the case of this computer I can see that when the read fails, it is because a RAM refresh is happening at the same time. Are you able to explain how this is supposed to work so the two actions never overlap? I'm not sure where to investigate next.
 

SuperSVGA

Well-known member
When the 68000 is reading or writing from the bus, it asserts the address lines and then asserts /AS to signal that it is either waiting for data to be put on the bus or it has put data on the bus.
The 68000 then waits for /DTACK to be asserted by logic somewhere else to signal that the other device on the bus has put data on the bus or is finished receiving the data from the bus.
Typically /DTACK would be delayed for the device's cycle time since the device may be slower and need time to read the address and move the data.
So what I believe should be happening is that during refresh /DTACK will be delayed until refresh is finished, the RAM is selected, and the set cycle time delay is factored in.
Additionally if somehow it takes too long, the /DTACK signal from the CPU GLU will override the new RAM circuitry's /DTACK.
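The handshake described above can be sketched as a toy timing model. Everything in it is an assumption for illustration: the tick values are made up, `bus_cycle` is a hypothetical name, and the real circuit's timing is not modelled. It only shows why /DTACK has to wait for refresh to finish.

```python
def bus_cycle(refresh_end: int, cycle_time: int, dtack_waits_for_refresh: bool) -> str:
    """Model one RAM access that starts at tick 0.

    refresh_end: tick at which an in-progress refresh finishes (0 = no refresh).
    cycle_time:  ticks the RAM needs once it has been selected (made-up value).
    """
    ram_selected_at = refresh_end  # select lines are held off during refresh
    data_valid_at = ram_selected_at + cycle_time
    # A /DTACK that ignores refresh fires after the plain cycle time.
    dtack_at = data_valid_at if dtack_waits_for_refresh else cycle_time
    return "ok" if dtack_at >= data_valid_at else "stale"


print(bus_cycle(0, 2, False))  # ok: no refresh in progress
print(bus_cycle(3, 2, True))   # ok: /DTACK delayed past the refresh
print(bus_cycle(3, 2, False))  # stale: CPU latches the bus before RAM is selected
```

In this model, an access that collides with refresh only goes wrong when /DTACK arrives from a source that does not account for the refresh window.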
 

dc99

Member
Just to close this, the problem ended up being a broken trace under U4G in the RAM logic. Not sure what caused it to break but it resulted in the system using the DTACK generated by the GLU rather than the replacement M5126 RAM logic thus causing the random conflict with the RAM refresh. Thanks to SuperSVGA for your assistance.

As part of working this out I ended up drawing out the ram logic which I'll post separately.
 