Word access odd Address Exceptions on 68K software - were they ever used?

Snial · Mar 20, 2024

A 68000 CPU will generate an Address Error exception if there's an attempt to read a word (or long) from an odd address. Was that feature ever used, apart from trapping errors? I'm asking because I believe odd addresses were used in some Amiga software to implement dynamic libraries. If you load a module of code but all its JMPs / JSRs go to odd addresses, each one raises an exception, so a supervisor routine can pick up the address (which could be encoded as Module:Entry), convert it to an actual address and load the module if needed.

On 68K Macs, most of this is accomplished using the Segment Loader, but it's possible that programmers did have uses for odd address exceptions.

rplacd · Mar 25, 2024

+1 to this question – seeing people do things "not the Mac way" is interesting.
Perhaps a fancy bells-and-whistles debugger that didn't touch the Toolbox?

Phipli · Mar 25, 2024

I assume this would cause issues later? No exception raised on the 68020 and it just jumps randomly into space?

Also, is what you described still possible with how Apple used the upper bits as flags in 24bit mode?

Arbee · Mar 25, 2024

An exception *is* raised and it jumps through a vector like usual. I'm not sure off the top of my head if that vector is ever valid under MacOS though.

ymk · Mar 26, 2024

Arbee said:
An exception *is* raised and it jumps through a vector like usual.

Which exception? 020+ doesn't require word alignment as far as I know.

Phipli · Mar 26, 2024

I've only just realised you meant odd as in not even - I thought you meant odd as in weird, and meant out of range (i.e. a number above the 24bit address range).

Sorry, just being dumb over here.

Snial · Mar 26, 2024

Hi folks,

I think I'll add a bunch of replies here:

rplacd said:
+1 to this question – seeing people do things "not the Mac way" is interesting.
Perhaps a fancy bells-and-whistles debugger that didn't touch the Toolbox?

Apple didn't seem to use the TRAP instruction exceptions either, but they are useful. Examples:

1. I hooked a TRAP exception to force a jump into the Mac Plus ROM debugger when I was debugging MiniMorse using MacAsm:

MacAsm MiniMorse Masochism (68kmla.org)

2. The Sinclair QL used TRAP exceptions as its OS interface, so to some degree it'd be possible to emulate a QL at better than full speed on a compact Mac (there are some other problems, but at least that isn't one of them).

Phipli said:
I assume this would cause issues later? No exception raised on the 68020 and it just jumps randomly into space?

Also, is what you described still possible with how Apple used the upper bits as flags in 24bit mode?

Phipli said:
I've only just realised you meant odd as in not even - I thought you meant odd as in weird, and meant out of range (i.e. a number above the 24bit address range).

Sorry, just being dumb over here.

The MC68020 still raises an Address Error when it tries to fetch an instruction from an odd address. As for 24-bit addressing mode: on a 68000 the flag bits are simply dropped, because it only has 24 address lines, but on a 68020 or later Mac we know that the OS, and possibly the hardware, had to work around the issue. For example, I think NuBus was mapped into 24-bit space for dirty ROMs (something like each card being given 1MB from $A00000 in 24-bit mode); the specialised PMMU on the Mac II perhaps had to mask off the upper byte, and the LC's chipset perhaps had to do something similar. On the 68030 I think it might be a bit easier, since it has 3 (?) levels of page tables, so supporting 24-bit mode could mean mapping all the $xxyyyyyy page directory entries to $00yyyyyy. Then, with VM off, $yyyyyy simply gets mapped to its physical address equivalent, except that the ROM could be moved and gaps in the physical address space could be closed up.

Arbee said:
An exception *is* raised and it jumps through a vector like usual. I'm not sure off the top of my head if that vector is ever valid under MacOS though.

Yes, that's what I was wondering too.

ymk said:
Which exception? 020+ doesn't require word alignment as far as I know.

It still does for instructions.

The reason for asking is that Gary Davidian did actually reply to an email I posted about the M88K emulator, saying that it had to properly emulate 24-bit addressing mode and odd address exceptions. I think he could have emulated 24-bit addressing mode without any performance cost, by the same mechanism as I've described above, but odd address exceptions for code would need an extra couple of instructions whenever there's a Bcc, DBcc, JMP, BSR, JSR, RTS or RTE. The check isn't needed in the main loop, because every 68K instruction is a whole number of 16-bit words: if the previous instruction started at an even address, the next one will too. Given that jumps happen about 25% of the time, that's equivalent to adding about 0.5 instructions to the main loop (2 × 0.25), a performance reduction of about 4%.

I don't know if PowerPC had to handle that, maybe it did!

I'll send updates if there's progress on the M88K front.

Arbee · Mar 26, 2024

As far as I know the odd address exceptions happen even on loads/stores, not just execution. So the check (basically if (address & 1) for a 16-bit transaction, and if (address & 3) for a 32-bit transaction) would have to happen on every memory access. I wouldn't be surprised if the emulator omitted generating those exceptions since software doing that wouldn't run on a real 680x0, Amiga-like tricks aside.

The LC's memory map very much requires working 24-bit mode so I can see the M88K needing 24-bit support. The PPC machines are cleaner and may have been able to get away without that.

Phipli · Mar 26, 2024

Arbee said:
The LC's memory map very much requires working 24-bit mode so I can see the M88K needing 24-bit support. The PPC machines are cleaner and may have been able to get away without that.

Just a comment that might be of interest - my Centris 650 doesn't have a battery in it so always cold starts in 24bit mode. If you use 7.6 or later, this results in a double start where the OS detects 24bit mode and restarts in 32bit mode.

What is interesting though is that whatever OS I'm running, if I start with my PPC upgrade enabled, it PPC chimes, then does the memory test, then restarts in 32bit.

I don't know how early the PPC card kicks in, but it looks like it is starting from the PPC card's ROM in 24bit. Just thought it was sort of interesting.

Arbee · Mar 26, 2024

Phipli said:
Just a comment that might be of interest - my Centris 650 doesn't have a battery in it so always cold starts in 24bit mode. If you use 7.6 or later, this results in a double start where the OS detects 24bit mode and restarts in 32bit mode.

Right, I see that in MAME also on a first-time boot of a machine where there isn't saved PRAM and it's booting the Legacy Recovery ISO.

Phipli said:
What is interesting though is that whatever OS I'm running, if I start with my PPC upgrade enabled, it PPC chimes, then does the memory test, then restarts in 32bit.

I don't know how early the PPC card kicks in, but it looks like it is starting from the PPC card's ROM in 24bit. Just thought it was sort of interesting.

My assumption is that since the ROM code and the System 7 boot code are both 32-bit clean that it could run that "24 bit stub" (it just checks and sets the PRAM flag for 32-bit mode) in 32-bit mode and nothing would care.

Phipli · Mar 26, 2024

Arbee said:
My assumption is that since the ROM code and the System 7 boot code are both 32-bit clean that it could run that "24 bit stub" (it just checks and sets the PRAM flag for 32-bit mode) in 32-bit mode and nothing would care.

I was wondering, though, which CPU it is running on: 040 or PPC?

Snial · Mar 26, 2024

Arbee said:
As far as I know the odd address exceptions happen even on loads/stores, not just execution. So the check (basically if (address & 1) for a 16-bit transaction, and if (address & 3) for a 32-bit transaction) would have to happen on every memory access. I wouldn't be surprised if the emulator omitted generating those exceptions since software doing that wouldn't run on a real 680x0, Amiga-like tricks aside.

The odd address exception for data fetches won't happen for the 68020 or later. "The 68020 has no alignment restrictions on data access"

Motorola 68020 - Wikipedia (en.wikipedia.org)

I've just been reading a bit about exceptions in the M88K user manual (I'll come to PowerPC next) and found that the M88K does generate exceptions for 16-bit or 32-bit accesses at odd addresses, and for 32-bit accesses with address bit 1 set, but you can suppress the exception in the PSR. I think this means that the Davidian emulator started out having to trap odd addresses, because it was initially emulating a Mac SE, so the check didn't need to be part of the main loop; but when emulating an LC, it would mask off the misaligned-data trap and have to emulate unaligned 16-bit data accesses (but not unaligned jumps).

So there are, IMHO, two basic ways to load a 16-bit value: either dest=((*(uint8_t*)src)<<8)|(((uint8_t*)src)[1]); or if(((uintptr_t)src)&1) HandleUnaligned16Bit(); else dest=*(uint16_t*)src; . I think it would choose the latter, because a test and a skipped jump is cheaper than two byte loads, a shift and an OR, but also because unaligned 16-bit access on a Mac will be very uncommon: it isn't supported on the original 68000, so even 68020 or 68030 software would normally align 16-bit data on 16-bit boundaries.

But 32-bit access in 680x0 emulation is different, because the 68000 could do 32-bit accesses at any even address: it was normal to align 32-bit data on 16-bit boundaries, but not to try to align it on 32-bit boundaries. So the emulator has a basic choice of:

dest=((uint32_t)((uint16_t*)src)[0]<<16)|((uint16_t*)src)[1];

Or:

if(((uintptr_t)src)&2) HandleUnaligned32Bit(); else dest=*(uint32_t*)src;

Then it all depends on which is more costly: a taken branch 50% of the time, or two loads/stores every time (the M88K has a Branch Target Cache).

Arbee said:
The LC's memory map very much requires working 24-bit mode so I can see the M88K needing 24-bit support.

Correct, the 68K Davidian emulator on the M88K needed to support that, though I think it would handle it using the MMUs. For all I know you know everything about the M88K, in which case I'm just reflecting back my understanding. The 88200 has a Block Address Translation Cache, which caches 10 × 512kB block addresses, and a 56-entry ATC which caches 4kB pages. The actual translation (one set each for User and Supervisor) uses a segment table of 1024 × 4MB segments, each pointing to a page table of 1024 × 4kB pages. So 24-bit addressing support can be done by mapping SegmentTable[x] to the same page table as SegmentTable[x&3].

From BitSavers.

Arbee said:
The PPC machines are cleaner and may have been able to get away without that.

Correct, 24-bit mode isn't supported, though, as per your later comment:

Arbee said:
My assumption is that since the ROM code and the System 7 boot code are both 32-bit clean that it could run that "24 bit stub" (it just checks and sets the PRAM flag for 32-bit mode) in 32-bit mode and nothing would care.

...it may have to modify the PRAM if it finds it's in 24-bit mode on boot.

Arbee · Mar 26, 2024

The M88K MMU is (unsurprisingly) pretty similar to the various 68K versions. Most 680x0 machines do 24/32-bit by switching MMU tables so that would translate right over to the M88K.

Right, the "stub" I was talking about is what checks/sets the PRAM. I should've made that clear.

Snial · Mar 28, 2024

Arbee said:
The M88K MMU is (unsurprisingly) pretty similar to the various 68K versions. Most 680x0 machines do 24/32-bit by switching MMU tables so that would translate right over to the M88K.

Aaah, well, this comment has led me to look into the 68K MMU in more detail, specifically the 68030 MMU. It's fiendishly more complex than the M88K's MMU! As we've seen, the M88200 has a fixed two-level table, but the 68030 has up to 4 levels plus function-code (FC) table entries! So, from page 9-54 of the 68030 User Manual:

There are a number of key differences. Firstly, PS (8..15) defines the page size as 1<<PS bytes (i.e. 256B to 32kB), whereas the M88200 only allows 4kB pages. Fortunately, System 7 uses 4kB pages. Secondly, IS (the initial shift) masks off the top bits of logical addresses, so 24-bit support can be done that way. Thirdly, the TIA, TIB, TIC and TID fields define the number of bits per level (I think it must be that, because otherwise it could only translate 15+8 = 23 bits with 256-byte pages, or 15+12 = 27 bits = 128MB with System 7's 4kB pages). Also, I've been reading this Technote:

Virtual Memory Application Compatibility

It dates from the late System 7 / System 8 PowerPC era, so it might not describe System 7.1 virtual memory, though quite a bit of it seems to be saying that the 68K VM is still being used. There is also the Virtual Memory Manager chapter of Inside Macintosh: Memory, which says the VM limit in System 7 is 1GB.

I guess if I were Apple in the 90s, I'd set up a two-level page table where the top level has a minimal number of entries and the bottom level is linear, as that'd be the most hacky but convenient mapping. So TIA would contain up to 8 entries, each covering up to 32K × 4kB pages, stored linearly. Then the VM code itself only has to mess with the 4kB page table. This page implies the System 7 VM is only 2-level, because otherwise it'd take quite a bit of work to translate block ranges.

Memory-Block Record

The GetPhysical function uses a memory-block record to hold information about a block of memory, either logical or physical. The memory-block record is a data structure of type MemoryBlock.

TYPE MemoryBlock =
  RECORD
    address: Ptr;     {start of block}
    count:   LongInt; {size of block}
  END;

Field Description

address: A pointer to the beginning of a block of memory.
count: The number of bytes in the block of memory.

Translation Table

The GetPhysical function uses a translation table to hold information about a logical address range and its corresponding physical addresses. A translation table is defined by the data type LogicalToPhysicalTable.

TYPE LogicalToPhysicalTable =
  RECORD
    logical:  MemoryBlock;  {a logical block}
    physical: ARRAY[0..defaultPhysicalEntryCount-1] OF
                MemoryBlock; {equivalent physical blocks}
  END;

Field Description

logical: A logical block of memory whose corresponding physical blocks are to be determined.
physical: A physical translation table that identifies the blocks of physical memory corresponding to the logical block identified in the logical field.

Still, I think there are substantial differences between the M88200 MMU and the 680x0 MMUs. PowerPC is different again: it uses 256MB segments and hashed (inverted) page tables, and on some implementations (the 603/603e) the page table lookup is done in software. So, in the scheme of things, the difference between the M88200 and 680x0 MMUs is minor compared with PowerPC.

Arbee said:
Right, the "stub" I was talking about is what checks/sets the PRAM. I should've made that clear.

Fair enough!

Arbee · Mar 28, 2024

Moto simplified both the MMU and FPU as the 68K advanced. The 68851 and 881/882 were super complicated, the '030 contained most but not all of those standalones, and the '040 was seriously cut down from the '030. The 88K is most similar to the '040.

Snial · Mar 28, 2024

Arbee said:
Moto simplified both the MMU and FPU as the 68K advanced. The 68851 and 881/882 were super complicated, the '030 contained most but not all of those standalones, and the '040 was seriously cut down from the '030. The 88K is most similar to the '040.

Aaah, I assumed the '040 was the same as the '030. I'll look up the '040 on bitsavers. Started to do that - so maybe given that System 7 only came out in 1991, they already made it '040 compatible? Do you know what the Apple VM page table structure was then? Do you have a link to that info? It's astounding how different VM mechanisms are, e.g. the MIPS R3000 basically did everything in software (you just get a page fault exception), but the address space is divided into 4. x86 I thought basically kept it upward compatible until x64. ARM's original MEMC was very bizarre, but the later ARM6 and ARM7 etc had a 3 level structure I think (with variable block sizes).

Gosh, the '040 has a 2-bit Physical address extension! 16GB! Oh, it's still quite a bit different, because the M88K uses a 10-bit segment table + 10-bit page tables, whereas the '040 uses a 7-bit TIA + 7-bit TIB + 5- or 6-bit page tables. But I can see that on an '030 you could define TIA and TIB as 7 bits each and then TIC as 5 or 6 bits, with 8K or 4K page sizes respectively.

Anyway, thanks for continuing to reply and help me reduce my Motorola MMU ignorance!

Phipli · Mar 28, 2024

Snial said:
Gosh, the '040 has a 2-bit Physical address extension! 16GB!

I.... think they didn't wire it in. If it is the thing I heard about before.

DBJ314 · Mar 28, 2024

PowerPC instructions always have 4-byte alignment. When told to jump or return from an exception, the CPU always ignores the low 2 bits of the instruction address and pretends that they are 0. No exception will happen, the CPU will just fail to notice the address was unaligned.

PowerPC CPUs mostly don't care about the alignment of the straightforward integer loads and stores, but they can throw an Alignment Exception when a floating point load/store or one of the weirder load/store instructions is unaligned.

(This is from Page 244 of the PowerPC Programming Environments Manual)

Triggering an Alignment Exception will only result in a slowdown, rather than crashing the code that caused it. The NanoKernel quietly handles Alignment Exceptions by reading the faulting instruction and emulating it. Any misaligned memory accesses are split up into smaller aligned accesses.

Snial · Mar 28, 2024

Phipli said:
I.... think they didn't wire it in. If it is the thing I heard about before.

There are two signals, UPA<1:0>, which can be programmed by the MMU.

Yes, so it all adds up. Each Page Descriptor has a U<1:0> pair of bits which can be sent out on the UPA<1:0> signals. The table descriptors themselves don't have these bits, so if I understand it correctly (which I probably don't, going on past form), only the final translated addresses can be in an effective 16GB range; the tables themselves must be in the first 4GB!
