have the Toolbox on board
Haha I'm way ahead of you on caching the ROM.
The STM32H7 has a "scattered memory architecture," with a bit over 1 Mbyte of SRAM but split into different banks and sizes to optimize throughput. And then there's the 32 Mbyte external SDRAM.
Firstly, the emulator software has to be stored in the 64 kbytes of "instruction-tightly-coupled-memory" (ITCM).
Any data structures, like trees, linked-lists, tables of pointers, etc. need to be stored in the first 64 kbyte bank of "data-tightly-coupled-memory" (DTCM). There is a second 64 kbyte bank of DTCM too.
The -H7 has a main SRAM of 512 kbytes, in which I planned to store the 171 kbyte (512x342x8bit) screen buffer for 8-bit grayscale on the compacts.
Other memories on the -H7 include two 128 kbyte banks and another 32 kbyte bank of SRAM in the "connectivity domain," another 64 kbytes in the "batch acquisition domain," and 4K of battery backup SRAM, like the Mac's clock chip has.
Then the external memory will hold up to 9 Mbytes of RAM, up to 256 kbytes of ROM, and the full decoded cache for each, so 300% of 9.25 Mbytes, which is 27.75 Mbytes. The ROM will be loaded from the machine and fully decoded before boot. RAM is never written back to the Macintosh except VRAM, the cache of which would be in a write-through configuration.
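To sanity-check that arithmetic, here's a rough sketch in C. All the names are made up for illustration, and it assumes "300%" means the raw image plus its decoded cache together take three times the raw size.

```c
#include <assert.h>
#include <stdint.h>

#define KIB 1024u
#define MIB (1024u * KIB)

/* Figures from the post; the macro names are hypothetical. */
#define SDRAM_BYTES      (32u * MIB)   /* external SDRAM */
#define MAX_RAM_BYTES    (9u * MIB)    /* largest emulated RAM */
#define MAX_ROM_BYTES    (256u * KIB)  /* largest ROM */
#define FOOTPRINT_FACTOR 3u            /* assumed: raw + decoded cache = 300% of raw */

/* Bytes of SDRAM used by RAM, ROM, and their decoded caches. */
static uint32_t sdram_footprint(uint32_t ram_bytes, uint32_t rom_bytes)
{
    return FOOTPRINT_FACTOR * (ram_bytes + rom_bytes);
}

/* Bytes of SDRAM left over for anything else. */
static uint32_t sdram_headroom(uint32_t ram_bytes, uint32_t rom_bytes)
{
    return SDRAM_BYTES - sdram_footprint(ram_bytes, rom_bytes);
}

/* Fail the build if the worst case stops fitting. */
static_assert(FOOTPRINT_FACTOR * (MAX_RAM_BYTES + MAX_ROM_BYTES) <= SDRAM_BYTES,
              "worst-case RAM + ROM + decoded caches must fit in SDRAM");
```

With the maximum configuration that comes to 29,097,984 bytes (27.75 Mbytes), leaving 4.25 Mbytes of the SDRAM free.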
There is a lot of internal memory left, some of which will have to be devoted to a USB buffer. What's left, however, can be used to move select areas of the RAM, ROM, and their instruction word decodings into on-chip SRAM. This is much faster than external SDRAM.
So for a given ROM, some "hints" should be supplied that tell what to store in internal SRAM and what to store in external SDRAM.
Memory accesses are going to go through a doubly indirect tree-type structure. What I mean is that there are 256 chunks of 64 kbytes in the 24-bit address space of the 68000, and within each 64 kbyte chunk, another 256 chunks of 256 bytes. So you can sorta make a tree (actually it's not technically a tree) out of this, where first you use A[23:16] as an index into a table of pointers, and then use A[15:8] to index again into the table that pointer leads to. That second lookup gives you a routine to access memory at that location.
If I didn't explain it well enough, the gist is that each 256-byte region of memory can be put in a different place or accessed with a different method. So that's how addresses accessed by the virtual 68000 will be resolved into an actual 68000 bus access, an SRAM access, SDRAM access, etc.
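Roughly, the two-level lookup could look like the sketch below. This is only an illustration: every type and function name is hypothetical, it only handles byte reads, and the 0xFF returned for unmapped pages is an arbitrary placeholder. The only part taken from the description above is the A[23:16] / A[15:8] split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Access routine for one 256-byte page (hypothetical signature). */
typedef uint8_t (*read8_fn)(uint32_t addr);

/* Second level: one entry per 256-byte page, indexed by A[15:8]. */
typedef struct {
    read8_fn page[256];
} chunk_table;

/* First level: one entry per 64 kbyte chunk, indexed by A[23:16]. */
static chunk_table *top_level[256];

/* Placeholder for pages nobody has mapped. */
static uint8_t read_unmapped(uint32_t addr) { (void)addr; return 0xFF; }

/* Example backing store standing in for one page held in on-chip SRAM. */
static uint8_t sram_backing[256];
static uint8_t read_sram_page(uint32_t addr) { return sram_backing[addr & 0xFF]; }

/* Route the 256-byte page containing addr to the given routine. */
static void map_page(chunk_table *t, uint32_t addr, read8_fn fn)
{
    assert(addr <= 0xFFFFFFu);               /* 68000 has a 24-bit bus */
    top_level[(addr >> 16) & 0xFF] = t;
    t->page[(addr >> 8) & 0xFF] = fn;
}

/* Resolve a virtual 68000 address through both levels and read a byte. */
static uint8_t mem_read8(uint32_t addr)
{
    addr &= 0xFFFFFFu;
    chunk_table *t = top_level[(addr >> 16) & 0xFF];
    if (t == NULL)
        return read_unmapped(addr);
    read8_fn fn = t->page[(addr >> 8) & 0xFF];
    return fn ? fn(addr) : read_unmapped(addr);
}
```

Other routines with the same signature could stand in for an SDRAM access or a real 68000 bus cycle, which is the point of resolving per 256-byte page.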
The hints supplied to the emulator would basically break the ROM down into 256-byte chunks and tell which should be in SRAM and which should be in SDRAM.
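One way the hints might be represented is a bitmap with one bit per 256-byte ROM page, as in the sketch below. The format is purely an assumption for illustration (nothing is decided yet) and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HINT_PAGE_BYTES 256u
#define ROM_MAX_BYTES   (256u * 1024u)
#define ROM_PAGES       (ROM_MAX_BYTES / HINT_PAGE_BYTES)   /* 1024 pages */

/* One bit per ROM page: 1 = keep in on-chip SRAM, 0 = leave in SDRAM. */
typedef struct {
    uint8_t in_sram[ROM_PAGES / 8];
} rom_hints;

/* Mark every page overlapping ROM offsets [start, end) as SRAM-resident. */
static void hint_range(rom_hints *h, uint32_t start, uint32_t end)
{
    assert(start <= end);
    for (uint32_t p = start / HINT_PAGE_BYTES;
         p < ROM_PAGES && p * HINT_PAGE_BYTES < end; p++)
        h->in_sram[p / 8] |= (uint8_t)(1u << (p % 8));
}

/* Should the page holding this ROM offset live in SRAM? */
static bool page_in_sram(const rom_hints *h, uint32_t rom_offset)
{
    uint32_t p = rom_offset / HINT_PAGE_BYTES;
    return (p < ROM_PAGES) && (((h->in_sram[p / 8] >> (p % 8)) & 1u) != 0u);
}

/* Total SRAM the hinted pages would take, to check against what's left. */
static uint32_t sram_bytes_hinted(const rom_hints *h)
{
    uint32_t pages = 0;
    for (uint32_t p = 0; p < ROM_PAGES; p++)
        pages += (h->in_sram[p / 8] >> (p % 8)) & 1u;
    return pages * HINT_PAGE_BYTES;
}
```

A bitmap for a 256 kbyte ROM is only 128 bytes, so the hint data itself costs almost nothing.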
Tuning the emulator will be a problem in and of itself though lol, lots of tweaking can be done to improve performance, especially for a specific application.