
gb6: game boy emulator for System 6

Tekk

Active member
This is a very unfinished project I started a long time ago but recently picked back up - a Game Boy emulator targeting System 6 with the (impractical?) goal of running at a playable speed on an unmodified Mac Plus. Today I managed to get to a milestone: it displays something on the screen.

[Attachment: Screen Shot 2022-07-20 at 7.14.13 PM.png]

I also created an OpenGL/ImGui-based UI to use on my modern laptop for faster iteration and fewer crashes:

[Attachment: Screen Shot 2022-07-20 at 1.04.06 PM.png]

It is SLOW. I haven't tried it on my real Mac Plus yet, but on Mini vMac set to 1x speed, it takes around 90 seconds to get to the Tetris title screen, with rendering turned off and not even calling WaitNextEvent - i.e. basically just:

C:
while (1) {
    step emulation;
    if (at title screen) {
        render output once;
        break;
    }
}

With Mini vMac set to "all out" speed and rendering/event handling turned on, it gets there in a few seconds. I knew it would be slow, but I didn't expect it to be this slow. There is still a lot of optimization I can do - for example, rendering directly into a 1bpp bitmap instead of using a one-byte-per-pixel array and copying everything down to 1bpp. My CPU code also has a lot of layers of function calls and I imagine the call/return overhead is actually significant on a machine this slow. I need to figure out how to do real profiling to see where the machine is spending all of its time. I think if I move the rendering into a vblank interrupt handler it might help.
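As a rough illustration of the "render directly into a 1bpp bitmap" idea, here is a minimal sketch. It assumes a 160x144 frame, the classic Mac bitmap convention (bit 7 of each byte is the leftmost pixel, 1 means black), and a made-up threshold where Game Boy shades 2 and 3 become black. The names `put_pixel_1bpp` and `pack8` are hypothetical and are not taken from the gb6 source.

```c
#include <stdint.h>

enum { GB_WIDTH = 160, GB_HEIGHT = 144, ROW_BYTES = GB_WIDTH / 8 };

/* Set or clear one pixel directly in a 1bpp framebuffer.
   Assumed layout: bit 7 of each byte is the leftmost pixel, 1 = black. */
static void put_pixel_1bpp(uint8_t *fb, int x, int y, int black)
{
    uint8_t *p = &fb[y * ROW_BYTES + (x >> 3)];
    uint8_t mask = (uint8_t)(0x80u >> (x & 7));

    if (black)
        *p |= mask;
    else
        *p &= (uint8_t)~mask;
}

/* Pack eight 2-bit shades (0..3) into one output byte.
   Assumption for this sketch: shades 2 and 3 are drawn as black. */
static uint8_t pack8(const uint8_t shades[8])
{
    uint8_t out = 0;

    for (int i = 0; i < 8; i++)
        out = (uint8_t)((out << 1) | (shades[i] >= 2));
    return out;
}
```

Writing whole bytes with something like `pack8` touches memory once per eight pixels, which is likely cheaper than setting pixels one at a time and avoids the separate copy-down pass.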

The code is over on GitHub if anyone is interested. I have been working on this pretty consistently recently and want to get it to a point where a few games are playable. I don't care too much about emulation accuracy; I just think this would be a neat program to have on a 68k Mac. Even if it turns out to be impractical on a Plus, maybe it would be playable on a 68030?

I think once I implement the memory mapper chips for more complicated games and implement the LCD scroll registers and sprites, it will actually be useful. I will post any significant updates here!
 

Byrd

Well-known member
A Game Boy emulator for System 6 in 2022? Yes please, I wouldn't say no. Can you focus on the 16 MHz '030, as found in the SE/30 and Classic II, for playable speeds? Most people who own a compact Mac have an SE/30 lying around, and having the army of GB games to play would be amazing.
 

sfiera

Well-known member
This is cool, but do keep in mind what you’re up against! The Game Boy executes instructions at 1 MHz and the Plus at 8 MHz, so full speed would mean executing cpu_step() in well under 8 cycles (to leave room for the PPU and sound). It would be nice to see it running playable games, whatever the target machine.
 

Tekk

Active member
Yeah, not sure if that would be possible unless I do some crazy JIT thing where I recompile the game to 68k code and only use software emulation where instructions don’t match up. Just updating the emulated flags uses well over 8 cycles.
 

rjkucia

Well-known member
This looks great! Looking forward to seeing it progress & trying it out. I agree that targeting an '030 would have a good chance of being playable.
 

retrac

New member
The timing isn't quite that tight, but it is tight. I believe a full-speed emulator would be possible on the 68000 Macs, but the graphics couldn't be updated in real time and it would have to be written at least partly in assembly.

Technically the GB CPU runs at 4 MHz, but every instruction takes at least 4 cycles and is timed in multiples of four, so some think of it as a 1 MHz processor. With a straight sequence of NOPs the GB CPU runs at 1 million instructions per second. But there are 2- and 3-byte instructions, and complex slow instructions. The 68000 is much more efficient per cycle.
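To put a number on that budget, here is a small sketch of the arithmetic. It assumes the usual clock figures of 4.194304 MHz for the Game Boy and 7.8336 MHz for the Mac Plus, and it ignores any cycles the Plus loses to video access. The function name is made up for the example.

```c
/* Nominal clock rates in Hz (assumed figures for this sketch). */
#define GB_CLOCK_HZ        4194304.0
#define MAC_PLUS_CLOCK_HZ  7833600.0

/* Host clock cycles available per Game Boy machine cycle.
   One GB machine cycle is 4 GB clock cycles, the granularity at which
   GB instructions are timed. */
static double host_cycles_per_gb_mcycle(double host_hz, double gb_hz)
{
    return host_hz / (gb_hz / 4.0);
}
```

With these figures, `host_cycles_per_gb_mcycle(MAC_PLUS_CLOCK_HZ, GB_CLOCK_HZ)` comes out at roughly 7.5 host cycles per machine cycle, so a 4-cycle NOP gets about 7.5 cycles and a 12-cycle instruction about 22.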

Instruction decode can be very fast with such a simple CPU to emulate. Use the instruction byte plus the CB prefix flag as a 9-bit index into a 64 KB jump table of 512 routines, one for each GB CPU opcode. I think that dispatch can be done in about 20 cycles on the 68000? Load the masked index into an address register, then jump indirect. Many GB CPU instructions will have one- or two-instruction equivalents, leaving most of the table sparse (128 bytes are available per instruction; you could jump out if you need more, but I can't think of any that would).
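A portable C approximation of that dispatch might look like the sketch below. Note that it swaps in a table of 512 function pointers rather than 512 fixed-size 128-byte code slots, since the latter needs assembly. Only a handful of opcodes are filled in, flag updates are left out entirely, and all names (`Cpu`, `sketch_step`, the `op_*` handlers) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t a, c, e;     /* only the registers this sketch touches */
    uint16_t pc;
    const uint8_t *mem;  /* flat program memory, for illustration only */
} Cpu;

typedef void (*OpHandler)(Cpu *);

/* Flag updates are omitted in every handler to keep the sketch short. */
static void op_nop(Cpu *cpu)     { (void)cpu; }
static void op_ld_a_c(Cpu *cpu)  { cpu->a = cpu->c; }
static void op_add_a_e(Cpu *cpu) { cpu->a = (uint8_t)(cpu->a + cpu->e); }
static void op_ld_c_a(Cpu *cpu)  { cpu->c = cpu->a; }
static void op_swap_a(Cpu *cpu)  { cpu->a = (uint8_t)((cpu->a << 4) | (cpu->a >> 4)); }

/* Index = opcode byte, plus 0x100 when the opcode followed a CB prefix. */
static OpHandler dispatch[512];

static void init_dispatch(void)
{
    for (size_t i = 0; i < 512; i++)
        dispatch[i] = op_nop;          /* placeholder for unimplemented opcodes */
    dispatch[0x79] = op_ld_a_c;        /* LD A,C  */
    dispatch[0x83] = op_add_a_e;       /* ADD A,E */
    dispatch[0x4F] = op_ld_c_a;        /* LD C,A  */
    dispatch[0x100 | 0x37] = op_swap_a; /* CB 37: SWAP A */
}

/* Fetch one opcode (two bytes if CB-prefixed) and jump through the table. */
static void sketch_step(Cpu *cpu)
{
    unsigned index = cpu->mem[cpu->pc++];

    if (index == 0xCB)
        index = 0x100u | cpu->mem[cpu->pc++];
    dispatch[index](cpu);
}
```

The pointer table costs an extra indirection and a call/return compared with jumping straight into fixed-size slots, so treat it as a way to prototype the decode scheme rather than as the fast version.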
 

Corgi

Well-known member
The Game Boy emulator I wrote with my friends a few years ago is only playable on a 600MHz G3, but then it's straight C with no asm. That in itself is an accomplishment, since it is also cycle-accurate. There are a number of hacks if you don't care about "pure" accuracy that should make it go much faster, but I've forgotten most of them in the intervening years 😅
 

rjkucia

Well-known member
I’ve tried GB68k on my SE/30, and it’s a very unpleasant experience. Games are really more slideshows than games.
 

Snial

Well-known member
My 2 pence on this. The Game Boy runs at a maximum of about 1.04 8-bit MIPS, and a Mac Plus really manages at most about 6.5 MHz / 4 ≈ 1.6 MIPS because of the number of cycles taken by video. Therefore any viable emulator can spend only roughly 1.5 68K instructions on each Game Boy instruction to keep up in real time.

Therefore the only way to do that is if you statically compile a GameBoy Cartridge ROM into pure 68K code. There are a few things that help here. The GB CPU is essentially very much like an 8080 or Z80, so it has few registers: A, BC, DE, HL, SP, PC. This is small enough to be entirely emulated in 68K registers, so in theory you can sometimes manage direct emulation.

Secondly, because it's largely an accumulator architecture (at least for 8-bit calculations) with limited ALU combinations, you could theoretically convert multiple GB instructions into a single 68K instruction, e.g.

ld a,c
add e
ld c,a ;(really c+=e)

Can become in 68K:
add.b rE,rC ;rC += rE (68K operand order is source,destination)

And this kind of sequence will probably happen quite a bit.

Relative jumps, absolute jumps and calls will translate directly into BRA and BSR instructions; that's certainly a 1:1 ratio.

Multiple shifts or rotates can become single shift/rotates (but I think the GB had hardware assist for multi-bit shifts too).

On the bad side, the GB CPU is little-endian, but the 68K is big-endian. This means that 16-bit operations will often turn into multiple 8-bit operations; or you swap the 68K data.

Also, the GB's CPU updates its flags on fewer instructions than the 68K does. For example, ld a,15 might become move.b #15,rA, but the 68K move sets condition code bits while the GB load leaves the flags untouched. This means that you need to save condition codes explicitly.

The GB's CPU can individually manipulate the upper byte of any register pair, but the 68K finds that difficult, particularly if you want to store a result to the upper byte of a register pair.

But the upshot is that you need a static recompilation emulator. Sorry.
 

Arbee

Well-known member
Yeah, I usually tell people that you want a machine in the range of 20-40 times faster than what you're trying to emulate. I don't think even a static recompiler would get full speed Game Boy on a Plus, but don't let me stop you.

For endianness, a trick we've used in MAME since almost the dawn of time is that memory is always stored in host order. That way 16 bit operations are much cleaner, and 8-bit loads and stores just have to XOR the low bit(s) of the address.
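Here is a minimal sketch of that idea, not MAME's actual code. Emulated little-endian memory is kept as an array of 16-bit words in host byte order, so aligned 16-bit accesses are a plain load or store, and byte accesses XOR the address with 1 on a big-endian host (or 0 on a little-endian one). The names and the runtime endianness probe are assumptions for the example; on a 68K build the XOR value would simply be the constant 1.

```c
#include <stdint.h>

#define MEM_WORDS 0x8000u  /* 64 KB of emulated memory, as 16-bit words */

static uint16_t mem16[MEM_WORDS];

/* XOR mask for byte addresses: 1 on a big-endian host, 0 on little-endian. */
static unsigned byte_xor(void)
{
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe ? 0u : 1u;
}

static uint8_t read8(uint16_t addr)
{
    return ((const uint8_t *)mem16)[addr ^ byte_xor()];
}

static void write8(uint16_t addr, uint8_t value)
{
    ((uint8_t *)mem16)[addr ^ byte_xor()] = value;
}

static uint16_t read16(uint16_t addr)
{
    if (addr & 1)  /* unaligned: fall back to two byte reads, low byte first */
        return (uint16_t)(read8(addr) | (read8((uint16_t)(addr + 1)) << 8));
    return mem16[addr >> 1];  /* aligned: already in host order */
}

static void write16(uint16_t addr, uint16_t value)
{
    if (addr & 1) {  /* unaligned: two byte writes, low byte first */
        write8(addr, (uint8_t)(value & 0xFF));
        write8((uint16_t)(addr + 1), (uint8_t)(value >> 8));
        return;
    }
    mem16[addr >> 1] = value;
}
```

The catch is that anything loaded from a ROM image has to go through `write8` (or be byte-swapped once at load time) so that it lands in the host-order layout.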
 