
What's your recollection of Mac Plus offscreen CopyBits refresh rate?

Mu0n

Well-known member
Context: I'm trying to refresh a Mac Plus's whole 512x342 screen (for starters; I will reduce that size going forward) by installing a VBL task whose purpose is to CopyBits the contents of an offscreen bitmap to the visible screen bits. Of course, the best way to do this is to keep the copied rectangle as small as possible and only refresh what's truly moving. I know about that. I also know that CopyBits gets faster if the rectangle width is a power of 2. However, I first want to see whether it's feasible to get it smooth in the worst case, then assess how small I have to go.

Without divulging too much (I want to bring this app as a surprise to the community), the moving graphics are not simple, predictable shapes, but at least the background they move over is going to be very simple (either all black or all white). The moving graphics are going to be too numerous and too spread out for the "repair the current state, then paste in the new state" approach; my intuition is to just paste the new state over the whole section.

What I'm asking: in your mind, is it already impossible to do this fluidly for a full screen?

If it's not possible, do you have a sense of how large an area can be refreshed every VBL cycle?

Can I compromise and wait a few refresh cycles (16.67 ms each) before drawing the new state?

 

Crutch

Well-known member
CopyBits is a complicated routine that does many wonderful things.  It’s optimized in some respects to support power-of-2 widths, long-word-aligned bitmaps etc., but it’s not a speed demon.  It also adds overhead going through the trap dispatcher and may move or purge memory (see below).  In almost every case, where you’re not worrying about multiple devices and don’t care about fancy clipping regions, you will get much faster results if you blit the bits yourself, preferably from a tight loop written in assembly.

Also, you should not be calling CopyBits from a VBL task.  CopyBits is on the list of “Routines that may Move or Purge Memory” in Inside Macintosh and therefore shouldn’t be called at interrupt time.  (If you must use CopyBits you can set a flag in your VBL task and have your application’s main event loop check the flag and call CopyBits itself when appropriate.)  If you roll your own bit blit routine by writing directly to the screen, you won’t have this problem, which is another reason to do it in your case.

The easiest way to do this is to ensure your source and destination rects are aligned on long-word boundaries and do something like this (written off the top of my head and not tested; converting to 68k assembly is strongly recommended to ensure those ‘register’ vars really live in registers!):

Code:
/* copy a bitmap with no multiplication */
/* assumes srcRect and dstRect are same size with left & right edges on long-word boundaries */
/* assumes src is a whole bitmap (so srcRect implies rowBytes) that's smaller than dest */
/* assumes writing to a 512-pixel-wide screenBits that's 1 bit deep */

register long *src = ... /* start of your source bitmap */

/* add left/32 and top*(512/32) */
register long *dest = ((long *) screenBits.baseAddr) + (dstRect.left >> 5) + (dstRect.top << 4);

const register short numColumns = (srcRect.right - srcRect.left) >> 5;  // divide by 32 to get # longs
const register short numRows = srcRect.bottom - srcRect.top;

for (register short i = 0; i < numRows; i++)
{
  for (register short j = 0; j < numColumns; j++)
    *(src++) = *(dest++);
  
  dest += 16;  // 512/32 == 16 ... move to next scan line
}
 

Crutch

Well-known member
(p.s. you can’t call BlockMove in a VBL task, either)

Also, looking forward to seeing what you come up with!

 

Crutch

Well-known member
Super obvious typo above, sorry. The operative line should of course be:

Code:
*(dest++) = *(src++);
 

Mu0n

Well-known member
Thanks for those tips. I more or less walked through the same topic about 15 years ago.

Measuring timings in ticks (with TickCount, because I'm limiting myself to System 6) made this obvious: two CopyBits operations took around 27 ticks (a tick is about 1/60 of a second). Taking that long will of course not eliminate tearing or flicker in most environments.
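To put that measurement in frame-budget terms, here is the arithmetic. It assumes the 27 ticks covered two full-screen copies and that one VBL interval equals one tick (about 16.67 ms, as in the opening post):

```latex
\begin{aligned}
t_{\text{CopyBits}} &\approx \frac{27\ \text{ticks}}{2} = 13.5\ \text{ticks} \approx 13.5 \times 16.67\ \text{ms} \approx 225\ \text{ms} \\
\frac{t_{\text{CopyBits}}}{t_{\text{VBL}}} &\approx \frac{225\ \text{ms}}{16.67\ \text{ms}} \approx 13.5 \\
f_{\max} &\approx \frac{60\ \text{Hz}}{13.5} \approx 4.4\ \text{full-screen updates per second}
\end{aligned}
```

So each full-screen CopyBits spans roughly 13 to 14 refreshes, which is far from the one-per-VBL rate that smooth motion needs.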

I also remembered that Creepy Castle has extremely smooth, nearly full-screen scrolling, which makes it obvious that CopyBits, even with widths that are multiples of 4 bytes, is not the way to go.

It doesn't look like I can use register variables under Symantec C++ 6, so I'll try with regular longs.

 

Mu0n

Well-known member
/* add left/32 and top*(512/32) */
register long *dest = ((long *) screenBits.baseAddr) + (dstRect.left >> 5) + (dstRect.top << 4);




I'm getting crashes during the looping copy, as is standard C pointer fare :)

I've tried long and hard and I can't wrap my mind around these additions. Why wouldn't the destination pointer address just be at the baseAddr?

 

Crutch

Well-known member

I can check my code later. I typed that out during my morning coffee, so errors are extremely probable!

The additions are there to find the starting point in the destination bitmap. If you’re doing the whole screen, dstRect.left and dstRect.top are both 0, so you do indeed start at baseAddr. I wasn’t sure of your exact use case. My code is intended to blit ALL of a smaller source bitmap into a PART of a larger destination bitmap, namely the rect given by dstRect.

The additions work by taking baseAddr (in (long*) terms, i.e. in 32-bit chunks), adding dstRect.top*16 (since there are 16 longs per 512-bit row), then adding dstRect.left/32 (to get the number of longs to skip to get to the left edge of the target part of the destination bitmap).

 

Mu0n

Well-known member
Wait, don't these bitmaps work such that baseAddr points to the first int or long (however we want to parse it) of the top-left area, with memory progressing toward the bottom right in normal English reading order? Your code assumes baseAddr is the very last one in the bitmap and that you have to backtrack to the beginning, stack style (or was it heap style?). I was sure it wasn't that, but I'll second-guess myself and look it up to verify.

 

Crutch

Well-known member

No, I start with baseAddr and add to it based on the number of rows and columns we want to skip to reach the starting point, just as you suggest. What makes you think I am backtracking?

I did find another typo, though, sorry. That’s what I get for writing code from memory at 6 am. This version should work (I tested it). I changed the last line of the row loop so that it skips only the remainder of each scan line:

Code:
/* copy a bitmap with no multiplication */
/* assumes srcRect and dstRect are same size with left & right edges on long-word boundaries */
/* assumes src is a whole bitmap (so srcRect implies rowBytes) that's smaller than dest */
/* assumes writing to a 512-pixel-wide screenBits that's 1 bit deep */

register long *src = ... /* start of your source bitmap */

/* add left/32 and top*(512/32) */
register long *dest = ((long *) screenBits.baseAddr) + (dstRect.left >> 5) + (dstRect.top << 4);

const register short numColumns = (srcRect.right - srcRect.left) >> 5;  // divide by 32 to get # longs
const register short numRows = srcRect.bottom - srcRect.top;

const short stride = 16 - numColumns;  // 512/32 == 16 ... move to next scan line

for (register short i = 0; i < numRows; i++)
{
  for (register short j = 0; j < numColumns; j++)
    *(dest++) = *(src++);
  
  dest += stride;
}
 

Mu0n

Well-known member
Today I finally got the C-style manual copy working on both mini-vMac and a regular ol' hardware Mac Plus with satisfying fluidity, over a much bigger area than I managed with CopyBits. The GrafPort in BasII on my Win10 machine is stubborn: it shows weird results and fills the top few lines with black, although it no longer crashes. The emulator's horizontal resolution happens to be 1536 pixels, and I suspect I need bigger jumps from one line to the next. Back to the debugger and my trusty SpeedCrunch calculator, which switches between decimal and hex on the fly!
 

 

Mu0n

Well-known member
I've figured out (sometimes remembered) stuff: 

-The color depth the monitor is set to matters in System 7 when you attempt this kind of direct screen manipulation: it determines the bits per pixel and therefore how many bytes each row takes. Everything became easier when I set it to 1-bit B&W.

-I don't *need* to develop for color System 7, but doing so at least temporarily makes development so much faster.

-Apple probably discourages tampering directly with the GrafPort for this reason, but we have righteousness and hindsight on our side!

-The stride calculation didn't work as a short when I was working in a 1500-pixel-wide environment (and later 1472, a multiple of 32) and fetching my mainWindowPtr->portBits.bounds.right shifted right 5 bits (divided by 32).

I'll keep hammering away at this reduced 512x342 version before I extend compatibility.

 