To reiterate: yes, the NuBus bandwidth doesn't count here; it's the bandwidth of the dedicated memory on the video card itself that does. Many late-386/early-486 machines had 1MB video cards capable of 1024x768x8 8514/A resolutions stranded on an ISA bus only capable of doing 4-5MB/sec with the aid of a stiff tailwind. With a good 2D accelerator those machines could handle tasks like word processing acceptably, but trying to display video streams at anything more than postage-stamp size turned things into a slideshow. Even Macs that used main memory for video refresh (the original black-and-white toasters, the Macintosh IIsi/IIci, the Power Macintosh 6100/7100, etc.) don't waste the CPU's time stuffing bytes into the output DAC. The video hardware has its own address counters and "steals the bus" as needed to grab bytes from RAM on its own. (See below.)
Dual-ported RAM is one old-school solution. Usually it has a parallel (8-bit, for example) port on one side and a serial port on the other. The CPU can address and alter bits/bytes randomly from the parallel side, while the RAM is also outputting an appropriately clocked serial stream to the video circuitry on the serial side.
Actually, most dual-ported RAM isn't "serial" on the output side (that would make it difficult to use the same RAM chip for different video geometries), but it wasn't unusual for the output port to have an increment-on-read address generator so successive words could be read "in a serial fashion" without having to feed an address for every cycle. Most inexpensive video systems just used normal SRAM or DRAM and would either time multiplex it (this was extremely common on 6502-based machines, where the CPU only used the address/data bus every other clock cycle and thus allowed another device to share the same memory pretty easily) or just force the CPU to wait until the end of a scanline (or even until the vertical blanking interval) if it tried to access RAM at the same time as the video hardware.
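Just to make that time-multiplexing trick concrete, here's a toy C model of the idea (purely illustrative, not how you'd actually build the hardware): on a 6502-style bus the video hardware's address counter and the CPU can interleave cleanly because each of them owns one phase of the clock.

```c
/* Toy model of time-multiplexed video RAM on a 6502-style bus.
   Phase 0 of every clock cycle belongs to the video hardware,
   phase 1 belongs to the CPU, so neither one ever has to wait. */
#include <stdint.h>

#define FB_SIZE 8192                 /* pretend 8K framebuffer */
static uint8_t ram[FB_SIZE];

static uint16_t video_addr;          /* video hardware's own address counter */

static void simulate_cycle(long cycle) {
    if ((cycle & 1) == 0) {
        /* Phase 0: video hardware fetches the next display byte and bumps
           its counter; the byte would be shifted out to the monitor. */
        uint8_t pixels = ram[video_addr];
        (void)pixels;
        video_addr = (video_addr + 1) % FB_SIZE;
    } else {
        /* Phase 1: the CPU gets the bus and can read or write the very
           same RAM without ever colliding with the video fetch. */
        ram[cycle % FB_SIZE] ^= 0xFF;
    }
}

int main(void) {
    for (long cycle = 0; cycle < 2 * FB_SIZE; cycle++)
        simulate_cycle(cycle);
    return 0;
}
```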
(In some very old systems, like the TRS-80 Model I or the original IBM CGA card, the CPU actually has priority, and if it writes to the video area during screen generation it will cause a dropout, described as "snow" or "static", in the resulting picture.)
BTW, direct video generation from micros is totally possible.
Even a lowly PIC18 can do it. Admittedly, that's still more powerful than a 128K Mac.
"Hybrid" systems where the video timing itself was generated in hardware but the video data was fed to it by the CPU during every refresh weren't terribly uncommon in the early years of computing. (For instance, the Sinclair ZX-80 and the Atari 2600 both do this to differing degrees.)
But... yeah, in any case, back to the original topic: Rolling your own video output is probably the hardest part of making a homebrew computer, full stop. If you're going to try to build this yourself, the first thing you're going to have to do is define what your target is. You mention the Apple II; that machine in its original form only used about 7.5K for its framebuffer in high-res mode. (It's really a monochrome machine that uses some ugly tricks to "colorize" the output, which gives the ][ its distinctive messy graphics display.) Refreshing that display 60 times per second requires less than 500KB/s of bandwidth, which coincidentally can essentially be had for "free" when 1MHz memory is paired up with a 6502 CPU. Clearly you're hoping for more than that, but be realistic: what do you actually want/NEED to achieve? Are you trying to make a simple homebrew programming/interfacing machine, or are you aiming at a roll-your-own graphics workstation? And if it's the latter, are you talking mid-1980s quality graphics or something that wouldn't get laughed off the stage today?
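To put some rough numbers on the scaling problem (back-of-the-envelope only, ignoring blanking intervals, so real pixel clocks run a bit higher), a few lines of C show how fast refresh bandwidth grows with resolution and color depth:

```c
/* Back-of-the-envelope framebuffer refresh bandwidth.
   Ignores blanking, so real figures are somewhat higher; the point
   is the relative scale, not exact numbers.  (The real Apple II also
   packs 7 pixels per byte, so its figure is slightly higher still.) */
#include <stdio.h>

static double refresh_mb_per_sec(int w, int h, int bpp, int hz) {
    double bytes_per_frame = (double)w * h * bpp / 8.0;
    return bytes_per_frame * hz / (1024.0 * 1024.0);
}

int main(void) {
    printf("Apple II hi-res  280x192x1  @60Hz: %6.2f MB/s\n",
           refresh_mb_per_sec(280, 192, 1, 60));
    printf("Mac 128K         512x342x1  @60Hz: %6.2f MB/s\n",
           refresh_mb_per_sec(512, 342, 1, 60));
    printf("VGA mode 13h     320x200x8  @60Hz: %6.2f MB/s\n",
           refresh_mb_per_sec(320, 200, 8, 60));
    printf("SVGA             640x480x8  @60Hz: %6.2f MB/s\n",
           refresh_mb_per_sec(640, 480, 8, 60));
    return 0;
}
```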
Most of the "simple" homebrew systems you see out there don't bother with video at all. They either use a serial text console or they go one step further and interface an MCU like an AVR, PIC, or Propeller which can reasonably easily do video output with minimal hardware and use that like a built-in terminal. (From a logical standpoint the difference is slight.) Doing something like that does *not* give you easy access to any graphics capability. Your video device will be interfaced as one or more simple 8 bit ports, which means if you do try to work any graphics into the design any pixel setting or line drawing you do from the main CPU will have to be executed as a series of commands shoved byte-by-byte into the slave processor.
(Technically you can still manage some "pretty good" graphics that way. The TMS9918A graphics chip used in machines like the ColecoVision game console and the Japanese MSX computers didn't allow direct memory access to its "private" frame buffer from the main CPU either. Of the super-cheap ways to do video the Propeller is probably the most capable, as it can, with almost no external hardware, manage super-VGA-sized monitors with a palette of 64 colors, but it only has 32K of RAM, some of which will be required for code, and is thus mostly limited to "tile-based" video displays. You can't really do a full-screen Mac-style bitmap at anything but fairly low resolutions. The best you could manage would be something around 512x384 monochrome, much lower in color. Even a 320x200 8-bit VGA screen requires about 63K of RAM, and it's *not trivial* to use external RAM on the Propeller for anything, let alone video generation. But if you're happy with '80s game-console-quality graphics and TTY text you could do worse.)
The next option would be to interface a DAC and use the main CPU to shove bytes at it under software control. Those bandwidth numbers you tossed out there don't sound that intimidating for a 70MHz ARM, honestly. For reasons that are difficult to explain (and in some cases over my head) an ARM CPU is probably *not* a good choice for trying to "software emulate" the entire video stream, but... glancing at the datasheet for the part you named, maybe you could do it with some of the GPIO pins. Alternatively you could use external circuitry (like maybe a Propeller?) to generate the video timings and have that circuitry fire off an interrupt at the start of each scanline to tell the main CPU to start fetching bytes from the framebuffer and shoving them into a FIFO leading to a DAC, where they'll be combined with the timing signals and sent to the monitor. I imagine this would be totally doable... if somewhat inefficient (your CPU will be occupied roughly half the time shovelling bits from RAM out the door) and something of a nightmare to program.
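As a rough sketch of what that per-scanline interrupt might look like (the FIFO register address, line width, and framebuffer layout here are all assumptions), the handler itself is short; the pain is that it fires tens of thousands of times a second and monopolizes the CPU while it runs:

```c
/* Sketch of the "CPU shovels bytes into a FIFO" approach.
   FIFO_DATA, LINE_WIDTH, and the framebuffer layout are assumed;
   the external timing circuit raises hsync_irq_handler at the start
   of every visible scanline and vsync_irq_handler once per frame. */
#include <stdint.h>

#define LINE_WIDTH 640                                /* bytes per scanline, assumed */
#define FIFO_DATA  (*(volatile uint8_t *)0x40001000)  /* hypothetical DAC FIFO register */

static const uint8_t *framebuffer;   /* set elsewhere to the start of the frame */
static const uint8_t *scan_ptr;      /* next scanline to emit */

void hsync_irq_handler(void) {
    /* Burst one scanline into the FIFO.  At 480 visible lines and 60Hz
       this runs ~28,800 times a second, which is where the "CPU busy
       roughly half the time" estimate comes from. */
    for (int i = 0; i < LINE_WIDTH; i++)
        FIFO_DATA = scan_ptr[i];
    scan_ptr += LINE_WIDTH;
}

void vsync_irq_handler(void) {
    scan_ptr = framebuffer;           /* rewind at the start of each frame */
}
```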
The final step up would be to build your own DMA video system. As described by commodorejohn, a simple monochrome system *could* consist of not much more than a handful of logic, a few crystals, and some analog voodoo. The analog voodoo is the hard part, and the fact that you're looking at a CPU with a 70+ MHz clock speed means this will be a pretty intimidating home build. (Electronics start getting "hard" at double-digit MHz speeds. Something like the original Mac's 512x342 resolution you could get away with wire-wrapping, but even 640x480x256 requires pushing data about *20 times faster*.) This would get *really* complicated if you wanted to use SDRAM controlled by the on-chip RAM controller; your homebuilt "video card" would have to operate synchronously with it and essentially have to "bus master" and do arbitration at the full speed of the SDRAM, which, again, is going to be *really fast* by homebrew standards. Your life would probably be easier if you went with a dedicated framebuffer, but even then you're looking at building a high-speed VGA card from scratch.
Frankly if big colorful bitmaps are your priority I'd say dump your current target chip and pick an ARM SoC that has a VGA or LCD controller onboard. They're a dime a dozen and solve the whole problem. (Heck, you're not even stuck with analog VGA; you can get HDMI easily enough.) But... if you're going to do this, why not just pay $30 for a Raspberry Pi and call it a day?
Out of curiosity, have you built a homebrew computer before or would this be your first?