Thanks guys! I agree, just the time savings from not having to screw around with the PLCC chips will be worth it.
I thought there was some kind of permanent unlock command you could send, which is different from the byte-by-byte unlock, no?
This differs from chip to chip. I've been reading a lot of Flash memory datasheets to understand how the commands work. The chips I'm using only have a byte-program mode, and the datasheet says that once a single-byte write has completed, the chip goes back into read mode. I haven't tried writing another byte without the unlock sequence, but I suspect it will fail. I'll try it just to be sure, though.
Edit: Just tried it; it doesn't work.

You have to send the 3-byte unlock sequence before every single byte you write.
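To make the per-byte sequence concrete, here is a minimal C sketch. It assumes the JEDEC-style command addresses 0x5555/0x2AAA and the 0xAA, 0x55, 0xA0 byte-program command used by many 5 V parallel flash parts (the SST39SF family, for example). The thread doesn't name the exact chip, so check its datasheet. `bus_write` is a hypothetical stand-in for whatever drives the address/data lines and /WE; here it just logs writes so the sequence can be checked on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus layer: on real hardware this would drive the address,
   data, and /WE lines. Here it only records writes for inspection. */
#define LOG_MAX 64
typedef struct { uint32_t addr; uint8_t data; } bus_write_t;
static bus_write_t bus_log[LOG_MAX];
static size_t bus_log_len = 0;

static void bus_write(uint32_t addr, uint8_t data) {
    if (bus_log_len < LOG_MAX) {
        bus_log[bus_log_len].addr = addr;
        bus_log[bus_log_len].data = data;
        bus_log_len++;
    }
}

/* Assumed JEDEC-style command addresses; some parts use 0x555/0x2AA
   instead, so confirm against the datasheet. */
#define CMD_ADDR1 0x5555u
#define CMD_ADDR2 0x2AAAu

/* Program one byte. The 3-write unlock/command sequence has to precede
   every data byte, because the chip drops back to read mode afterwards. */
static void flash_program_byte(uint32_t addr, uint8_t data) {
    bus_write(CMD_ADDR1, 0xAA);
    bus_write(CMD_ADDR2, 0x55);
    bus_write(CMD_ADDR1, 0xA0);  /* byte-program command */
    bus_write(addr, data);       /* the actual data write */
    /* The caller must now wait (fixed delay or polling) for the
       internal write to finish before programming the next byte. */
}
```

Programming N bytes therefore costs 4N bus writes plus N write-cycle waits, which is why the wait strategy matters so much.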
Some chips let you enable or disable software data protection whenever you want. With those chips, you could send a command to disable protection entirely, write tons of bytes, and then re-enable protection. On the chips I'm using, software data protection is always enabled. [Except on the smaller chips I was using, but the bigger chips are more useful for hackery]
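On chips that do allow switching protection off, the disable command is just another write sequence. A minimal sketch, assuming the six-write disable sequence documented for Atmel AT28C-series parallel EEPROMs (0xAA, 0x55, 0x80, 0xAA, 0x55, 0x20 at 0x5555/0x2AAA); other chips use different sequences or, like the ones described above, don't support it at all. As before, `bus_write` is a hypothetical logging stand-in for the real bus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_MAX 64
static uint32_t log_addr[LOG_MAX];
static uint8_t log_data[LOG_MAX];
static size_t log_len = 0;

/* Hypothetical bus layer: records each write instead of driving pins. */
static void bus_write(uint32_t addr, uint8_t data) {
    if (log_len < LOG_MAX) {
        log_addr[log_len] = addr;
        log_data[log_len] = data;
        log_len++;
    }
}

/* Assumed AT28C-style sequence that turns software data protection off. */
static void sdp_disable(void) {
    static const uint32_t addrs[6] = {0x5555, 0x2AAA, 0x5555,
                                      0x5555, 0x2AAA, 0x5555};
    static const uint8_t datas[6] = {0xAA, 0x55, 0x80, 0xAA, 0x55, 0x20};
    for (size_t i = 0; i < 6; i++) bus_write(addrs[i], datas[i]);
}

/* With protection off, each byte is one plain write with no unlock
   prefix (the caller still has to wait out each write cycle). On these
   parts, sending the 0xAA/0x55/0xA0 prefix again re-enables protection. */
static void write_unprotected(uint32_t addr, const uint8_t *buf, size_t len) {
    for (size_t i = 0; i < len; i++) bus_write(addr + (uint32_t)i, buf[i]);
}
```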
Also, some chips (page-write chips) let you send an unlock command followed by a complete page of data (128 bytes, for example) into a page buffer, just as fast as read cycles. Once you've finished sending the page (or less than a full page), the actual write operation happens, which has a 5 microsecond (for example) delay. Unfortunately, the chips I'm using are not page-write chips [the smaller chips I was using are, though]
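A page-write chip spreads the cost of one unlock over a whole page. The sketch below assumes the same 0xAA/0x55/0xA0 prefix (as on AT28C-style EEPROMs with protection enabled) and the 128-byte page from the example above; the real page size, command, and byte-load timing come from the datasheet. It rejects a buffer that would cross a page boundary, since all the bytes have to land in one page buffer. `bus_write` is again a hypothetical logging stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 128u  /* example page size; check the datasheet */
#define LOG_MAX 256
static uint32_t log_addr[LOG_MAX];
static uint8_t log_data[LOG_MAX];
static size_t log_len = 0;

/* Hypothetical bus layer: records each write instead of driving pins. */
static void bus_write(uint32_t addr, uint8_t data) {
    if (log_len < LOG_MAX) {
        log_addr[log_len] = addr;
        log_data[log_len] = data;
        log_len++;
    }
}

/* Load up to one page after a single unlock. Returns 0 on success,
   -1 if len is 0, exceeds a page, or would cross a page boundary. */
static int flash_program_page(uint32_t addr, const uint8_t *buf, size_t len) {
    if (len == 0 || len > PAGE_SIZE) return -1;
    if (addr / PAGE_SIZE != (addr + len - 1) / PAGE_SIZE) return -1;
    bus_write(0x5555, 0xAA);
    bus_write(0x2AAA, 0x55);
    bus_write(0x5555, 0xA0);
    for (size_t i = 0; i < len; i++) bus_write(addr + (uint32_t)i, buf[i]);
    /* The chip starts its internal write once the host stops loading
       bytes; wait or poll before loading the next page. */
    return 0;
}
```

Compared with the byte-program sketch, a full page costs 3 + 128 bus writes and a single write-cycle wait instead of 512 writes and 128 waits.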
If the overhead of checking the busy flag via SPI is too high, can you just busy-wait for a fixed amount of time that is longer than the worst-case ROM programming time from the datasheet, and then proceed without checking the busy flag?
That's true. They claim the worst-case time is 20 microseconds, with a typical time of 14 microseconds. Because that time is fairly long, I may not be losing much by polling anyway. I'll go ahead and try it though, to see what kind of performance I get. Good idea!
Edit: Actually, this may not work too well. The other chip I'm using has a worst-case time of 300 microseconds per byte, and it seems to get slower as it ages. I think I'm probably better off polling...
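The thread doesn't say which status mechanism the "busy flag" is. Many JEDEC parallel flash parts expose a toggle bit on DQ6 that flips on every read while an internal write is in progress, so one common approach is to read until two consecutive reads agree. A minimal sketch of that with an upper bound: `max_polls` would be sized from the datasheet worst case (for example the 300 microseconds above divided by the time of one read), so a slow, aged chip is still handled while a fast one finishes early. `bus_read` here is a hypothetical simulated chip, not real hardware access, and real datasheets may add extra checks (such as an error bit) to this loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DQ6 0x40u  /* toggle bit, assumed to be data bit 6 */

/* Simulated chip: stays busy for `busy_reads` reads, toggling DQ6 on
   each one, then returns the stored data byte. */
static int busy_reads = 0;
static uint8_t toggle_state = 0;
static uint8_t stored = 0x42;

static uint8_t bus_read(uint32_t addr) {
    (void)addr;
    if (busy_reads > 0) {
        busy_reads--;
        toggle_state ^= DQ6;  /* DQ6 flips while the write is in progress */
        return toggle_state;
    }
    return stored;
}

/* Returns true once two consecutive reads agree on DQ6 (write finished),
   or false if max_polls extra reads pass without that happening. */
static bool flash_wait_toggle(uint32_t addr, unsigned max_polls) {
    uint8_t prev = bus_read(addr) & DQ6;
    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t cur = bus_read(addr) & DQ6;
        if (cur == prev) return true;
        prev = cur;
    }
    return false;
}
```

The design choice versus a fixed delay: polling pays a read's worth of overhead per check but never waits longer than the chip actually needs, while a fixed delay must always cover the slowest chip.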
Edit 2: If not for that crazy 300 microsecond worst-case time, this would actually help a ton. With an 18 microsecond delay (because I know the initial SPI setup takes up the other 2 microseconds), programming takes 1 minute and 11 seconds. Maybe I should look into optimizing my SPI read/write functions as much as possible...
DQ, how much did your ROM Programmer cost?
What's your target price for the Jolly Roger SIMM/four-at-a-time-ROM Programmer?
The programmer I'm using was like $60...
Because of the low PCB quantity, the PCB itself is one of the most expensive parts: about $6 each once you factor in the shipping I paid. That will go down as I order more of them, depending on how many people are interested. The rest of the parts total around $15 per unit, and that could go down as I buy some of the more expensive parts (the ICs) in bigger quantities. Plus, assembly takes longer with these boards because it feels like there are a bazillion different resistor and capacitor values that I have to keep organized and put in the right place on the PCB, so I hope everyone understands that I'm not just selling these for the price of the parts.
I'm probably going to be looking at a price of around $35 plus shipping for the board, and the SIMMs will still be $16 for the SIMM plus $4 for the set of 4 chips, or $20 total (thanks to trag, I have been able to lower the price of the ROM chips).
Edit again: At this level, every little optimization counts. Setting two flags with one instruction rather than one flag at a time means a difference of 15 seconds in programming time. Maybe I can write my SPI routine in assembly ;-)
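A minimal C illustration of the difference, assuming a hypothetical memory-mapped register and bit assignments (the thread doesn't say which flags or which microcontroller are involved). Each `|=` on a volatile register is a separate read-modify-write that the compiler is not allowed to merge, so combining the masks halves the register accesses. How many instructions that actually saves depends on the architecture and compiler, so checking the generated assembly is a reasonable step before hand-writing the SPI routine in assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a memory-mapped I/O register such as a port or SPI
   control register (hypothetical). */
static volatile uint8_t fake_port = 0;

#define FLAG_A (1u << 0)  /* hypothetical bit assignments */
#define FLAG_B (1u << 3)

/* Two separate read-modify-write accesses to the register. */
static void set_flags_separately(void) {
    fake_port |= FLAG_A;
    fake_port |= FLAG_B;
}

/* One read-modify-write, with the masks OR'ed together at compile time. */
static void set_flags_combined(void) {
    fake_port |= (uint8_t)(FLAG_A | FLAG_B);
}

/* Same idea for clearing both flags in one access. */
static void clear_flags_combined(void) {
    fake_port &= (uint8_t)~(FLAG_A | FLAG_B);
}
```

Both setters leave the register in the same state; only the number of accesses differs, which is what adds up over millions of bytes.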