Well, I decided to import the "lost" posts last night. At first I wasn't going to bother, because I thought only 300 or fewer posts were missing. But after I ran an SQL query comparing the database as it stood on the 3rd of May against its state on the 5th, I realized that over 1,000 posts were absent from the newer database. This happened because I had been deleting the database and rebuilding it from scratch while fixing bugs in the Usenet reader software I'm using. What I didn't realize until the other day was that the Usenet server was slowly and silently expiring the oldest posts, from early 2007, as the days went on. Each successive total rebuild therefore lost a few more of those old posts. Once I noticed this, I decided to fix it by restoring a copy of the posts I had saved on 28 April 2009.
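For the curious, a comparison like that can be sketched in a few lines. This is not the actual query: it is an illustration in Python with SQLite, and the schema names (`old_snap`, `new_snap`), the `posts` table, and the `message_id` column are all assumptions made up for the example.

```python
import sqlite3


def find_lost_posts(conn):
    """Return message IDs present in the older snapshot but absent from the newer one."""
    rows = conn.execute(
        """
        SELECT o.message_id
        FROM old_snap.posts AS o
        LEFT JOIN new_snap.posts AS n ON n.message_id = o.message_id
        WHERE n.message_id IS NULL
        ORDER BY o.message_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo():
    """Build two tiny in-memory snapshots (hypothetical data) and diff them."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a separate in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS old_snap")
    conn.execute("ATTACH DATABASE ':memory:' AS new_snap")
    for schema in ("old_snap", "new_snap"):
        conn.execute(
            f"CREATE TABLE {schema}.posts (message_id TEXT PRIMARY KEY, subject TEXT)"
        )
    conn.executemany(
        "INSERT INTO old_snap.posts VALUES (?, ?)",
        [("<a@x>", "A"), ("<b@x>", "B"), ("<c@x>", "C")],
    )
    conn.executemany(
        "INSERT INTO new_snap.posts VALUES (?, ?)",
        [("<b@x>", "B"), ("<c@x>", "C"), ("<d@x>", "D")],
    )
    return find_lost_posts(conn)
```

Counting the rows that come back gives the number of posts that exist only in the older snapshot.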
I didn't want to merge in the old posts "by hand," so naturally I went for overkill and wrote a few scripts to do it for me. On the 28th of April I had used Unison to download every post from the comp.sys.apple2 group that was on the NNTP server. The result was a single 49.5 MB file with all the posts joined together and little bits of binary garbage between them, probably due to a server or software error.
In any case, I needed a good way to split those posts (over 20,000 of them) into individual files. Enter PHP: a general-purpose scripting language which I know only too well. In less than 5 minutes, I wrote a 40-line PHP script to split the posts into separate files. After a few more minutes of debugging, I ran the script. The Mac mini's fan sped up as 21,051 individual files were generated, each one holding a single Usenet post stripped of the binary garbage.
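The splitting idea is simple enough to sketch. What follows is not the original PHP script but a rough Python equivalent, and it rests on a guess: that the "binary garbage" is a run of control bytes (anything below 0x20 other than tab, newline, carriage return, vertical tab, and form feed). The function names and the output file naming are invented for the example.

```python
import re
from pathlib import Path

# Assumption: posts are separated by one or more control bytes.
SEPARATOR = re.compile(rb"[\x00-\x08\x0e-\x1f]+")


def split_posts(blob: bytes) -> list[bytes]:
    """Split the combined dump on separator runs and drop empty chunks."""
    chunks = (chunk.strip() for chunk in SEPARATOR.split(blob))
    return [chunk for chunk in chunks if chunk]


def write_posts(blob: bytes, out_dir: str) -> int:
    """Write each post to its own numbered file; return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    posts = split_posts(blob)
    for index, post in enumerate(posts, start=1):
        (out / f"{index:05d}.txt").write_bytes(post + b"\n")
    return len(posts)
```

Reading the whole file into memory is fine at this size; a 49.5 MB dump fits comfortably in RAM.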
Next step? Import all 21,051 text files back into the database. This called for another PHP script of similar length. A loop opened each text file, extracted the post's contents, then handed them off to the Usenet reader software to parse and place into the database.
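In outline, such an import loop might look like the Python sketch below. The `store_post` callback is a hypothetical stand-in for whatever entry point the Usenet reader software exposes, and the `*.txt` file naming is assumed; neither comes from the real script.

```python
from email import message_from_bytes
from pathlib import Path


def import_posts(post_dir, store_post):
    """Parse every post file in name order and pass it to store_post.

    store_post is a placeholder for the reader software's own import
    routine, which is not shown here.
    """
    count = 0
    for path in sorted(Path(post_dir).glob("*.txt")):
        message = message_from_bytes(path.read_bytes())
        store_post(message)
        count += 1
    return count
```

Passing the storage step in as a function keeps the loop independent of how the database is actually written.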
A few minutes later, presto: all the posts from Jan 2007 up to 28 April 2009 were in the database! I did have to spend some time fixing small errors. Some posts were bad and had no subject or body, and others had no date in the header, so their date defaulted to the Unix epoch. I deleted the posts with a missing subject or body, and tried to guess the correct dates for the rest from their position within their thread. One good thing came out of this: because I was using the binary garbage to detect the end of each post, the spam posts full of multi-byte rubbish were cut out completely.
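A cleanup pass like this could also be automated up front. Here is a minimal Python sketch, assuming each post is a plain RFC-style message with headers, a blank line, and a body; the `classify` function and its three labels are invented for illustration and are not part of the reader software.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def classify(raw: str) -> str:
    """Label a raw post as 'delete', 'fix-date', or 'ok'."""
    message = message_from_string(raw)
    subject = (message.get("Subject") or "").strip()
    payload = message.get_payload()
    # Multipart posts carry their body in sub-parts, so count them as non-empty.
    has_body = message.is_multipart() or bool(str(payload).strip())
    if not subject or not has_body:
        return "delete"
    date_header = message.get("Date")
    if not date_header:
        return "fix-date"
    try:
        parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable dates need the same manual attention as missing ones.
        return "fix-date"
    return "ok"
```

Posts labeled "fix-date" are the ones whose dates would otherwise fall back to the Unix epoch and need a guess based on the surrounding thread.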
The end result is that I now have all the Usenet posts which the server had been expiring these past few days, and now I can continue my little archive, resting assured that I have rescued all of the earliest posts.