Public gateway to comp.sys.apple2

Dog Cow

Well-known member
Update: I have 20,300 articles in 2,300 threads from January 2007 up until yesterday (none have been posted today yet). I have no idea what the retention rate is going to be like, since that's already using 80 MB.

As for adding newsgroups, I'd like to add the comp.sys.mac.* groups, except that they're mostly about OS X and the like these days, which doesn't really interest me. I'll see about getting the rest of the comp.sys.apple2.* newsgroups later today or this weekend.

Dog Cow

Well-known member
I don't know. It's on my server, so I like that. It's probably a nicer interface, and I'll be working on making it fit into my main site. There's a load of options for searching. You don't have Google tracking and spying on you.

luddite

Host of RetroChallenge
It's probably a nicer interface
Certainly more lynx-friendly :)

My only quibble would be that it defaults to proportional font... not a big deal, but I think fixed-width would make sense considering all the pinout diagrams and assembly listings etc... and it just looks more awesomely old school ;-)

Good job!

Dog Cow

Well-known member
It's probably a nicer interface
Certainly more lynx-friendly :)
I don't know how long it will stay that way, since I'm working on integrating it into my main site this weekend. :-/ I'll keep the text-only browser in mind, though.

My only quibble would be that it defaults to proportional font... not a big deal, but I think fixed-width would make sense considering all the pinout diagrams and assembly listings etc... and it just looks more awesomely old school ;-)
I'll change that.

Dog Cow

Well-known member
Unfortunately, most of those you listed are Google Groups, not Usenet newsgroups. :( I found a 68k Linux Usenet group, but it was full of spam. :O

Dog Cow

Well-known member
I spent quite a bit of time last night and this morning improving the Usenet post-indexing system. Before, it wasn't always grouping posts into the correct thread and was creating new threads instead. What a nightmare! All I can say is that the system of grouping posts into threads can best be described with the phrase "seat of your pants."
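Threading on Usenet really is seat-of-your-pants: there is no thread ID, only each post's References header (the Message-IDs it replies to) and its subject line. The indexing system's actual code isn't shown here, so what follows is a minimal Python sketch of the common approach, assuming each post is a dict with the hypothetical keys `message_id`, `references` (oldest first) and `subject`, processed in date order. A post joins the thread of its nearest known ancestor; a reply whose ancestors are unknown (for example, expired from the server) falls back on the normalized subject; anything else starts a new thread.

```python
import re

# Matches one or more leading "Re:" prefixes, case-insensitively.
RE_PREFIX = re.compile(r"^(\s*re:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip leading 'Re:' prefixes and whitespace, and lowercase the rest."""
    return RE_PREFIX.sub("", subject).strip().lower()


def is_reply(post):
    """Treat a post as a reply if it has References or a 'Re:' subject."""
    return bool(post.get("references")) or bool(RE_PREFIX.match(post.get("subject", "")))


def assign_threads(posts):
    """Return a dict mapping each post's Message-ID to an integer thread id.

    `posts` must be in date order; each is a dict with the hypothetical keys
    'message_id', 'references' (list of Message-IDs, oldest first) and 'subject'.
    """
    thread_of = {}          # Message-ID -> thread id
    thread_by_subject = {}  # normalized subject -> first thread id seen
    next_id = 0
    for post in posts:
        thread = None
        # 1. Join the thread of the nearest ancestor we already know about.
        for ref in reversed(post.get("references", [])):
            if ref in thread_of:
                thread = thread_of[ref]
                break
        subject = normalize_subject(post.get("subject", ""))
        # 2. Fallback: a reply with unknown ancestors joins a same-subject thread.
        if thread is None and subject and is_reply(post):
            thread = thread_by_subject.get(subject)
        # 3. Otherwise, start a new thread.
        if thread is None:
            thread = next_id
            next_id += 1
        thread_of[post["message_id"]] = thread
        if subject:
            thread_by_subject.setdefault(subject, thread)
    return thread_of
```

The subject fallback is the fragile part: two unrelated "Re: Help!" threads would be merged, which is one reason threading code ends up full of special cases.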

Anyway, the whole system is running pretty smoothly with just a few little last things I want to fix up.

Also, I need to stop constantly rebuilding the index from scratch. :O It looks like the server I'm connecting to has a retention period of about 2 years: just a few days ago it supplied posts going back to Jan 2007, and I had 20,000+ posts in total. Today I have only 19,700, reaching back only to Feb 2007. I have a backup of the old database, but I'm not sure it's worth the hassle to merge those last few posts in. :-/

Dog Cow

Well-known member
Well, I decided to import the "lost" posts last night. At first I wasn't going to bother, because I thought there were only 300 or fewer, but after I ran an SQL query comparing the database as it stood on the 3rd of May with its state on the 5th, I realized there were in fact over 1,000 posts missing from the newer database. This was because I had been deleting the database and starting over while fixing bugs in the Usenet reader software I'm using. What I didn't realize until the other day was that the Usenet server was slowly and silently expiring the oldest posts from early 2007 as the days went on, so each successive full rebuild lost a few more of them. Once I noticed this, I decided to fix it by restoring the copy of the posts I had from 28 April 2009.
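The actual query isn't given, so here is a minimal sketch of one way to list posts that exist in an older snapshot but not in a newer one. It uses Python's built-in `sqlite3` module rather than whatever database the reader software really uses, and the table name `posts` and column `message_id` are hypothetical. The older file is attached alongside the newer one, and the LEFT JOIN keeps only old rows with no match in the new database.

```python
import sqlite3


def missing_message_ids(old_path, new_path):
    """Return Message-IDs present in the old snapshot but absent from the new one.

    Assumes (hypothetically) both databases have a table
    posts(message_id TEXT PRIMARY KEY, ...).
    """
    conn = sqlite3.connect(new_path)
    try:
        # Make the old snapshot available under the schema name "old".
        conn.execute("ATTACH DATABASE ? AS old", (old_path,))
        rows = conn.execute(
            """
            SELECT o.message_id
            FROM old.posts AS o
            LEFT JOIN main.posts AS n ON n.message_id = o.message_id
            WHERE n.message_id IS NULL
            ORDER BY o.message_id
            """
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```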

I didn't want to merge in the old posts "by hand," so naturally I went for overkill and wrote a few scripts to do it for me. On the 28th of April I had used Unison to download all the posts from the comp.sys.apple2 group that were on the NNTP server. This produced a 49.5 MB file of all the posts joined together, with little bits of binary garbage between them, probably due to a server or software error.

In any case, I needed a good way to split those posts (over 20,000 of them) into individual files. Enter PHP, a general-purpose scripting language I know only too well. In less than 5 minutes I wrote a 40-line PHP script to split the posts into separate files. After a few more minutes of debugging, I ran the script, and the Mac mini's fan sped up as 21,051 individual files were generated. Each file held a single Usenet post, stripped of the binary garbage.
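The PHP script isn't shown, so here is a rough Python equivalent under two assumptions: any run of bytes outside printable ASCII, tab, CR and LF is the "binary garbage" separating posts, and every real post carries a Message-ID header. Fragments without one are discarded. The catch with this heuristic is that a legitimate post containing 8-bit characters would also be cut short at the first such byte. The output file names (`00001.txt` and so on) are hypothetical.

```python
import re
from pathlib import Path

# Any run of bytes that is not printable ASCII, tab, CR or LF is treated as
# the garbage between posts (an assumption about the dump format).
GARBAGE = re.compile(rb"[^\x09\x0a\x0d\x20-\x7e]+")
MESSAGE_ID = re.compile(r"^Message-ID:", re.IGNORECASE | re.MULTILINE)


def split_posts(dump):
    """Split a concatenated dump (bytes) into a list of post strings."""
    posts = []
    for chunk in GARBAGE.split(dump):
        # Chunks contain only ASCII bytes, so decoding cannot fail.
        text = chunk.decode("ascii").strip("\r\n")
        # Keep only fragments that look like a Usenet article.
        if MESSAGE_ID.search(text):
            posts.append(text + "\n")
    return posts


def write_posts(dump_path, out_dir):
    """Write each post to its own numbered file; return the number written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    posts = split_posts(Path(dump_path).read_bytes())
    for i, post in enumerate(posts, start=1):
        (out / f"{i:05d}.txt").write_text(post, encoding="ascii")
    return len(posts)
```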

Next step? Import all 21,051 text files back into the database. This called for another PHP script of similar length: a loop opened each text file, extracted the post contents, then handed them off to the Usenet reader software to parse and place into the database.
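The import step relied on the reader software's own parser, which isn't shown. As a stand-in, this Python sketch parses each file with the standard `email` package (Usenet articles use the same header-and-body layout as email) and inserts it into a hypothetical SQLite `posts` table. A missing or unparseable Date falls back to 0, the Unix epoch, and `INSERT OR IGNORE` keeps posts already in the database from being duplicated.

```python
import email
import sqlite3
from email.utils import parsedate_to_datetime
from pathlib import Path

# Hypothetical schema for illustration only.
SCHEMA = """CREATE TABLE IF NOT EXISTS posts (
    message_id TEXT PRIMARY KEY,
    subject    TEXT,
    author     TEXT,
    date       INTEGER,  -- Unix time; 0 means the Date header was unusable
    refs       TEXT,     -- space-separated Message-IDs from References
    body       TEXT
)"""


def parse_post(text):
    """Parse one article's text into a dict matching the columns above."""
    msg = email.message_from_string(text)
    try:
        date = int(parsedate_to_datetime(msg.get("Date")).timestamp())
    except (TypeError, ValueError, IndexError, OverflowError):
        date = 0  # no usable Date header: fall back to the Unix epoch
    # Plain-text articles have a string payload; multipart ones get an empty body here.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {
        "message_id": msg.get("Message-ID", "").strip(),
        "subject": msg.get("Subject", ""),
        "author": msg.get("From", ""),
        "date": date,
        "refs": " ".join(msg.get("References", "").split()),
        "body": body,
    }


def import_directory(conn, directory):
    """Import every *.txt file in `directory`; return how many rows were added."""
    conn.execute(SCHEMA)
    before = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    for path in sorted(Path(directory).glob("*.txt")):
        post = parse_post(path.read_text(encoding="ascii", errors="replace"))
        if not post["message_id"]:
            continue  # a post without a Message-ID cannot be de-duplicated
        conn.execute(
            "INSERT OR IGNORE INTO posts VALUES "
            "(:message_id, :subject, :author, :date, :refs, :body)",
            post,
        )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
    return after - before
```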

A few minutes later, presto: all the posts from Jan 2007 up to 28 April 2009 were in the database! I had to spend some time fixing a few small errors. Some posts were damaged and had no subject or body, and some had no date in the header, so their date defaulted to the Unix epoch. I deleted the posts with a missing subject or body and estimated the correct dates for the rest from the other posts in their threads. One good side effect: because I was using the binary garbage to detect the end of each post, the spam posts full of multi-byte rubbish were cut out completely.
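The post doesn't say exactly how the dates were estimated, so this is only a minimal Python sketch of one way to automate the cleanup. It assumes posts are dicts with the hypothetical keys `message_id`, `thread`, `references` (oldest first), `subject`, `body` and `date` (Unix time, where 0 marks the epoch default). It drops posts with an empty subject or body, then gives each undated post the date of its nearest dated ancestor, or failing that the earliest dated post in its thread.

```python
def clean_posts(posts):
    """Drop empty posts and estimate missing (epoch) dates from the thread.

    Returns new dicts; the input list is not modified.
    """
    # 1. Keep only posts that have both a subject and a body.
    kept = [dict(p) for p in posts if p["subject"].strip() and p["body"].strip()]

    # Snapshot the real (non-epoch) dates so the result doesn't depend on order.
    known = {p["message_id"]: p["date"] for p in kept if p["date"] > 0}
    earliest_in_thread = {}
    for p in kept:
        if p["date"] > 0:
            t = p["thread"]
            earliest_in_thread[t] = min(p["date"], earliest_in_thread.get(t, p["date"]))

    # 2. Fill in epoch (0) dates.
    for p in kept:
        if p["date"] != 0:
            continue
        guess = None
        for ref in reversed(p["references"]):  # nearest ancestor first
            if ref in known:
                guess = known[ref]
                break
        if guess is None:
            guess = earliest_in_thread.get(p["thread"])
        if guess is not None:
            p["date"] = guess
    return kept
```

Since a reply can't predate the post it answers, the estimated date is a lower bound rather than the real posting time, and a post whose whole thread is undated keeps the epoch date.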

The end result is that I now have all the Usenet posts which the server had been expiring these past few days, and now I can continue my little archive, resting assured that I have rescued all of the earliest posts. :)