sunder
Well-known member
For those who don't know: when 68kmla went down, I decided to grab it from Google's cache and wrote a couple of scripts to do so. After talking to some folks on PPCMLA, I got a better search string and ran a second pass with it.
Next, I was able to grab all the pages from archive.org based on the search criteria found here:
http://www.ppcmla.com/forums/viewtopic.php?p=991#991
I've not yet looked much at the HTML, but you can download it here:
http://lisaem.sunder.net/68kmla-archive.tar.bz2
The Google cache pages can be found here:
http://lisaem.sunder.net/68kmla-gcache.tar.bz2
Note that the file names contain strange characters such as &, ? and @.
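Those characters are special to most shells, so one option is to percent-encode the names before working with the files. Here is a minimal sketch in Python; `safe_name` is a hypothetical helper (not part of the scripts mentioned above), and the sample file name is made up for illustration:

```python
import urllib.parse


def safe_name(original: str) -> str:
    """Return a shell-friendly version of an archived file name.

    Letters, digits and "._-~" are kept; everything else (including
    &, ? and @) is percent-encoded, so the mapping is reversible.
    """
    return urllib.parse.quote(original, safe="")


def original_name(encoded: str) -> str:
    """Undo safe_name()."""
    return urllib.parse.unquote(encoded)


if __name__ == "__main__":
    # Made-up example name, not taken from the actual tarballs.
    name = "viewtopic.php?p=991&x@y"
    print(safe_name(name))
```

Because the encoding is reversible, the original URL-style name can still be recovered later if it is needed to work out topic or post IDs.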
So the final step is to write something to extract the posts from the HTML, etc.
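As a rough idea of what that extraction step could look like, here is a sketch using only Python's standard library. It assumes each post body sits inside an element with the class `postbody`; that class name is a guess and has not been checked against the archived pages, so adjust `POST_CLASS` after looking at the real HTML. `PostExtractor` and `extract_posts` are hypothetical names.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text inside every element carrying POST_CLASS."""

    POST_CLASS = "postbody"  # assumption: not verified against the archive

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []
        self._tag = None   # tag name of the post element being read
        self._depth = 0    # nesting depth of that tag name
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested elements with the same tag name so the
            # right closing tag ends the post.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.POST_CLASS in classes:
            self._tag, self._depth, self._buf = tag, 1, []

    def handle_endtag(self, tag):
        if self._depth and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.posts.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_posts(html_text):
    """Return the plain text of each post found in one HTML page."""
    parser = PostExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.posts
```

This only pulls out plain text; author names, dates and topic titles would need the same treatment once the page layout is known.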
I suspect that between Google and archive.org we should have most of the old posts.
I don't quite know what format the messages should be in so that they can be fed back into phpBB or whatever the current board software is. What would make sense to you guys?
The trouble is that it's not very easy to turn the HTML generated for special tags such as [ code ] and [ quote ] back into the original tags.
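For the simple cases, a partial reversal might look like the sketch below. It assumes the board rendered quote and code blocks as elements with the classes `quote` and `code`; those class names are guesses that need checking against the archived HTML. `BBCodeWriter` and `to_bbcode` are hypothetical names, and only bold, italic, underline, line breaks and those two block types are handled.

```python
from html.parser import HTMLParser

# HTML tag -> BBCode tag for simple inline formatting.
SIMPLE = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u"}
# CSS class -> BBCode tag. Assumed class names, not verified.
BLOCK_CLASSES = {"quote": "quote", "code": "code"}


class BBCodeWriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._stack = []  # (html_tag, bbcode_tag or None) per open element

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.out.append("\n")
            return
        bb = SIMPLE.get(tag)
        for cls in (dict(attrs).get("class") or "").split():
            bb = BLOCK_CLASSES.get(cls, bb)
        self._stack.append((tag, bb))
        if bb:
            self.out.append(f"[{bb}]")

    def handle_endtag(self, tag):
        # Ignore stray closing tags that were never opened.
        if not any(open_tag == tag for open_tag, _ in self._stack):
            return
        # Pop back to the matching element, closing anything left open.
        while self._stack:
            open_tag, bb = self._stack.pop()
            if bb:
                self.out.append(f"[/{bb}]")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def to_bbcode(html_text):
    """Convert a fragment of post HTML to approximate BBCode."""
    writer = BBCodeWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)
```

Anything the board added around those blocks, such as a "Quote:" caption or the name of the quoted poster, would come through as plain text and need separate handling.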
Or should I just leave it as HTML pages? The trouble with that is that you won't be able to do anything but read the old messages.
:b&w: