
Restoring 68kmla from Google and archive.org web caches

sunder

Well-known member
For those that don't know, when 68kmla went down I thought to grab it off Google's cache and wrote up a couple of scripts to do so. After talking to some folks on PPCMLA, I got a better search string, and ran a second pass that way.

Next, I was able to grab all the pages from archive.org based on the search criteria found here:

http://www.ppcmla.com/forums/viewtopic.php?p=991#991

I've not yet looked much at the HTML, but you can download it here:

http://lisaem.sunder.net/68kmla-archive.tar.bz2

The Google cache pages can be found here:

http://lisaem.sunder.net/68kmla-gcache.tar.bz2

Note that the file names contain strange characters such as &, ? and @.
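
For anyone unpacking these on a filesystem or shell that dislikes those characters, here is a minimal Python sketch of one way to rename the files. The helper name `safe_name` and the sample file name are made up for illustration; the real names in the tarballs may look different.

```python
import re
from urllib.parse import unquote

def safe_name(raw_name: str) -> str:
    """Turn a cached-page file name into one that is safe on most filesystems."""
    # Decode any percent-escapes first (e.g. %3F becomes ?).
    decoded = unquote(raw_name)
    # Replace every run of characters outside a conservative whitelist with "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", decoded)
    # Trim separators left at the ends; fall back to a placeholder if nothing remains.
    return cleaned.strip("_") or "unnamed"

# Hypothetical file name, not taken from the archive itself.
print(safe_name("viewtopic.php?t=123&start=15"))
```

Two different originals could collapse to the same cleaned name, so a real renaming pass would want to check for collisions before moving anything.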

So the final step is to write something to extract the posts from the HTML, etc.
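
A rough sketch of what that extraction step might look like in Python, using only the standard library. It assumes each post body sits inside a `<span class="postbody">` element; that class name is an assumption about the old board's template and would need checking against the downloaded pages. The sample HTML is invented.

```python
from html.parser import HTMLParser
from typing import List

class PostExtractor(HTMLParser):
    """Collect the text of every <span class="postbody"> element (assumed markup)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = []      # finished post texts
        self._depth = 0      # span nesting depth inside the current post body
        self._chunks = []    # text pieces of the post being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1            # nested span inside a post body
        elif dict(attrs).get("class") == "postbody":
            self._depth = 1             # entering a post body
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:        # leaving the post body
                text = " ".join("".join(self._chunks).split())
                if text:
                    self.posts.append(text)

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)

def extract_posts(page: str) -> List[str]:
    parser = PostExtractor()
    parser.feed(page)
    parser.close()
    return parser.posts

# Invented sample; real pages will carry far more surrounding markup.
sample = '<td><span class="postbody">Try a <span style="font-weight: bold">PRAM</span> reset.</span></td>'
print(extract_posts(sample))
```

This throws away all formatting and keeps only the text, which is enough for a searchable archive but not for feeding posts back into a board.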

I suspect between google and archive.org we should have most of the old posts.

I don't quite know what format messages should be in so that they can be fed back into phpBB or whatever the current board software is. What would make sense to you guys?

Trouble is that it's not very easy to turn the rendered HTML back into the special tags such as [ code ] and [ quote ].
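
To give an idea of the work involved, here is a minimal regex-based sketch in Python. The HTML on the left-hand side of each rule (`<td class="quote">`, `<td class="code">`, plain `<b>`/`<i>`) is an assumption about how the old board rendered those tags, not something checked against the cached pages, and the function name is made up.

```python
import html
import re

# (pattern, replacement) pairs; the markup matched here is assumed, not verified.
RULES = [
    (re.compile(r'<td class="quote">(.*?)</td>', re.S), r"[quote]\1[/quote]"),
    (re.compile(r'<td class="code">(.*?)</td>', re.S), r"[code]\1[/code]"),
    (re.compile(r"<b>(.*?)</b>", re.S), r"[b]\1[/b]"),
    (re.compile(r"<i>(.*?)</i>", re.S), r"[i]\1[/i]"),
    (re.compile(r"<br\s*/?>"), "\n"),
]

def html_to_bbcode(fragment: str) -> str:
    """Convert a small, well-behaved HTML fragment back into board tags."""
    for pattern, replacement in RULES:
        fragment = pattern.sub(replacement, fragment)
    # Drop whatever tags are left, then decode entities such as &amp;.
    fragment = re.sub(r"<[^>]+>", "", fragment)
    return html.unescape(fragment).strip()

# Invented sample fragment.
print(html_to_bbcode('<td class="quote">Zap the <b>PRAM</b></td><br />Did that &amp; it worked.'))
```

The non-greedy patterns do not cope with nested quotes, and quote attributions are lost, which is a fair illustration of why the reverse conversion is awkward.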

Or should I just leave it as HTML pages? Trouble with that is that you won't be able to do anything but read the old messages.


 

~tl

68kMLA Admin Emeritus
I'll take a look, see what the pages are like and see what I can do with it.

 

iMac600

Well-known member
I've been sorting through them and while they are complete, it looks like we'll need a team to go through all those.

I think I have a way to sort through all of them, but at the moment it's quite slow.

 

Dan 7.1

Well-known member
...this might be the minority viewpoint here, but why exactly do we need to restore all the old data?

a fresh start is a healthy thing for any community. unless you guys just really love to read old threads.

 

iMac600

Well-known member
The same reason we have http://68kmla.org/forums/archive online. We covered thousands of Mac problems and how to repair them on the old forums. They were a great resource to everyone. If we archive them, we can then pull out a few choice ones later for an FAQ, etc.

That and there's a few that need restoring for the heck of it, most notably the "You know you're obsessed when..." and "HATS!" threads (both of which I have located some of already).

 

Flash!

Well-known member
Dan 7.1 wrote: "...this might be the minority viewpoint here, but why exactly do we need to restore all the old data?"
I tend to agree. Though there is a lot of information there, there is also a lot of junk. If the whole lot can be put online somewhere as html that would probably suffice - I don't really see the point of someone spending however many hours trying to get the data sorted out again (unless of course they really really want to do it, and they don't care if 80% of the MLA don't appreciate their effort).

If a question/problem pops up, I'm sure that we (the MLA) will just post an answer. We've got a new Feets thread running again, I'm sure that we can repeat ourselves on a few other topics too ;)

 

~tl

68kMLA Admin Emeritus
Yeah, maybe the best solution would be to archive any useful information that crops up on the forums in some sort of information database. Some sort of wiki type setup would probably be the best. What do you think? It would be a lot easier to find useful information if it was all in one place... plus you wouldn't need to sort through all the crap that gets posted on the forums.

 

Mr. 680x0

Well-known member
We should release a yearly book with all the info from the forum. :D Or a monthly magazine - I know I'd subscribe.

 

Bunsen

Admin-Witchfinder-General
Just wondering who's working on this and how it's going? What kind of assistance would be helpful?

 

Osgeld

Banned
i thought it was archived somewhere around here ... restoration and inclusion with this forum could be easy if the old database is intact (aside from changing all the post numbers)

 

johnklos

Well-known member
Osgeld wrote: "i thought it was archived somewhere around here ... restoration and inclusion with this forum could be easy if the old database is intact (aside from changing all the post numbers)"
Since the database is what is lost, perhaps we need to look at this differently. Rather than try to load old static content into a dynamic web site (which has no real advantage since it's not going to change), we should instead try to massage the links to all be relative, create a static site for all of the archives, and put it up outside of the 68kmla.org site. If I were running 68kmla, I wouldn't want to try to import stuff taken from Google or Archive.org directly into the site.

I'll grab those archives and see how usable they are in their current form. Perhaps between sed and mod_rewrite, they can be put on a static domain and used that way.
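
As a stand-in for the sed pass, here is a small Python sketch of the link-massaging idea: strip the old site prefix from `href` and `src` attributes so links resolve inside the static copy. The prefix `http://68kmla.org/forums/` is an assumption about what the old absolute URLs looked like, and the sample tag is invented.

```python
import re

OLD_PREFIX = "http://68kmla.org/forums/"  # assumed prefix of the old board's URLs

def make_links_relative(page: str, prefix: str = OLD_PREFIX) -> str:
    """Strip the site prefix from href/src attributes so links stay inside the mirror."""
    pattern = re.compile(r'((?:href|src)=["\'])' + re.escape(prefix), re.I)
    return pattern.sub(r"\1", page)

# Invented sample link.
print(make_links_relative('<a href="http://68kmla.org/forums/viewtopic.php?t=42">next</a>'))
```

This only yields working links if every saved page ends up in one directory and the file names match the rewritten URLs, which is presumably where mod_rewrite would come in.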

John

 

~tl

68kMLA Admin Emeritus
We (well, I) have the archives for the era beginning with the transition from the Snitz forum (30th Dec. 2003) until a backup was made on the 23rd Feb. 2006. The only thing that is actually missing is between then and the server crash on the 5th Apr. 2007. I've been considering merging that database with the current one, but I haven't found an easy tool to do so for phpBB. If anyone has any ideas, I'd be interested to hear them.

As for the topics in the "gap": sure, it may be possible to recover some of them from archive.org etc. (many have been downloaded already), but sorting them out and getting them back into a usable format is a BIG task. Certainly more than I've been able to take on over the past few years.

 

Osgeld

Banned
you probably won't find an easy tool. the last couple of times I've had to do something similar I had to write a script
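
A minimal sketch in Python of the renumbering part of such a script: shift every old ID above the highest ID already in use on the current board. The record layout (dicts with `post_id`, `topic_id` and `text` keys) and the numbers are purely illustrative, not the actual database schema.

```python
def shift_ids(old_posts, current_max_post_id, current_max_topic_id):
    """Return copies of old_posts renumbered to sit above the current board's IDs."""
    shifted = []
    for post in old_posts:
        renumbered = dict(post)  # copy, so the input records stay untouched
        renumbered["post_id"] = post["post_id"] + current_max_post_id
        renumbered["topic_id"] = post["topic_id"] + current_max_topic_id
        shifted.append(renumbered)
    return shifted

# Illustrative records and offsets only.
old = [
    {"post_id": 1, "topic_id": 1, "text": "first"},
    {"post_id": 2, "topic_id": 1, "text": "reply"},
]
print(shift_ids(old, current_max_post_id=5000, current_max_topic_id=300))
```

After a shift like this the older posts carry the higher IDs, so anything that sorts by ID rather than by post time would show them out of order; user IDs and any cross-references inside post text would need the same kind of remapping.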

it doesn't really have to be an interactive thing I guess. one could set up the old system and mirror it into static HTML just for reference (which would be quicker and more reliable than archive.org ... maybe)

 