Restoring 68kmla from Google and archive.org web caches

Postby sunder » 04 May 2007, 17:44

For those that don't know, when 68kmla went down I thought to grab it off Google's cache and wrote up a couple of scripts to do so. After talking to some folks on PPCMLA, I got a better search string, and ran a second pass that way.

Next, I was able to grab all the pages from archive.org based on the search criteria found here:

http://www.ppcmla.com/forums/viewtopic.php?p=991#991


I've not yet looked much at the HTML, but you can download it here:

http://lisaem.sunder.net/68kmla-archive.tar.bz2

The google cache ones can be found here:

http://lisaem.sunder.net/68kmla-gcache.tar.bz2

Note that the file names contain strange characters such as &, ?, and @.
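A minimal sketch of one way to cope with those characters, assuming the goal is a reversible, shell-safe name for each file. The helper names and the sample file name are made up for illustration; only `urllib.parse.quote` and `unquote` are standard library calls.

```python
from urllib.parse import quote, unquote

def safe_name(name: str) -> str:
    """Percent-encode every character except letters, digits and '_', '.', '-', '~'."""
    return quote(name, safe="")

def original_name(safe: str) -> str:
    """Undo safe_name() to recover the name as it appears in the archive."""
    return unquote(safe)

# Hypothetical file name of the kind a cached phpBB page might have.
print(safe_name("viewtopic.php?t=123&start=15"))
```

Because the encoding is reversible, the original URL can still be reconstructed later when the posts are sorted back into topics.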

So the final step is to write something to extract the posts from the HTML, etc.
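A rough sketch of what that extraction step could look like, using only Python's standard `html.parser`. It assumes each post body sits inside a `<span class="postbody">` element; that is a guess about the cached markup, not something checked against the archives, so the tag and class are parameters.

```python
from html.parser import HTMLParser

class PostExtractor(HTMLParser):
    """Collect the text of every element matching the (assumed) post-body tag and class."""

    def __init__(self, tag="span", css_class="postbody"):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # nesting depth inside the current post body, 0 = outside
        self.chunks = []    # text pieces of the post being read
        self.posts = []     # finished posts

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1          # nested element of the same tag
            elif tag == "br":
                self.chunks.append("\n")
        elif tag == self.tag and self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.posts.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)

def extract_posts(html):
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts
```

This keeps only the plain text of each post; author names and dates would need the same treatment with whatever elements actually hold them.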

I suspect between google and archive.org we should have most of the old posts.

I don't quite know what format messages should be in so that they can be fed back into PHPbb or whatever the current board software is. What would make sense to you guys?

Trouble is that it's not very easy to convert the HTML back into the special tags such as [code] and [quote].
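To illustrate why this is awkward, here is a minimal regex-based sketch of the reverse conversion. The `<div class="code">` and `<div class="quote">` patterns are placeholders: the real phpBB markup for those blocks is more involved, and nested quotes would defeat the non-greedy matching used here.

```python
import html
import re

# (pattern, replacement) pairs, applied in order. The div patterns are assumed markup.
RULES = [
    (re.compile(r"<br\s*/?>", re.I), "\n"),
    (re.compile(r"<(/?)(b|i|u)>", re.I),
     lambda m: f"[{m.group(1)}{m.group(2).lower()}]"),
    (re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.I | re.S), r"[url=\1]\2[/url]"),
    (re.compile(r'<div class="code">(.*?)</div>', re.I | re.S), r"[code]\1[/code]"),
    (re.compile(r'<div class="quote">(.*?)</div>', re.I | re.S), r"[quote]\1[/quote]"),
]

def html_to_bbcode(fragment):
    """Best-effort conversion of a post's HTML back into BBCode."""
    for pattern, replacement in RULES:
        fragment = pattern.sub(replacement, fragment)
    fragment = re.sub(r"<[^>]+>", "", fragment)   # drop any tags that are left over
    return html.unescape(fragment)                # &amp; -> &, &lt; -> <, and so on
```

Anything the rules do not recognise is silently stripped, so the output would need spot checks against the original pages.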

Or should I just leave it as HTML pages? Trouble with that is that you won't be able to do anything but read the old messages.

:b&w:
I will continue to decipher, And make it easier for you, Behold! Let it be told, What is fiction and untrue

I will not leave you behind, With the burdens of belief, And divine misery, That means restless grief

For it is written, The times of old will come anew, And have the universe reversed to its origin, When the ancient winds blew
sunder
 
Joined: 02 May 2007, 00:52
Location: Somewhere in time...

Re: restoring 68kmla from google and archive.com web caches

Postby ~tl » 04 May 2007, 18:08

I'll take a look, see what the pages are like and see what I can do with it.
~tl
 
Joined: 01 May 2007, 20:51

Re: restoring 68kmla from google and archive.com web caches

Postby iMac600 » 10 May 2007, 16:43

I've been sorting through them and while they are complete, it looks like we'll need a team to go through all those.

I think I have a way to sort through all of them, but at the moment it's quite slow.
iMac600
 
Joined: 02 May 2007, 10:59

Re: restoring 68kmla from google and archive.com web caches

Postby Dan 7.1 » 10 May 2007, 17:50

...this might be the minority viewpoint here, but why exactly do we need to restore all the old data?

a fresh start is a healthy thing for any community. unless you guys just really love to read old threads.
The world is much more interesting over there.
Dan 7.1
 
Joined: 02 May 2007, 01:00
Location: Knoxville, TN

Re: restoring 68kmla from google and archive.com web caches

Postby iMac600 » 10 May 2007, 17:53

The same reason we have http://68kmla.org/forums/archive online. We covered thousands of Mac problems and how to repair them on the old forums. They were a great resource to everyone. If we archive them, we can then pull out a few choice ones later for an FAQ, etc.

That and there's a few that need restoring for the heck of it, most notably the "You know you're obsessed when..." and "HATS!" threads (both of which I have located some of already).
iMac600
 
Joined: 02 May 2007, 10:59

Re: restoring 68kmla from google and archive.com web caches

Postby gobabushka » 10 May 2007, 18:37

Hey, it won't let me download the archive.org file from your server. I'm getting a 403 error.
"I reject your reality and substitute my own!!!"
gobabushka
 
Joined: 02 May 2007, 12:47
Location: Palm Bay, Florida

Re: restoring 68kmla from google and archive.com web caches

Postby sunder » 16 May 2007, 00:17

gobabushka wrote:Hey, it won't let me download the archive.org file from your server. I'm getting a 403 error.


Oops! Sorry about that. It should be good now.
sunder
 
Joined: 02 May 2007, 00:52
Location: Somewhere in time...

Re: restoring 68kmla from google and archive.com web caches

Postby Bunsen » 04 Jun 2007, 01:40

*bump*

Any news, guys?
have you searched? Seeks: Nubus PDS DSP PB170 Newton; TRS-80 III/4; CBM BBC SX-64 CX5M Likes: 8bit luggable palmtop terminal NC tablet audio MIDI analog FM drum synth steam&dieselpunk; 1930-1980 lab/comm/mil Score! NC100 PB190 Q950 IIe-PDS
Bunsen
Witchfinder-General
 
Joined: 02 May 2007, 15:59
Location: Melbourne, Australia

Re: restoring 68kmla from google and archive.com web caches

Postby funkytoad » 04 Jun 2007, 01:55

Well, we also have these:
http://web.archive.org/web/*/68kmla.org

I love the Wayback Machine, great tool!
funkytoad
 
Joined: 05 May 2007, 16:22
Location: Washington State, United States

Re: restoring 68kmla from google and archive.com web caches

Postby Bunsen » 19 Jun 2007, 14:56

sunder in the original post wrote:Next, I was able to grab all the pages from archive.org
Bunsen
Witchfinder-General
 
Joined: 02 May 2007, 15:59
Location: Melbourne, Australia

Re: restoring 68kmla from google and archive.com web caches

Postby Flash! » 20 Jun 2007, 10:06

Dan 7.1 wrote:...this might be the minority viewpoint here, but why exactly do we need to restore all the old data?


I tend to agree. Though there is a lot of information there, there is also a lot of junk. If the whole lot can be put online somewhere as html that would probably suffice - I don't really see the point of someone spending however many hours trying to get the data sorted out again (unless of course they really really want to do it, and they don't care if 80% of the MLA don't appreciate their effort).

If a question/problem pops up, I'm sure that we (the MLA) will just post an answer. We've got a new Feets thread running again, I'm sure that we can repeat ourselves on a few other topics too [;)]
Flash!
 
Joined: 02 May 2007, 06:27
Location: Sydney

Re: restoring 68kmla from google and archive.com web caches

Postby ~tl » 20 Jun 2007, 12:21

Yeah, maybe the best solution would be to archive any useful information that crops up on the forums in some sort of information database. Some sort of wiki type setup would probably be the best. What do you think? It would be a lot easier to find useful information if it was all in one place... plus you wouldn't need to sort through all the crap that gets posted on the forums.
~tl
 
Joined: 01 May 2007, 20:51

Re: restoring 68kmla from google and archive.com web caches

Postby Mr. 680x0 » 20 Jun 2007, 13:27

We should release a yearly book with all the info from the forum. :D Or a monthly magazine - I know I'd subscribe.
------
Mr. 680x0
 
Joined: 02 May 2007, 11:05
Location: New Hampshire

Re: restoring 68kmla from google and archive.com web caches

Postby Bunsen » 31 Jul 2007, 02:14

Just wondering who's working on this and how it's going? What kind of assistance would be helpful?
Bunsen
Witchfinder-General
 
Joined: 02 May 2007, 15:59
Location: Melbourne, Australia

Re: restoring 68kmla from google and archive.com web caches

Postby Mars478 » 28 Jan 2010, 16:22

Any headway on this topic? I know this is almost 2 years old. :)
iOwn: MBP 2.8GZ, iMac G5 1.6 , iBook 1GHZ, eMac700, iMac350, iMac233, B&WG3,400, BWG3,350, iBook466Grayx2, iBook300blue, Performa6214CD, Mac512k, MacClassic2, MacIIcix2, PM8100/110, PowerBook150, Apple][e, And lots of iPhones. User:Mars478
Mars478
 
Joined: 06 Jun 2009, 15:10

Re: restoring 68kmla from google and archive.com web caches

Postby Osgeld » 28 Jan 2010, 16:39

I thought it was archived somewhere around here ... restoration and inclusion with this forum could be easy if the old database is intact (aside from changing all the post numbers).
Osgeld
Banned
 
Joined: 19 Jun 2009, 23:06

Re: restoring 68kmla from google and archive.com web caches

Postby johnklos » 28 Jan 2010, 20:33

Osgeld wrote:I thought it was archived somewhere around here ... restoration and inclusion with this forum could be easy if the old database is intact (aside from changing all the post numbers).

Since the database is what is lost, perhaps we need to look at this differently. Rather than try to load old static content into a dynamic web site (which has no real advantage since it's not going to change), we should instead try to massage the links to all be relative, create a static site for all of the archives, and put it up outside of the 68kmla.org site. If I were running 68kmla, I wouldn't want to try to import stuff taken from Google or Archive.org directly into the site.

I'll grab those archives and see how usable they are in their current form. Perhaps between sed and mod_rewrite, they can be put on a static domain and used that way.
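As a rough illustration of the link-massaging step (in Python rather than sed), the sketch below strips the site prefix, and an optional Wayback Machine prefix, from `href` and `src` attributes. It assumes the forum lived under `/forums/` and that all saved pages end up in one flat directory; both are assumptions, not facts about the archives.

```python
import re

# Matches href="..." or src="..." values that point back into the old forum,
# with or without a Wayback Machine prefix in front of the original URL.
PREFIX = re.compile(
    r'(?P<attr>\b(?:href|src)=")'
    r'(?:https?://web\.archive\.org/web/\d+\w*/)?'
    r'https?://(?:www\.)?68kmla\.org/forums/',
    re.I,
)

def make_links_relative(page):
    """Rewrite absolute forum links so they resolve relative to the current directory."""
    return PREFIX.sub(lambda m: m.group("attr"), page)
```

Links to other sites are left untouched, so the rewritten pages could be hosted on any static domain.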

John
johnklos
 
Joined: 19 Oct 2008, 19:22
Location: Earth

Re: restoring 68kmla from google and archive.com web caches

Postby ~tl » 28 Jan 2010, 20:56

We (well, I) have the archives for the era beginning with the transition from the Snitz forum (30th Dec. 2003) until a backup was made on the 23rd Feb. 2006. The only thing that is actually missing is between then and the server crash on the 5th Apr. 2007. I've been considering merging that database with the current one, but I haven't found an easy tool to do so for phpBB. If anyone has any ideas, I'd be interested to hear them.
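For what it's worth, the core of such a merge is renumbering the old rows so they don't collide with the current ones. Below is a minimal sketch using SQLite and a made-up two-column `posts` table; the real phpBB schema runs on MySQL, has many more tables, and topic and user IDs would need the same shifting.

```python
import sqlite3

def make_db(rows):
    """Build a throwaway in-memory database with a simplified, hypothetical posts table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, body TEXT)")
    db.executemany("INSERT INTO posts VALUES (?, ?)", rows)
    return db

def merge_posts(current, old):
    """Copy old posts into the current database, shifting IDs past the current maximum."""
    offset = current.execute("SELECT COALESCE(MAX(post_id), 0) FROM posts").fetchone()[0]
    rows = old.execute("SELECT post_id, body FROM posts ORDER BY post_id").fetchall()
    current.executemany(
        "INSERT INTO posts (post_id, body) VALUES (?, ?)",
        [(post_id + offset, body) for post_id, body in rows],
    )
    current.commit()
    return offset   # keep this so old links can be mapped to the new IDs
```

One side effect is that older posts end up with higher IDs than newer ones, so anything that sorts by ID rather than by date would show them out of order.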

As for the topics in the "gap": sure, it may be possible to recover some of them from archive.org and similar caches (many have been downloaded already), but sorting them out and getting them back into a usable format is a BIG task. Certainly more than I've been able to take on over the past few years.
~tl
 
Joined: 01 May 2007, 20:51

Re: restoring 68kmla from google and archive.com web caches

Postby Osgeld » 28 Jan 2010, 21:00

You probably won't find an easy tool; the last couple of times I've had to do something similar, I had to write a script.

It doesn't really have to be an interactive thing, I guess. One could set up the old system and mirror it into static HTML just for reference (which would be quicker and more reliable than archive.org ... maybe).
Osgeld
Banned
 
Joined: 19 Jun 2009, 23:06

Re: restoring 68kmla from google and archive.com web caches

Postby Dog Cow » 28 Jan 2010, 21:32

I've written a script which spiders phpBB forums and extracts all posts, users, and forums into database tables. It even does a so-so job at converting the HTML back into BBCode. It then has a front-end for the database, and it can be searched too using MySQL Fulltext index.
Mac GUI Vault - A source for retro Apple II and Macintosh computing.
http://macgui.com/vault/
Dog Cow
 
Joined: 05 Sep 2008, 00:51

Re: restoring 68kmla from google and archive.com web caches

Postby johnklos » 28 Jan 2010, 21:46

Dog Cow wrote:I've written a script which spiders phpBB forums and extracts all posts, users, and forums into database tables. It even does a so-so job at converting the HTML back into BBCode. It then has a front-end for the database, and it can be searched too using MySQL Fulltext index.

Would you like to run that into a database on one of my machines so we can see about creating something off-site, and if it looks good and clean, we can see about how to get it into 68kmla?

John
johnklos
 
Joined: 19 Oct 2008, 19:22
Location: Earth

Re: restoring 68kmla from google and archive.com web caches

Postby Dog Cow » 28 Jan 2010, 23:31

Sure. Where would the HTML pages come from? Are they in a local directory, or somewhere on the web?
Dog Cow
 
Joined: 05 Sep 2008, 00:51

Re: restoring 68kmla from google and archive.com web caches

Postby Mars478 » 28 Jan 2010, 23:38

Make a second archive. Truly, all that data is invaluable. I would be willing to set up an extra server at my house in NJ, which we go to only on the weekends. The server would be a B&W G3.
Mars478
 
Joined: 06 Jun 2009, 15:10

Re: restoring 68kmla from google and archive.com web caches

Postby Osgeld » 28 Jan 2010, 23:40

And if you can't get it back into form, I've got quite a few scripts that will redo all the links in a static HTML spider dump so you can rehost it anywhere you want (including locally, which is not really a requirement here...). I did this quite a lot back when I was mirroring the SecondLife scripting wiki (every 3 months).

http://www.cheesefactory.us/lslwm
http://www.cheesefactory.us/slwm
Osgeld
Banned
 
Joined: 19 Jun 2009, 23:06

Re: restoring 68kmla from google and archive.com web caches

Postby Mars478 » 29 Jan 2010, 00:05

Kinda feel awesome that I resparked this movement. :)
Mars478
 
Joined: 06 Jun 2009, 15:10

