
Make a mirror of the grenier-du-mac.net site! (code inside!)

tecneeq

Well-known member
So, since redundancy beats single points of failure, here is a snippet you can run in any Bourne-compatible shell you prefer. The goal is a complete mirror, including offsite downloads, of the abandonware site www.grenier-du-mac.net.

On Unix-like systems /bin/sh should do just fine; Windows users want Cygwin, uwin-base or even this very small set of GNU tools compiled for Win32 (it contains wget).

Then you need wget plus the usual text tools (grep, tr and sort), but those are available nearly everywhere you find a /bin/sh. For the sake of simplicity I assume a Unix-like host (Linux, BSD or similar).
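If you are not sure everything is installed, a quick pre-flight check can tell you before you start. This is a minimal sketch; check_tools is a made-up name, not part of the original script, and the tool list in the usage line is simply what the script calls.

```shell
# Hypothetical helper (not part of the original script): report every
# command from the argument list that cannot be found in PATH.
# Returns 0 if all of them are present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v is the POSIX way to ask "would this name run?"
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo >&2 "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_tools wget grep tr sort || exit 1` prints one line per missing tool and stops if anything is absent.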

You also need about 3676 MB of free disk space; check that with the command df -m. Now, without further ado, I present to you ...(drums)... the script:

Code:
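#!/bin/sh

# mirror the good stuff found and linked on www.grenier-du-mac.net

mybasedir=$HOME/mirror_grenier-du-mac.net

# print an error message to stderr and exit with the given status
die(){
 echo >&2 "$0: ERROR: $1"
 exit "$2"
}

mkdir -p "$mybasedir" || die "could not create a simple directory" 1
cd "$mybasedir" || die "could not change into $mybasedir" 1

# first pass: mirror the site itself
wget -e robots=off --user-agent=mozilla -m \
 http://www.grenier-du-mac.net/  \
 || die "download failed" 2

# second pass: collect the offsite download links from the mirrored pages
urls=$( grep -R "http" www.grenier-du-mac.net \
     | tr '"' '\n' | grep "^http" | sort -u \
     | grep -E "start10g.ovh.net/%7Eyappa/Grenier_dl/|zxt2grenier.free.fr/Grenier_du_Mac/|grenierstock.free.fr|grenierdumac.free.fr" )

# without this check an empty list ends in "wget: missing URL"
[ -n "$urls" ] || die "parsing for urls failed, no offsite links found" 3

# $urls is left unquoted on purpose so that every URL becomes one argument
wget -e robots=off --user-agent=mozilla -m -np $urls \
 || die "second pass download failed" 3

# the end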
Copy all of that into a text file, from the first line starting with # to the last line starting with #. Say your file is named mirror_grenier-du-mac.sh. You can either run it immediately, like so:

Code:
sh mirror_grenier-du-mac.sh
Or you can put it somewhere in your PATH. If you want that, you probably know what to do; on my computer it looks like this:

Code:
chmod +x mirror_grenier-du-mac.sh
mv mirror_grenier-du-mac.sh ~/bin
Then run it by just typing the name of the script (this assumes ~/bin is in your PATH).
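Before the first run you may want to confirm there is room for the roughly 3676 MB mentioned above. A minimal sketch, assuming a POSIX df; enough_space_mb is a hypothetical name. It uses df -Pk rather than df -m because -P and -k are the options POSIX guarantees.

```shell
# Hypothetical helper: succeed if the file system holding $2 (default: $HOME)
# has at least $1 MB available.
# df -P prints one line per file system and -k reports 1024-byte blocks;
# column 4 of the second output line is the available space.
enough_space_mb() {
  need_mb=$1
  dir=${2:-$HOME}
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ -n "$avail_kb" ] || return 1   # df failed or printed nothing useful
  [ $((avail_kb / 1024)) -ge "$need_mb" ]
}
```

Usage: `enough_space_mb 3676 || echo "not enough free space for the mirror"`. The 3676 MB figure comes from the post; the site may have grown or shrunk since.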

You can stop the script at any time (press CTRL and C at the same time) and finish it another day; files that were completely downloaded before are not downloaded again. Once you have everything, run the script one more time just to be sure, and your mirror is complete.
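Because rerunning is harmless, an interrupted or failed run can simply be retried. A minimal sketch of a retry wrapper; run_until_ok and the RETRY_DELAY variable are made-up names, and the 60-second default pause is an arbitrary choice.

```shell
# Hypothetical wrapper: run a command until it exits 0, at most $1 times,
# waiting RETRY_DELAY seconds (default 60) between attempts.
# Safe for the mirror script because wget -m (which turns on timestamping)
# only fetches files that are missing locally or have changed on the server.
run_until_ok() {
  max=$1
  shift
  n=1
  while [ "$n" -le "$max" ]; do
    "$@" && return 0
    echo >&2 "attempt $n of $max failed"
    n=$((n + 1))
    # only pause if another attempt will follow
    [ "$n" -le "$max" ] && sleep "${RETRY_DELAY:-60}"
  done
  return 1
}
```

Usage: `run_until_ok 5 sh mirror_grenier-du-mac.sh` tries the whole script up to five times.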

You could now put it in a compressed tar archive, to save some space or just because it is convenient to have everything in a single file. As I said above, the mirror uses 3676 MB on disk; a single gzip-compressed tar archive uses 3513 MB. I guess bzip2 compression could squeeze out a few more megabytes, but why waste the CPU time?
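If you want numbers instead of a guess, you could measure both compressors on your own copy before deciding. A minimal sketch; compare_compression is a made-up name. It streams the tar output straight into the compressor and only counts bytes, so nothing is written to disk, but it reads the whole directory once per compressor, which takes a while on 3.6 GB.

```shell
# Hypothetical helper: print how many bytes a directory would take as a
# gzip- and (if installed) bzip2-compressed tar stream.
# tr -d ' ' strips the padding some wc implementations add.
compare_compression() {
  dir=$1
  gz=$(tar cf - "$dir" | gzip -c | wc -c | tr -d ' ')
  echo "gzip: $gz bytes"
  if command -v bzip2 > /dev/null 2>&1; then
    bz=$(tar cf - "$dir" | bzip2 -c | wc -c | tr -d ' ')
    echo "bzip2: $bz bytes"
  fi
}
```

Usage: `compare_compression ~/mirror_grenier-du-mac.net`.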

Here is what I did to compress it all into a neat single file (assuming that you did not change mybasedir in the script above and did everything from your home directory):

Code:
mkdir -p ~/archive
tar cvzf ~/archive/mirror_grenier-du-mac.tgz ~/bin/mirror_grenier-du-mac.sh ~/mirror_grenier-du-mac.net
After that, I removed the uncompressed files, of course:

Code:
rm -r ~/mirror_grenier-du-mac.net
You now have a complete, compressed mirror in a single file, ready to be poured onto backup media :)
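Deleting the originals is only safe if the archive is good. A slightly more careful variant, assuming that reading the archive back completely with tar tzf is a good enough check, removes the directory only if that succeeds; archive_then_remove is a hypothetical name.

```shell
# Hypothetical helper: pack directory $1 into the .tgz file $2 and delete
# the directory only if the archive can be read back without errors.
archive_then_remove() {
  src=$1 archive=$2
  tar czf "$archive" "$src" || return 1
  # tar tzf decompresses and lists every member; a non-zero status
  # means the archive is truncated or damaged
  tar tzf "$archive" > /dev/null || return 1
  rm -r "$src"
}
```

Usage, from your home directory: `archive_then_remove mirror_grenier-du-mac.net archive/mirror_grenier-du-mac.tgz`.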

Do you have special sites that need to be mirrored, and if so, what commands do you use?

EDIT #1: Small error fixed. If you run BSD or OS X, make sure your mirror ends up at about 3.6 GB; if it doesn't, use the fixed script.
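To see where your mirror stands, you can print the size of each host directory wget created plus the total. A minimal sketch, assuming the default mybasedir from the script; mirror_sizes is a made-up name. A grenier-du-mac.net directory with little or nothing next to it suggests the second pass never ran.

```shell
# Hypothetical helper: print the size in MB of each top-level directory of
# the mirror (wget creates one per host) and of the whole mirror.
mirror_sizes() {
  dir=${1:-$HOME/mirror_grenier-du-mac.net}
  for d in "$dir"/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    kb=$(du -sk "$d" | awk '{ print $1 }')
    echo "$((kb / 1024)) MB  $(basename "$d")"
  done
  kb=$(du -sk "$dir" | awk '{ print $1 }')
  echo "$((kb / 1024)) MB  total"
}
```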

 

tecneeq

Well-known member
Thanks to wackymacs, we discovered that OS X doesn't like sed here and prefers tr. Well done, wacky ;)
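The second pass boils down to a text pipeline: split the saved pages at every double quote, keep the pieces that start with http, drop duplicates, and keep only the download hosts. Isolated as a function it can be tried on a single page. A minimal sketch; extract_offsite_urls is a made-up name, and it escapes the dots in the patterns, so it is slightly stricter than the script's version.

```shell
# Hypothetical helper: the URL extraction from the second pass, reading
# HTML on stdin and printing each matching offsite URL once.
# tr (not sed) does the splitting, which behaves the same on GNU and BSD.
extract_offsite_urls() {
  tr '"' '\n' \
    | grep '^http' \
    | sort -u \
    | grep -E 'start10g\.ovh\.net/%7Eyappa/Grenier_dl/|zxt2grenier\.free\.fr/Grenier_du_Mac/|grenierstock\.free\.fr|grenierdumac\.free\.fr'
}
```

Usage: `extract_offsite_urls < www.grenier-du-mac.net/index.html`.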

 

slomacuser

Well-known member
Yes, it stopped at 1 GB for me. How do I continue where it finished, so it won't start all over again?

FINISHED --05:22:45--
Downloaded: 1,749,535,085 bytes in 2271 files
wget: missing URL
Usage: wget [OPTION]... ...

Try `wget --help' for more options.
mirror_grenier-du-mac.sh: ERROR: either second pass download or parsing for urls failed

joshc

Well-known member

slomacuser, did you make sure to update your script? You can run the script as many times as you want; it will not download files that were already downloaded.

 

slomacuser

Well-known member
I haven't modified the script yet, because I thought it would start all over again, but I will try it today when I get home.

 

tecneeq

Well-known member
Like Wackymacs said, if you run the script again it will skip files already downloaded, so it doesn't do any harm. Your error message is a sure sign that you have the problem I fixed. Just use the changed script in the first post and run it again.

 

macgeek417

Well-known member
Wouldn't this be much easier?

Code:
# -H lets wget leave the start host at all; -D then limits it to these domains
wget -m -e robots=off --user-agent=Mozilla -H -Dstart10g.ovh.net,zxt2grenier.free.fr,grenierstock.free.fr,grenierdumac.free.fr,grenier-du-mac.net http://grenier-du-mac.net
 