tecneeq
Well-known member
So, since redundancy beats single points of failure, here is a snippet you can run in a Bourne-compatible shell of your preference. The goal is to have a complete mirror, including offsite downloads, of the abandonware site www.grenier-du-mac.net.
For Unix-like systems, the shell /bin/sh should do just fine; Windows users want Cygwin, uwin-base or even this very small set of GNU tools (contains wget), compiled for win32.
Then you need wget and sed and awk, but those are available nearly everywhere you can find a /bin/sh. For the sake of simplicity I assume you are on a Unix-like host, Linux, BSD or so.
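If you want to check the prerequisites before starting, something like the following could do it. This is only a sketch: missing_tools is a made-up helper name, and the list checks the tools named above plus grep, tr and sort, which the script itself calls.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools wget sed awk grep tr sort)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names end up on one line
    echo "missing tools:" $missing >&2
else
    echo "all tools found"
fi
```

An empty result means everything was found; anything printed has to be installed first.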
You also need about 3676 MB of free disk space; check that with the command df -m. Now, without further ado, I present to you ...(drums)... the script (it is the first code block further down).
Copy all of that into a text file, from the first line starting with # to the last line starting with #. Say your file is named mirror_grenier-du-mac.sh. You can either run it immediately with sh mirror_grenier-du-mac.sh.
Or you put it somewhere in your PATH. If you want that, you probably know what to do. On my computer that means making it executable and moving it to ~/bin (the chmod and mv commands further down).
Then run it by just typing the name of the script.
The script can be stopped (press Ctrl and C at the same time) and completed another day; files that were completely downloaded before don't get downloaded again, because wget -m turns on timestamping. Once everything is downloaded, run the script one more time, just to be sure, and your mirror is complete.
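Since rerunning is safe, the reruns can also be automated. The following is a rough sketch, not part of the original script: retry is a hypothetical helper, and the attempt count and pause are arbitrary values.

```shell
#!/bin/sh
# Run a command until it succeeds, at most $1 times,
# pausing $2 seconds between attempts.
retry() {
    max=$1; pause=$2; shift 2
    n=1
    while ! "$@"; do
        [ "$n" -ge "$max" ] && return 1
        n=$((n + 1))
        sleep "$pause"
    done
    return 0
}

# Usage (assumes the script was saved under the name used in this post):
# retry 5 60 sh mirror_grenier-du-mac.sh
```

Keep the attempt count small. The script exits with an error whenever wget reports one, so a permanently broken link would otherwise make it loop for a long time.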
You could now put it in a compressed tar archive, to save some space or because it is just convenient to have it all in a single file. As I said above, the mirror uses 3676 MB on disk; a single gzip-compressed tar archive uses 3513 MB, a saving of only about 4%. I guess with bzip2 compression you could squeeze a few more megabytes out of it, but why waste the CPU time?
To compress it all into a neat, single file, I used the tar command shown further down (it assumes that you did not change mybasedir in the script and did everything from your home directory).
After that, I removed the uncompressed files, of course (the rm -r command at the very end).
You now have a complete, compressed mirror in a single file, ready to be poured onto backup media.
Do you have special sites that need to be mirrored and, if so, what commands do you use?
EDIT #1: small error fixed. If you run BSD or OS X, make sure your mirror ends up at about 3.6 GB; if not, use the fixed script.
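The two size figures in this post (3676 MB of free space needed, and a finished mirror of about 3.6 GB) can be checked from the shell. A small sketch, assuming the default mybasedir; free_mb and used_mb are made-up helper names, and df -Pk / du -sk are used instead of df -m because the kilobyte options are the portable ones.

```shell
#!/bin/sh
# Free space, in whole MB, on the filesystem holding directory $1.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Disk usage, in whole MB, of everything below directory $1.
used_mb() {
    du -sk "$1" | awk '{ print int($1 / 1024) }'
}

need=3676   # MB, the figure quoted in the post
if [ "$(free_mb "$HOME")" -lt "$need" ]; then
    echo "less than $need MB free under $HOME" >&2
fi

# After mirroring, compare this against the 3676 MB from the post:
# used_mb "$HOME/mirror_grenier-du-mac.net"
```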
Code (the script, mirror_grenier-du-mac.sh):
#!/bin/sh
# mirror the good stuff found and linked on www.grenier-du-mac.net
mybasedir=$HOME/mirror_grenier-du-mac.net
# print an error message to stderr and exit with the given status
die(){
    echo >&2 "$0: ERROR: $1"
    exit "$2"
}
mkdir -p "$mybasedir" || die "could not create a simple directory" 1
cd "$mybasedir" || die "could not change into $mybasedir" 1
# first pass: mirror the site itself
wget -e robots=off --user-agent=mozilla -m \
    http://www.grenier-du-mac.net/ \
    || die "download failed" 2
# second pass: collect the offsite download links from the mirrored
# pages and mirror those too (-np: never ascend to the parent directory)
wget -e robots=off --user-agent=mozilla -m -np \
    $( grep -R "http" www.grenier-du-mac.net \
    | tr '"' '\n' | grep "^http" | sort -u \
    | grep -E 'start10g\.ovh\.net/%7Eyappa/Grenier_dl/|zxt2grenier\.free\.fr/Grenier_du_Mac/|grenierstock\.free\.fr|grenierdumac\.free\.fr' ) \
    || die "either second pass download or parsing for urls failed" 3
# the end
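To see what the second pass feeds to wget, the extraction pipeline can be tried on a made-up page. This is only an illustration: the HTML, the file name example.sit and the shortened host pattern are invented for the demo, and nothing is downloaded.

```shell
#!/bin/sh
# Demonstrate the second-pass URL extraction on a made-up HTML file.
demo=$(mktemp -d) || exit 1
mkdir -p "$demo/www.grenier-du-mac.net"
cat > "$demo/www.grenier-du-mac.net/index.html" <<'EOF'
<a href="http://grenierstock.free.fr/example.sit">offsite download</a>
<a href="http://grenierstock.free.fr/example.sit">same link again</a>
<a href="http://www.example.com/other.html">unrelated link</a>
<a href="local/page.html">relative link</a>
EOF

# Same steps as in the script: find lines containing "http", split them at
# double quotes, keep the pieces that start with http, drop duplicates,
# then keep only the wanted download hosts.
urls=$(cd "$demo" && grep -R "http" www.grenier-du-mac.net \
    | tr '"' '\n' | grep "^http" | sort -u \
    | grep -E 'grenierstock\.free\.fr|grenierdumac\.free\.fr')
echo "$urls"
rm -r "$demo"
```

This prints http://grenierstock.free.fr/example.sit exactly once: the duplicate is removed by sort -u, and the unrelated and relative links never reach wget.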
Code (run it immediately):
sh mirror_grenier-du-mac.sh
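Code (or install it into your PATH):
chmod +x mirror_grenier-du-mac.sh
# make sure ~/bin exists, otherwise mv would rename the script to "bin"
mkdir -p ~/bin
mv mirror_grenier-du-mac.sh ~/bin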
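Typing just the script name only works if ~/bin is listed in PATH. One way to check, as a sketch (in_path is a hypothetical helper, and which profile file to edit depends on your shell):

```shell
#!/bin/sh
# Succeed if directory $1 is one of the entries in PATH.
in_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_path "$HOME/bin"; then
    echo "$HOME/bin is already in PATH"
else
    echo "add this line to your shell profile: PATH=\$HOME/bin:\$PATH"
fi
```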
Code (create the archive):
mkdir -p ~/archive
tar cvzf ~/archive/mirror_grenier-du-mac.tgz ~/bin/mirror_grenier-du-mac.sh ~/mirror_grenier-du-mac.net
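Before the uncompressed files are removed, it may be worth making sure the archive is readable. A minimal sketch, assuming the archive path used above; archive_ok is a made-up helper name.

```shell
#!/bin/sh
# Succeed only if the gzip stream is intact and tar can list every member.
archive_ok() {
    gzip -t "$1" 2>/dev/null && tar tzf "$1" >/dev/null 2>&1
}

# Usage (paths as in the post; the rm only runs if the check passes):
# archive_ok ~/archive/mirror_grenier-du-mac.tgz && rm -r ~/mirror_grenier-du-mac.net
```

This only proves that the archive can be read from start to end, not that every file of the mirror made it in.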
Code (remove the uncompressed files):
rm -r ~/mirror_grenier-du-mac.net