Step-by-step guide
- Install Wget
- Windows: Wget is available as a package in the Cygwin installer (https://cygwin.com/install.html).
- Mac: First install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
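Then install wget with brew, enabling LibreSSL for TLS support (recent Homebrew releases no longer accept formula options such as --with-libressl; there, a plain "brew install wget" already includes TLS support):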
brew install wget --with-libressl
Run the following command to crawl www.example.com and save it as flat files to a directory of your choosing (shown here as /path/to/destination/directory):
wget -P /path/to/destination/directory/ -mpck --user-agent="" -e robots=off --wait 1 -E https://www.example.com/
See this code explained on explainshell
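For readers who prefer to see what each short flag does, here is a minimal sketch of the same command with the long option names spelled out. The function name mirror_site and the WGET override variable are assumptions made for this example (the override only exists so the command can be previewed with echo); they are not part of wget.

```shell
#!/bin/sh
# Long-option equivalent of: wget -P DIR -mpck --user-agent="" -e robots=off --wait 1 -E URL
#   --directory-prefix   (-P) save everything under this directory
#   --mirror             (-m) recursive download with timestamping
#   --page-requisites    (-p) also fetch the CSS, JS and images each page needs
#   --continue           (-c) resume partially downloaded files
#   --convert-links      (-k) rewrite links so the local copy can be browsed
#   --user-agent=        send an empty User-Agent header
#   --execute robots=off (-e) ignore robots.txt
#   --wait=1             pause one second between requests
#   --adjust-extension   (-E) append .html to files served as text/html
# mirror_site and WGET are hypothetical names used only in this sketch.
mirror_site() {
    dest=$1   # destination directory
    url=$2    # site to crawl
    "${WGET:-wget}" \
        --directory-prefix="$dest" \
        --mirror \
        --page-requisites \
        --continue \
        --convert-links \
        --user-agent="" \
        --execute robots=off \
        --wait=1 \
        --adjust-extension \
        "$url"
}
```

Calling mirror_site /path/to/destination/directory/ https://www.example.com/ should behave like the one-liner above. Setting WGET=echo first prints the arguments instead of starting a download.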
- Double-check your local copies and replace any remaining absolute links with relative links so that your CSS, JS, and image files load correctly.
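One way to do that double check is to list the downloaded pages that still mention the original host. This is only a sketch: the helper name list_absolute_links is made up for this example, and it assumes the pages of interest end in .html.

```shell
#!/bin/sh
# list_absolute_links DIR HOST
# Hypothetical helper: print every .html file under DIR that still contains
# "//HOST/", i.e. a link wget did not convert to a relative one.
list_absolute_links() {
    dir=$1    # directory the site was downloaded into
    host=$2   # original host name, e.g. www.example.com
    # -l prints only file names, -F treats the pattern as a fixed string
    find "$dir" -type f -name '*.html' -exec grep -lF "//$host/" {} +
}
```

For example, list_absolute_links /path/to/destination/directory www.example.com prints one file name per line. No output means no page references the host any more.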
I used three commands to download all the givingday sites (live mode and leadership mode have to be downloaded separately), for example:
$ wget -P /GivingDay -mpck --user-agent="" -e robots=off --wait 1 -E "https://givingday.cornell.edu/?m=live"
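For some reason wget added ".html" to the end of ".css", ".js", ".png", and ".ico" files, so they became ".css.html", ".js.html", ".png.html", and ".ico.html". This is probably a side effect of -E, which appends ".html" to any file the server delivers as text/html. I had to do a search-and-replace across all the HTML pages to correct the CSS, JS, and image file names.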
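The cleanup can be scripted roughly as follows. This is a sketch rather than the exact commands I ran: the function name fix_extensions is made up, the four extensions are the ones named in this note, and it assumes no real page legitimately ends in something like ".js.html".

```shell
#!/bin/sh
# fix_extensions DIR
# Hypothetical helper: undo the extra ".html" on css/js/png/ico files under DIR
# and update the references to them in the HTML pages.
fix_extensions() {
    dir=$1
    # 1. Rename the files themselves: site.css.html -> site.css
    for ext in css js png ico; do
        find "$dir" -type f -name "*.$ext.html" | while IFS= read -r f; do
            mv "$f" "${f%.html}"
        done
    done
    # 2. Fix the references inside the remaining HTML pages.
    #    -i.bak works with both GNU sed and the BSD sed shipped with macOS.
    find "$dir" -type f -name '*.html' -exec \
        sed -i.bak -E 's/\.(css|js|png|ico)\.html/.\1/g' {} +
    # 3. Remove the backup copies sed left behind.
    find "$dir" -type f -name '*.html.bak' -exec rm {} +
}
```

Run it on a copy of the download first, for example fix_extensions /GivingDay, and spot-check a few pages in a browser afterwards.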
I also had to replace the absolute links that wget failed to convert.
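A rough sketch of that replacement is below. It rewrites every link to the original host into a path relative to the page it appears in, by adding one "../" per directory level. The function name relativize_links is hypothetical, and the sketch assumes the first argument is the directory that corresponds to the site root (wget normally creates a folder named after the host inside the -P directory).

```shell
#!/bin/sh
# relativize_links ROOT HOST
# Hypothetical helper: in every .html file under ROOT, replace
# "http://HOST/" and "https://HOST/" with a relative prefix.
relativize_links() {
    root=${1%/}   # local directory that mirrors the site root
    host=$2       # original host name, e.g. www.example.com
    # Escape the dots so they match literally in the sed pattern.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
    find "$root" -type f -name '*.html' | while IFS= read -r f; do
        rel=${f#"$root"/}     # path below the root, e.g. a/b/page.html
        prefix=
        # One "../" for each directory between the root and the file.
        while [ "${rel#*/}" != "$rel" ]; do
            prefix="../$prefix"
            rel=${rel#*/}
        done
        [ -n "$prefix" ] || prefix=./
        sed -i.bak -E "s#https?://$host_re/#$prefix#g" "$f"
        rm -f "$f.bak"
    done
}
```

Links that point at a directory (for example "about/") will still need an index.html appended if the copy is opened straight from disk instead of through a web server.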
Related articles
https://swsblog.stanford.edu/blog/creating-static-copy-website
https://raywoodcockslatest.wordpress.com/2014/02/14/configuring-wget/
https://www.quora.com/How-do-you-export-a-WordPress-site-to-a-static-HTML