Download web pages recursively under a URL
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains example.com \
-nH --cut-dirs=1 \
-e robots=off \
--random-wait \
--wait 5 \
--no-parent \
www.example.com/subdirectory/
- Substitute
example.com and
www.example.com/subdirectory/ with the domain and
starting URL for your own task.
--recursive: download the entire Web site.
--domains example.com: don't follow links outside
example.com.
--no-parent: don't ascend to the parent directory;
only links at or below subdirectory/ are followed.
--page-requisites: get all the elements that
compose the page (images, CSS and so on).
--html-extension: save files with the .html
extension (newer wget versions call this --adjust-extension).
--convert-links: convert links so that they work
locally, off-line.
--restrict-file-names=windows: modify filenames so
that they will work in Windows as well.
--no-clobber: don't overwrite any existing files
(used in case the download is interrupted and resumed).
-e robots=off: crawl regardless of what the site's
robots.txt allows.
-nH --cut-dirs=NUMBER: -nH drops the hostname directory
(www.example.com/) from saved paths, and --cut-dirs strips
that many leading directory components (here 1, for
subdirectory/). Note that --cut-dirs takes a number,
not a directory name.
--random-wait: randomizes the delay between requests
to between 0.5 and 1.5 times the value given by the
--wait option.
--wait 5: wait 5 seconds between requests.
(See --random-wait.)
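A shorter variant, as a sketch: wget's documented --mirror option is shorthand for "-r -N -l inf --no-remove-listing", so much of the long command above can be collapsed. Note that --mirror's -N (timestamping) cannot be combined with --no-clobber, so the latter is dropped here. The command is wrapped in echo as a dry run; remove the echo to actually download.

```shell
# Dry run of a --mirror-based equivalent of the command above.
# --mirror = -r -N -l inf --no-remove-listing; -N replaces --no-clobber
# (wget refuses to combine timestamping with no-clobber).
cmd="wget --mirror --page-requisites --html-extension \
--convert-links --restrict-file-names=windows \
--domains example.com -nH --cut-dirs=1 \
-e robots=off --random-wait --wait 5 --no-parent \
www.example.com/subdirectory/"
echo "$cmd"
```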