Examples of using wget
<p><small>Copyright © 2005-2020 by <code>Meng Lu &lt;lumeng3@gmail.com&gt;</code></small></p>
<p><small>Posted Sun, 03 May 2020; last updated Mon, 14 Dec 2020.</small></p>
<h2>Download web pages recursively under a URL<sup id= "fnref:1"><a href="http://meng6net.localhost/tag/wget/#fn:1" rel="footnote">1</a></sup></h2>
<pre><code>wget \
    --recursive \
    --no-clobber \
    --page-requisites \
    --html-extension \
    --convert-links \
    --restrict-file-names=windows \
    --domains example.com \
    -nH --cut-dirs=1 \
    -e robots=off \
    --random-wait \
    --wait 5 \
    --no-parent \
    www.example.com/subdirectory/
</code></pre>
<ul>
<li>Replace <code>example.com</code> and <code>www.example.com/subdirectory/</code> with the domain and starting URL you want to download.</li>
<li><code>--recursive</code>: follow links and download pages recursively (to a default maximum depth of 5 levels, which <code>--level</code> changes).</li>
<li><code>--domains example.com</code>: don't follow links outside example.com.</li>
<li><code>--no-parent</code>: never ascend above <code>subdirectory/</code>, so only pages below the starting directory are retrieved.</li>
<li><code>--page-requisites</code>: get all the files needed to display each page (images, CSS, and so on).</li>
<li><code>--html-extension</code>: save HTML pages with the <code>.html</code> extension. Since wget 1.12 this option is named <code>--adjust-extension</code> (<code>-E</code>); the old name is still accepted but deprecated.</li>
<li><code>--convert-links</code>: after the download completes, rewrite links so the pages can be browsed locally, offline.</li>
<li><code>--restrict-file-names=windows</code>: escape characters that Windows does not allow in file names, so the saved files also work on Windows.</li>
<li><code>--no-clobber</code>: don't overwrite existing files, so an interrupted download can be resumed without fetching them again. Recent wget releases ignore this option, with a warning, when <code>--convert-links</code> is also given.</li>
<li><code>-e robots=off</code>: ignore <code>robots.txt</code> and crawl anyway.</li>
<li><code>-nH --cut-dirs=1</code>: <code>-nH</code> (<code>--no-host-directories</code>) stops wget from creating a <code>www.example.com/</code> directory, and <code>--cut-dirs=1</code> discards one more leading directory (<code>subdirectory/</code>), so the files are saved directly in the current directory. <code>--cut-dirs</code> takes the number of directory components to remove, not a directory name.</li>
<li><code>--random-wait</code>: vary the delay between requests randomly between 0.5 and 1.5 times the value given with <code>--wait</code>; with <code>--wait 5</code>, each pause falls between 2.5 and 7.5 seconds.</li>
<li><code>--wait 5</code>: wait 5 seconds between requests (randomized by <code>--random-wait</code>).</li>
</ul>
<h2>References</h2>
<div class="footnotes">
<hr />
<ol>
<li id="fn:1">Linux Journal. <em>Downloading an Entire Web Site with wget</em>. 2008. <a href="https://www.linuxjournal.com/content/downloading-entire-web-site-wget">https://www.linuxjournal.com/content/downloading-entire-web-site-wget</a><a href="http://meng6net.localhost/tag/wget/#fnref:1" rev="footnote">↩</a></li>
</ol>
</div>
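<p>Choosing the number for <code>--cut-dirs</code> amounts to counting the directory components between the host name and the last slash of the starting URL. The sketch below does that count in POSIX shell. The function name <code>cut_dirs_for</code> is hypothetical and not part of wget, and it assumes a plain URL of the form <code>[scheme://]host/dir/.../[file]</code> without query strings.</p>
<pre><code># Hypothetical helper (not part of wget): print how many directory
# components follow the host name in the URL given as $1, i.e. the value
# to pass to --cut-dirs so that, with -nH, files land in the current directory.
# Assumes a plain URL without a query string.
# The body runs in a subshell, so the IFS and set -f changes stay local.
cut_dirs_for() (
    url=${1#*://}                  # strip a scheme such as https:// if present
    case $url in
        */*) path=${url#*/} ;;     # strip the host name
        *)   path= ;;              # bare host: no path at all
    esac
    case $path in
        */*) dir=${path%/*} ;;     # drop the final file name (empty after a trailing /)
        *)   dir= ;;               # a bare file name has no directories
    esac
    set -f                         # disable globbing while splitting
    IFS=/
    n=0
    for part in $dir; do
        if [ -n "$part" ]; then    # skip empty parts such as those from //
            n=$((n + 1))
        fi
    done
    echo "$n"
)
</code></pre>
<p>For example, <code>cut_dirs_for www.example.com/subdirectory/</code> prints <code>1</code>, and the result can be passed straight to wget as <code>-nH --cut-dirs="$(cut_dirs_for "$url")"</code>, where <code>url</code> is a shell variable holding the starting URL.</p>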