How to Download Files From The Web Using Wget [Linux CLI Tips]
This post is quoted from the Web Upd8 weblog entry "How to Download Files From The Web Using Wget" [Linux CLI Tips]:
In this post: a little about the wget command.
Reference: man wget(1)
NAME
Wget - The non-interactive network downloader.
SYNOPSIS
wget [option]... [URL]...
DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from the Web.
It supports HTTP, HTTPS, and FTP protocols, as well as retrieval through HTTP proxies.
Wget is non-interactive, meaning that it can work in the background, while the user is not logged on.
This allows you to start a retrieval and disconnect from the system, letting Wget finish the work.
By contrast, most of the Web browsers require constant user's presence, which can be a great hindrance when transferring a lot of data.
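As a small illustration of that background mode (the URL here is only a placeholder), you can hand a download to wget and walk away:
wget -b http://website.com/archive.zip
tail -f wget-log
With -b, wget detaches into the background and writes its progress to a file called wget-log in the current directory, which the tail command lets you watch.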
Wget can follow links in HTML and XHTML pages and create local versions of remote web sites, fully recreating the directory structure of the original site.
This is sometimes referred to as "recursive downloading." While doing that, Wget respects the Robot Exclusion Standard (/robots.txt). Wget can be instructed to convert the links in downloaded HTML files to the local files for offline viewing.
Wget has been designed for robustness over slow or unstable network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
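That resume behavior is available on demand through the -c (--continue) option; a quick sketch, with a placeholder URL:
wget -c http://website.com/archive.zip
If a partial archive.zip already exists locally, wget asks the server to continue from where the previous transfer stopped instead of downloading the file again from scratch.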
Download a zip archive:
wget http://website.com/archive.zip
and archive.zip will be saved in the current directory. But wget accepts many more useful options. Read on!
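For instance, one handy option is -O, which saves the download under a name of your choosing (the filename here is just an example):
wget -O latest.zip http://website.com/archive.zip
Without -O, wget derives the local filename from the URL, which is how archive.zip got its name above.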
wget -r http://website.com
This downloads the whole site recursively: images, HTML files, and so on. But sending that many requests at once could get you banned by the server, so to avoid that:
wget --random-wait --limit-rate=20k -r http://website.com
--random-wait makes wget pause for a random period of time between downloads, so the requests look less like an automated crawl.
--limit-rate=20k caps the download speed at roughly 20 KB/s, so you don't hammer the server and get banned.
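Note that --random-wait actually varies the interval set with --wait (between roughly 0.5 and 1.5 times its value), so the two options are normally combined. A sketch, with example values:
wget --wait=5 --random-wait --limit-rate=20k -r http://website.com
Here wget pauses somewhere between about 2.5 and 7.5 seconds between requests and never downloads faster than about 20 KB/s.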
Or you could do:
wget --wait=20 --limit-rate=20K -r -p -U Mozilla http://website.com
--wait=20 waits 20 seconds between each file download, though I think it's better to use --random-wait instead
-p (--page-requisites) tells wget to also fetch everything needed to display each page properly: inline images, stylesheets, and so on
-U Mozilla sets the User-Agent header to "Mozilla", making the website believe the requests come from a browser.
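Combining those options with the link conversion mentioned in the man page excerpt above, a polite mirroring command might look like this (the user-agent string and URL are only examples):
wget -r -p -k --wait=20 --limit-rate=20k --user-agent="Mozilla/5.0" http://website.com
-k (--convert-links) rewrites the links in the downloaded HTML so the local copy can be browsed offline, and --user-agent is simply the long form of -U.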
And here is how to download all images, videos or whatever you want, from a website:
wget -r -A jpg,png http://website.com
With this command, you download all jpg and png files from website.com.
If you want to download all MP3s instead, you would use -A mp3.
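Putting that together, a command to grab only the MP3 files from one section of a site, without wandering up into parent directories, could look like this (the /music/ path is just a placeholder):
wget -r -l 2 -A mp3 --no-parent http://website.com/music/
-l 2 limits the recursion to two levels deep, and --no-parent stops wget from following links above the starting directory.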
You can also use a GUI for wget if you want. It's called Gwget and should be in your distribution's repositories. For Ubuntu, do:
sudo apt-get install gwget
Credits: paraisolinux
"