For one of the Drupal sites we were working on, we needed to get the description of a site automatically from its URL by parsing the site's HTML markup. The first step, downloading the page, is easy to accomplish with PHP's cURL extension. Read on to see how to use a PHP script and cURL to download the contents of a website.
cURL is a library (libcurl) that can connect to and communicate with many different types of servers. PHP exposes it through its cURL extension, which we can use to fetch the contents of a site.
Our objective was to get the HTML content of a site. Let’s see how it can be done.
function curl_download($Url){
    // Is the cURL extension available?
    if (!function_exists('curl_init')){
        die('Sorry cURL is not installed!');
    }
    // OK cool - then let's create a new cURL resource handle
    $ch = curl_init();
    // Now set some options (most are optional)
    // Set URL to download
    curl_setopt($ch, CURLOPT_URL, $Url);
    // User agent
    curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    // Include header in result? (0 = no, 1 = yes)
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Timeout in seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    // Download the given URL, and return output (curl_exec() returns false on failure)
    $output = curl_exec($ch);
    // Close the cURL resource, and free system resources
    curl_close($ch);
    return $output;
}
Let’s take a look at the above function. curl_init() initializes a new session and returns a cURL handle, which we store in $ch. We then set options on that handle with curl_setopt(): for example, CURLOPT_URL specifies the URL to fetch, and CURLOPT_RETURNTRANSFER makes curl_exec() return the response as a string instead of printing it. curl_exec() executes the session, and curl_close() closes it. The URL we want to process is passed to the function, which returns the downloaded content, or false if the request fails.
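Since curl_exec() returns false when a request fails, it can help to report the failure instead of passing false along. The following is only a sketch of one way to do that, assuming the URL is an HTTP or HTTPS address. The function name curl_download_checked() and the shape of the array it returns are made up for this example and are not part of cURL.

```php
<?php
// Hypothetical variant of curl_download(): the function name and the
// shape of the returned array are assumptions made for this sketch.
function curl_download_checked(string $url, int $timeout = 10): array
{
    $result = ['ok' => false, 'status' => 0, 'body' => null, 'error' => ''];

    if (!function_exists('curl_init')) {
        $result['error'] = 'cURL extension is not loaded';
        return $result;
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // overall limit in seconds

    $body   = curl_exec($ch);                       // false on failure
    $error  = curl_error($ch);                      // '' when no error occurred
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released when $ch goes out of scope on return.

    if ($body === false) {
        $result['error'] = $error !== '' ? $error : 'Unknown cURL error';
        return $result;
    }

    $result['status'] = $status;
    $result['body']   = $body;
    if ($status >= 200 && $status < 300) {
        $result['ok'] = true;
    } else {
        $result['error'] = "Unexpected HTTP status $status";
    }
    return $result;
}
```

A caller would check $result['ok'] before using $result['body'], and log $result['error'] otherwise. Treating only 2xx responses as success is a design choice of this sketch; some callers may still want the body of an error page.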
Try this out to download the contents of a website.
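Once the HTML has been downloaded, the description we were originally after can be read from the page's meta description tag. Here is a minimal sketch using PHP's DOMDocument class. The helper name extract_description() is hypothetical, and the sketch assumes the site publishes its description in a `<meta name="description">` tag, which not every site does.

```php
<?php
// Hypothetical helper: the name extract_description() is an assumption
// made for this sketch. Returns the description text, or null if none is found.
function extract_description(string $html): ?string
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // Attribute values such as "Description" vary in case between sites.
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $content = trim($meta->getAttribute('content'));
            return $content !== '' ? $content : null;
        }
    }
    return null;
}

// Sample markup standing in for the string returned by curl_download().
$html = '<html><head><title>Example</title>'
      . '<meta name="Description" content=" A sample page. "></head>'
      . '<body></body></html>';
echo extract_description($html), "\n"; // prints: A sample page.
```

In practice the argument would be the string returned by curl_download(), after checking that it is not false. Pages that are not plain ASCII may need their character encoding handled before parsing, which this sketch does not attempt.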