Six Ways of Retrieving Webpage Content In PHP

There are six commonly used ways of getting a webpage's content (the full HTML) in PHP. The methods are:

  1. using the file() function
  2. using the file_get_contents() function
  3. using the fopen()->fread()->fclose() functions
  4. using cURL
  5. using fsockopen() in socket mode
  6. using a third-party library (such as Snoopy)

1. file()

<?php
$url='http://blog.oscarliang.net';
// using file() function to get content
$lines_array=file($url);
// turn array into one variable
$lines_string=implode('',$lines_array);
//output, you can also save it locally on the server
echo $lines_string;
?>
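
file() returns the page as an array with one element per line, and each element keeps its trailing newline, which is why implode('') rebuilds the original text exactly. The sketch below illustrates that behaviour. It reads a local temporary file instead of a URL so that it runs without network access, and the sample HTML is made up for illustration.

```php
<?php
// Made-up sample content, written to a temporary file for illustration.
$sample = "<html>\n<body>Hello</body>\n</html>\n";
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, $sample);

// file() returns one array element per line, newline included.
$lines_array = file($path);

// FILE_IGNORE_NEW_LINES strips the trailing newline from each element.
$trimmed = file($path, FILE_IGNORE_NEW_LINES);

// Joining the untrimmed lines gives back the original text.
$lines_string = implode('', $lines_array);

// Remove the temporary file again.
unlink($path);

echo count($lines_array), " lines\n";
```

If you want to process the page line by line, the array form is convenient; if you only need the whole page as one string, file_get_contents() in the next section saves the implode() step.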

2. file_get_contents()

file(), file_get_contents() and fopen() can only open URLs when “allow_url_fopen” is enabled. Check your php.ini file and set allow_url_fopen = On. When it is off, these functions fail on remote URLs, although they still work on local files.

<?php
$url='http://blog.oscarliang.net';
//file_get_contents() reads remote webpage content
$lines_string=file_get_contents($url);
//output, you can also save it locally on the server
echo htmlspecialchars($lines_string);
?>
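
The allow_url_fopen requirement can also be checked at runtime, and file_get_contents() returns false when the read fails, so it is worth testing the result before using it. The following is a minimal sketch under those assumptions. The helper names url_fopen_enabled() and fetch_page() are hypothetical, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper: true when PHP may open URLs with the file functions.
function url_fopen_enabled(): bool
{
    // ini_get() returns the setting as a string such as "1", "On" or "".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Hypothetical helper: returns the content, or null when the read fails.
function fetch_page(string $url, int $timeout = 5): ?string
{
    // The 'http' options apply only to http:// and https:// URLs.
    $context = stream_context_create(['http' => ['timeout' => $timeout]]);
    // @ hides the warning; failure is reported through the return value.
    $content = @file_get_contents($url, false, $context);
    return $content === false ? null : $content;
}
```

With these in place, you could call url_fopen_enabled() first and then fetch_page('http://blog.oscarliang.net'), treating a null result as "could not read the page".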

3. fopen()->fread()->fclose()

<?php
$url='http://blog.oscarliang.net';
// fopen() opens the webpage in binary read mode
$handle=fopen($url,"rb");
// initialize
$lines_string="";
// read content in chunks of up to 1024 bytes until nothing is left
do{
	$data=fread($handle,1024);
	if(strlen($data)==0) {
		break;
	}
	$lines_string.=$data;
}while(true);
//close handle to release resources
fclose($handle);
//output, you can also save it locally on the server
echo $lines_string;
?>
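
The read loop above can be wrapped in a small function that works on any open stream. This is only a sketch: the name read_all() is hypothetical, and it is demonstrated on an in-memory stream (php://memory) rather than a URL so that it runs without network access.

```php
<?php
// Hypothetical helper: reads an open stream to the end in fixed-size chunks.
function read_all($handle, int $chunkSize = 1024): string
{
    $buffer = '';
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            break; // read error: stop and return what we have so far
        }
        $buffer .= $chunk;
    }
    return $buffer;
}

// Demonstration on an in-memory stream holding 3000 bytes.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, str_repeat('a', 3000));
rewind($handle);
$lines_string = read_all($handle);
fclose($handle);

echo strlen($lines_string), " bytes read\n";
```

PHP's built-in stream_get_contents($handle) reads the rest of a stream in one call, so the manual loop is mainly useful when you want to process or limit the data chunk by chunk.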

4. curl

cURL must be enabled before you can use it. On Windows, edit php.ini and uncomment the line extension=php_curl.dll (newer PHP versions write it as extension=curl). On Linux, install the cURL package for PHP (for example php-curl on Debian or Ubuntu).

<?php
$url='http://blog.oscarliang.net';
$ch=curl_init();
$timeout=5;

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);

// Get URL content
$lines_string=curl_exec($ch);
// close handle to release resources
curl_close($ch);
//output, you can also save it locally on the server
echo $lines_string;
?>
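
curl_exec() returns false when the transfer fails, so a slightly more defensive version checks the result and reports the error text and HTTP status. The sketch below is one way to do that. The function name fetch_with_curl() and the shape of the returned array are assumptions made for illustration, and the overall timeout of twice the connect timeout is an arbitrary choice.

```php
<?php
// Hypothetical helper: returns ['body' => ?string, 'status' => int, 'error' => string].
function fetch_with_curl(string $url, int $timeout = 5): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);  // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2);     // seconds allowed for the whole transfer
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);      // follow HTTP redirects

    $body = curl_exec($ch);
    if ($body === false) {
        // Transfer failed: no body, and curl_error() describes the problem.
        return ['body' => null, 'status' => 0, 'error' => curl_error($ch)];
    }

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ['body' => $body, 'status' => $status, 'error' => ''];
}
```

A caller could then test $result['body'] === null to detect a failed transfer and check $result['status'] for codes such as 404 before using the HTML.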

5. fsockopen() socket mode

<?php
$fp = fsockopen("t.qq.com", 80, $errno, $errstr, 30);
if (!$fp) {
	echo "$errstr ($errno)<br />\n";
} else {
	$out = "GET / HTTP/1.1\r\n";
	$out .= "Host: t.qq.com\r\n";
	$out .= "Connection: Close\r\n\r\n";
	fwrite($fp, $out);
	while (!feof($fp)) {
		echo fgets($fp, 128);
	}
	fclose($fp);
}
?>
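
With a raw socket you write the HTTP request yourself, so every header line must end with a carriage return and line feed, and an empty line must close the header section. The reply also arrives with its headers attached, which is why the example above prints the headers along with the HTML. The sketch below shows both steps in isolation. The helper names build_get_request() and split_http_response() are hypothetical, and the response string is made up so the code runs without a live connection.

```php
<?php
// Hypothetical helper: builds a minimal GET request with CRLF line endings.
function build_get_request(string $host, string $path = '/'): string
{
    return "GET {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n"
         . "\r\n"; // the blank line ends the header section
}

// Hypothetical helper: splits a raw response into header text and body.
function split_http_response(string $raw): array
{
    // Headers and body are separated by the first empty line.
    $parts = explode("\r\n\r\n", $raw, 2);
    return ['headers' => $parts[0], 'body' => $parts[1] ?? ''];
}

// Made-up response, used here instead of a live socket read.
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
$response = split_http_response($raw);

echo $response['body'], "\n";
```

Note that a server answering an HTTP/1.1 request may send the body with chunked transfer encoding, which this sketch does not decode. That is one reason the higher-level methods are usually more convenient.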

6. Snoopy library

Snoopy is a popular third-party PHP library that is very simple to use. It simulates a web browser from your server.

<?php
// include snoopy library
require('Snoopy.class.php');
// initialize snoopy object
$snoopy = new Snoopy;
$url = "http://t.qq.com";
// read webpage content
$snoopy->fetch($url);
// save it to $lines_string
$lines_string = $snoopy->results;
//output, you can also save it locally on the server
echo $lines_string;
?>