title besides title

 

Thursday, November 29, 2012

PHP : Web Automation - [11.4] Fetching a URL with Cookies

11.4.1 Problem

You want to retrieve a page that requires a cookie to be sent with the request for the page.

11.4.2 Solution

Use the cURL extension and the CURLOPT_COOKIE option:
$c = curl_init('http://www.example.com/needs-cookies.php');
curl_setopt($c, CURLOPT_VERBOSE, 1);
curl_setopt($c, CURLOPT_COOKIE, 'user=ellen; activity=swimming');
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
$page = curl_exec($c);
curl_close($c);
If cURL isn't available, use the addHeader( ) method in the PEAR HTTP_Request class:
require 'HTTP/Request.php';

$r = new HTTP_Request('http://www.example.com/needs-cookies.php');
$r->addHeader('Cookie','user=ellen; activity=swimming');
$r->sendRequest();
$page = $r->getResponseBody();

11.4.3 Discussion

Cookies are sent to the server in the Cookie request header. The cURL extension has a cookie-specific option, but with HTTP_Request, you have to add the Cookie header just as with other request headers. Multiple cookie values are sent in a semicolon-delimited list. The examples in the Solution send two cookies: one named user with value ellen and one named activity with value swimming.
To request a page that sets cookies and then make subsequent requests that include those newly set cookies, use cURL's "cookie jar" feature. On the first request, set CURLOPT_COOKIEJAR to the name of a file to store the cookies in. On subsequent requests, set CURLOPT_COOKIEFILE to the same filename, and cURL reads the cookies from the file and sends them along with the request. This is especially useful for a sequence of requests in which the first request logs into a site that sets session or authentication cookies, and then the rest of the requests need to include those cookies to be valid:
$cookie_jar = tempnam('/tmp','cookie');

// log in
$c = curl_init('https://bank.example.com/login.php?user=donald&password=b1gmoney$');
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_COOKIEJAR, $cookie_jar);
$page = curl_exec($c);
curl_close($c);

// retrieve account balance
$c = curl_init('http://bank.example.com/balance.php?account=checking');
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookie_jar);
$page = curl_exec($c);
curl_close($c);

// make a deposit
$c = curl_init('http://bank.example.com/deposit.php');
curl_setopt($c, CURLOPT_POST, 1);
curl_setopt($c, CURLOPT_POSTFIELDS, 'account=checking&amount=122.44');
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookie_jar);
$page = curl_exec($c);
curl_close($c);

// remove the cookie jar
unlink($cookie_jar) or die("Can't unlink $cookie_jar");
Be careful where you store the cookie jar. It needs to be in a place your web server has write access to, but if other users can read the file, they may be able to poach the authentication credentials stored in the cookies.