Thursday, December 10, 2009

Setting UserAgent for php curl session to avoid 500 Internal Server Error


We use the function below to fetch web page content with PHP's curl module.

function getmethod_request($ch, $ckfile, $urlValue)
{
    curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile); // The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
    curl_setopt($ch, CURLOPT_URL, $urlValue); // The URL to fetch. This can also be set when initializing a session with curl_init().
    curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile); // The name of a file to save all internal cookies to when the connection closes.
    curl_setopt($ch, CURLOPT_HEADER, 1); // TRUE to include the header in the output.
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1); // TRUE to automatically set the Referer: field in requests where it follows a Location: redirect.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // TRUE to return the transfer as a string from curl_exec() instead of outputting it directly.
    curl_setopt($ch, CURLOPT_POST, 0); // FALSE here, so no HTTP POST is sent. TRUE would do a regular application/x-www-form-urlencoded POST, as used by HTML forms.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // TRUE to follow any Location: header that the server sends.
    return curl_exec($ch); // The response (headers plus body) as a string, or FALSE on failure.
}


It worked fine for many websites.

However, it started returning a "500 Internal Server Error" response when we used it for one specific website.
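Because the function sets CURLOPT_HEADER to 1, the string it returns starts with the response headers, so the status code can be read from it. The snippet below is only a rough sketch: last_http_status() is a hypothetical helper name, not part of the original function, and it assumes the body never contains a line that looks like an HTTP status line. When the curl handle is still available, curl_getinfo($ch, CURLINFO_HTTP_CODE) is the more direct way to get the same number.

```php
<?php
// Hypothetical helper (not in the original post): returns the status code of
// the last HTTP status line found in a response string that includes headers.
// When redirects are followed, each hop contributes its own header block,
// so the last status line belongs to the final response.
function last_http_status(string $response): ?int
{
    // Match lines such as "HTTP/1.1 500 Internal Server Error" or "HTTP/2 200".
    $count = preg_match_all('/^HTTP\/\d+(?:\.\d+)?\s+(\d{3})\b/m', $response, $matches);
    if (!$count) {
        return null; // No status line found (or the regex failed).
    }
    return (int) end($matches[1]);
}
```

With this in place, a check such as `last_http_status($data) >= 500` can flag the failing site instead of passing the error page along as content.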

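I learned that some web servers block requests from unidentified user agents (browsers). PHP's curl module does not send a User-Agent header unless one is set, so the request looked unidentified to that server.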

We resolved the issue by adding the lines below to the function, before the curl_exec() call, so that the request identifies itself as Firefox 2.0.

$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
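For reference, the same settings can also be gathered into one array and applied with curl_setopt_array(). This is a minimal sketch rather than the code we actually run: build_get_options() is a hypothetical helper name, and the URL and cookie path in the usage comment are placeholders.

```php
<?php
// Hypothetical helper (not in the original post): returns the options used by
// getmethod_request(), plus the User-Agent, as an array for curl_setopt_array().
function build_get_options(string $url, string $cookieFile, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from this file
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies back to the same file
        CURLOPT_HEADER         => true,        // include response headers in the output
        CURLOPT_AUTOREFERER    => true,        // set Referer: when following redirects
        CURLOPT_RETURNTRANSFER => true,        // return the response instead of printing it
        CURLOPT_POST           => false,       // no HTTP POST, as in the original function
        CURLOPT_FOLLOWLOCATION => true,        // follow Location: redirects
        CURLOPT_USERAGENT      => $userAgent,  // identify as a browser
    ];
}

// Usage (placeholder URL and cookie path; needs network access):
// $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
// $ch = curl_init();
// curl_setopt_array($ch, build_get_options("http://example.com/", "/tmp/cookies.txt", $useragent));
// $data = curl_exec($ch);
// curl_close($ch);
```

Keeping the User-Agent as a parameter makes it easy to change the string later without touching the rest of the options.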



1 comment:

Sun Yi said...

Thanks for this tip, I found it very useful. I got a 404 error on a few occasions, but I think the page was just not available.

Thanks again :)
