Thursday, December 10, 2009

Parsing a webpage using PHP DOM


PHP's DOM extension can parse a webpage by loading its HTML content into a DOMDocument. The page can then be navigated and queried as if it were XML data.
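The page's HTML first has to end up in a string. One common way is file_get_contents(), which can read http:// URLs when the allow_url_fopen setting is enabled. The sketch below assumes PHP 7 or later, and fetchHtml() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: returns the raw HTML at $location, or throws on failure.
// $location may be a local file path or, if allow_url_fopen is enabled, a URL.
function fetchHtml(string $location): string
{
    // "@" hides the PHP warning; the failure is reported via the exception instead
    $html = @file_get_contents($location);
    if ($html === false) {
        throw new RuntimeException("Could not read $location");
    }
    return $html;
}

// Example (hypothetical URL):
// $data = fetchHtml("http://www.example.com/");
```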

If $data holds the HTML content of the webpage, the code below loads it into a DOM object.

$dom = new DOMDocument();
@$dom->loadHTML($data);


The "@" operator on the second line is not strictly required: it suppresses the warnings that libxml emits for malformed HTML (unclosed tags, stray "&" characters and similar problems), which are common in real webpages.
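Instead of "@", libxml's own error handling can be switched on with libxml_use_internal_errors(true). The parse errors are then buffered rather than printed, and libxml_get_errors() returns them as LibXMLError objects. This is a minimal sketch assuming PHP 7 or later; the sample HTML is made up for illustration.

```php
<?php
// Sample HTML with a stray "&" and an unclosed <p>, as often found on real pages.
$data = "<html><body><table><tr><td>a</td><td>b & c</td></tr></table>"
      . "<p>unclosed paragraph</body></html>";

// Buffer libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);   // returns the old setting

$dom = new DOMDocument();
$loaded = $dom->loadHTML($data);

// Each entry is a LibXMLError; keep the messages, e.g. for logging.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}

libxml_clear_errors();                          // empty the error buffer
libxml_use_internal_errors($previous);          // restore the old setting
```

Restoring the previous setting keeps the change from leaking into unrelated XML code elsewhere in the script.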

Once the document is loaded, XPath queries can be used to select the desired data.

For example, the code below selects all the "tr" (table row) elements inside the body of the HTML document.

$xpath = new DOMXPath($dom);
$tablerows = $xpath->evaluate("/html/body//tr");
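The query can be tried on a small, made-up table. The sketch below (PHP 7 or later assumed) uses query(), which always returns a DOMNodeList, and shows that evaluate() can also return a plain value such as a count or a string. loadHTML() wraps a bare fragment in implied html and body elements, which is why the "/html/body//tr" path still matches.

```php
<?php
// Made-up sample: a header row plus two data rows, with no <html>/<body> tags.
$data = "<table><tr><th>Name</th><th>Qty</th></tr>"
      . "<tr><td>apple</td><td>3</td></tr>"
      . "<tr><td>pear</td><td>5</td></tr></table>";

$dom = new DOMDocument();
@$dom->loadHTML($data);   // libxml adds the implied <html><body> wrapper

$xpath = new DOMXPath($dom);

// query() always returns a DOMNodeList of the matching nodes
$tablerows = $xpath->query("/html/body//tr");
echo $tablerows->length, "\n";                          // 3

// evaluate() returns a scalar when the expression yields one
$rowCount = $xpath->evaluate("count(/html/body//tr)");   // float 3

// Relative expression evaluated against a context node (the last row)
$lastRow = $tablerows->item($tablerows->length - 1);
echo $xpath->evaluate("string(td[1])", $lastRow), "\n";  // pear
```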


The code below collects the text of the first "td" cell in each table row into $content, one value per line. The loop starts at index 1, so the first row (often a header row) is skipped.

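$content = "";
// Start at 1 to skip the first row
for ($i = 1; $i < $tablerows->length; $i++)
{
    $singlerow = $tablerows->item($i);

    // All "td" cells in the current row
    $tablecells = $singlerow->getElementsByTagName("td");

    // Rows made only of "th" cells have no "td"; skip them
    if ($tablecells->length > 0)
    {
        $content .= $tablecells->item(0)->nodeValue . "\n";
    }
}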
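Wrapped up as a reusable function, the same extraction might look like the sketch below. firstCellTexts() is a hypothetical name, PHP 7 or later is assumed, and two choices differ slightly from the loop above: the relative XPath "td[1]" only looks at direct child cells (getElementsByTagName("td") would also descend into nested tables), and the cell text is trimmed.

```php
<?php
// Hypothetical helper: text of the first "td" in each table row of $html.
// $skipRows = 1 mirrors the loop above, which starts at index 1.
function firstCellTexts(string $html, int $skipRows = 1): array
{
    // Parse quietly: buffer libxml errors, then discard them
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $texts = [];
    $index = 0;
    foreach ($xpath->query("/html/body//tr") as $row) {
        if ($index++ < $skipRows) {
            continue;                      // skip leading rows, e.g. a header
        }
        $cell = $xpath->query("td[1]", $row)->item(0);
        if ($cell !== null) {              // rows with only "th" have no "td"
            $texts[] = trim($cell->textContent);
        }
    }
    return $texts;
}

$html = "<table><tr><th>Name</th></tr><tr><td>apple</td><td>3</td></tr>"
      . "<tr><td> pear </td><td>5</td></tr></table>";
echo implode(", ", firstCellTexts($html)), "\n";   // apple, pear
```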


