11.9.1 Problem
11.9.2 Solution
Example 11-2. pc_link_extractor()
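function pc_link_extractor($s) {
    $a = array();
    // Group 1 captures the href value; group 2 captures the text between <a> and </a>.
    if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i',
                       $s, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // Store each link as a (target, text) pair.
            array_push($a, array($match[1], $match[2]));
        }
    }
    return $a;
}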
For example:
$links = pc_link_extractor($page);
11.9.3 Discussion
The pc_link_extractor() function returns an array. Each element of that
array is itself a two-element array: the first element is the target of the
link, and the second element is the text that is linked. For example:
$links=<<<END
Click <a href="http://www.oreilly.com">here</a> to visit a computer book publisher.
Click <a href="http://www.sklar.com">over here</a> to visit a computer book author.
END;

$a = pc_link_extractor($links);
print_r($a);

Array
(
    [0] => Array
        (
            [0] => http://www.oreilly.com
            [1] => here
        )

    [1] => Array
        (
            [0] => http://www.sklar.com
            [1] => over here
        )

)
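The regular expression in pc_link_extractor() won't work on all links. It
misses links that are constructed with JavaScript or that use some hexadecimal
escapes, and it misses links whose text spans more than one line, because .
does not match a newline unless the s pattern modifier is used. It should,
however, work on the majority of reasonably well-formed HTML.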
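Where the regular expression proves too brittle, one alternative is to let an HTML parser find the anchors. The following is a minimal sketch that assumes PHP's DOM extension is available; it is a different technique from the recipe's, and the name dom_link_extractor() is hypothetical. It returns the same (target, text) pairs, with one difference: textContent yields the link text with any nested tags removed, whereas the regular expression returns the raw markup between <a> and </a>.

```php
<?php
// Hypothetical helper (not part of the original recipe): extract links
// with the DOM extension instead of a regular expression.
function dom_link_extractor(string $html): array {
    $links = [];
    if ($html === '') {
        return $links; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml parse warnings internally so malformed markup
    // does not produce PHP warnings, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Skip named anchors such as <a name="top"> that have no href.
        if ($anchor->hasAttribute('href')) {
            $links[] = [$anchor->getAttribute('href'), trim($anchor->textContent)];
        }
    }
    return $links;
}
```

Called as dom_link_extractor($links) on the two-link sample above, this sketch should return the same target and text pairs as pc_link_extractor(). It also picks up link text that spans more than one line. Note that loadHTML() makes its own assumptions about character encoding, so non-ASCII pages may need extra handling.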