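11.8.1 Problem
You want to highlight certain words in a web page without altering any text that appears inside HTML tags.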
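11.8.2 Solution
Use preg_replace() with an array of patterns and an array of replacements, and apply it only to the text outside of HTML tags: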
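$patterns = array('/\bdog\b/', '/\bcat\b/');
$replacements = array('<b style="color:black;background-color:#FFFF00">dog</b>',
                      '<b style="color:black;background-color:#FF9900">cat</b>');
while ($page) {
    // the s modifier lets . match newlines, so multiline pages still match
    if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$page,$matches) &&
        (($matches[1] . $matches[2]) !== '')) {
        // highlight only the text before the tag; print the tag untouched
        print preg_replace($patterns,$replacements,$matches[1]);
        print $matches[2];
        $page = $matches[3];
    } else {
        // no progress (for example, a stray "<"): print the rest and stop
        print $page;
        break;
    }
}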
11.8.3 Discussion
The regular expression used with
preg_match() matches as much text as possible before an HTML tag, then
an HTML tag, and then the rest of the content. The text before the HTML tag has
the highlighting applied to it, the HTML tag is printed out without any
highlighting, and the loop then applies the same match to the rest of the
content. This prevents any highlighting of words that occur inside HTML tags
(in URLs or alt text, for example). Inserting markup there would prevent the
page from displaying properly.
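As a rough illustration of how the pattern divides its input, the sketch below runs it once against a made-up sample string. The sample text and the variable names $before, $tag, and $rest are assumptions for illustration only. They are not part of the recipe.

```php
<?php
// Hypothetical sample input, chosen so that "dog" appears both in text
// and inside a tag's attributes.
$page = 'My dog <img src="dog.jpg" alt="dog"> barks.';

// Same pattern as in the Solution, with the s modifier so . matches newlines.
preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s', $page, $matches);

$before = $matches[1];  // 'My dog '                       (text before the tag)
$tag    = $matches[2];  // '<img src="dog.jpg" alt="dog">' (the tag itself)
$rest   = $matches[3];  // ' barks.'                       (everything after it)

print $before . "\n";
print $tag . "\n";
print $rest . "\n";
```

Only $before would be passed to preg_replace(), so the two occurrences of "dog" inside the img tag are left alone.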
The following program retrieves the URL in $url and
highlights the words in the $words array. Words are not highlighted
when they are part of larger words because each pattern is wrapped in
\b, the Perl-compatible regular expression assertion
that matches at a word boundary.
$colors = array('FFFF00','FF9900','FF0000','FF00FF',
                '99FF33','33FFCC','FF99FF','00CC33');
// build search and replace patterns for regex
$patterns = array();
$replacements = array();
for ($i = 0, $j = count($words); $i < $j; $i++) {
    $patterns[$i] = '/\b'.preg_quote($words[$i], '/').'\b/';
    $replacements[$i] = '<b style="color:black;background-color:#' .
                        $colors[$i % 8] .'">' . $words[$i] . '</b>';
}
// retrieve page
$fh = fopen($url,'r') or die("Can't open $url");
$s = '';
while (! feof($fh)) {
    $s .= fread($fh,4096);
}
fclose($fh);
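if ($j) {
    // highlight the text one chunk at a time, skipping over tags
    while ($s) {
        if (preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s',$s,$matches) &&
            (($matches[1] . $matches[2]) !== '')) {
            print preg_replace($patterns,$replacements,$matches[1]);
            print $matches[2];
            $s = $matches[3];
        } else {
            // no progress (for example, a stray "<"): print the rest and stop
            print $s;
            break;
        }
    }
} else {
    // no words to highlight
    print $s;
}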
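The same loop can be wrapped in a function that returns the highlighted markup instead of printing it, which makes it easier to try on a small string without fetching a URL. What follows is a minimal sketch, not a definitive implementation. The helper name highlight_words(), its parameters, and the two-color default palette are all assumptions introduced here and do not appear in the program above.

```php
<?php
// highlight_words() is a hypothetical helper, introduced only for illustration.
function highlight_words($html, $words, $colors = array('FFFF00', 'FF9900')) {
    // build search and replace patterns, as in the program above
    $patterns = array();
    $replacements = array();
    foreach (array_values($words) as $i => $word) {
        $patterns[] = '/\b' . preg_quote($word, '/') . '\b/';
        $replacements[] = '<b style="color:black;background-color:#' .
                          $colors[$i % count($colors)] . '">' . $word . '</b>';
    }

    $out = '';
    while ($html !== '') {
        $ok = preg_match('{^([^<]*)?(</?[^>]+?>)?(.*)$}s', $html, $m);
        if (!$ok || ($m[1] === '' && $m[2] === '')) {
            // no progress (for example, a stray "<"): keep the rest unchanged
            $out .= $html;
            break;
        }
        // highlight the text before the tag, then append the tag untouched
        $out .= preg_replace($patterns, $replacements, $m[1]) . $m[2];
        $html = $m[3];
    }
    return $out;
}

// "cat" is highlighted as link text, but not inside the href attribute
// and not as part of the larger word "category".
$result = highlight_words('<a href="cat.html">cat</a> and category', array('cat'));
print $result . "\n";
```

Returning a string rather than printing is a design choice made for this sketch. It lets the caller inspect or post-process the result, while the original program's streaming print is just as valid for large pages.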