11.11.1 Problem
You want to convert HTML into readable, formatted ASCII.
11.11.2 Solution
If you have access to an external program that formats HTML as
ASCII, such as lynx, call it like so:
$file = escapeshellarg($file);
$ascii = `lynx -dump $file`;
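In practice the call can fail: the file may be missing, or lynx may not be installed. The following is a minimal sketch of a wrapper that returns null in those cases so the caller can fall back to another approach. The function name html_file_to_ascii( ) is hypothetical, and the 2>/dev/null redirection assumes a Unix-like shell.

```php
<?php
// Hypothetical helper (not part of the original recipe): returns the ASCII
// rendering of an HTML file, or null if the file is unreadable or lynx
// produces no output (for example, because it is not installed).
function html_file_to_ascii($file) {
    if (!is_readable($file)) {
        return null;
    }
    // -dump writes the formatted page to standard output.
    // Assumption: a Unix-like shell, so stderr can be discarded with 2>/dev/null.
    $cmd = 'lynx -dump ' . escapeshellarg($file) . ' 2>/dev/null';
    $out = shell_exec($cmd);
    // shell_exec() yields null (or false) on failure or when nothing was printed.
    if ($out === null || $out === false || $out === '') {
        return null;
    }
    return $out;
}
```

A caller might test the result with `=== null` and, on failure, fall back to a pure-PHP converter.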
11.11.3 Discussion
If you can't use an external formatter, the pc_html2ascii( )
function shown in Example 11-4 handles a reasonable subset of HTML
(no tables or frames, though).
Example 11-4. pc_html2ascii( )
function pc_html2ascii($s) {
    // convert links to "text (url)"
    $s = preg_replace('/<a\s+.*?href="?([^\" >]*)"?[^>]*>(.*?)<\/a>/i',
                      '$2 ($1)', $s);

    // convert <br>, <hr>, <p>, <div> to line breaks
    $s = preg_replace('@<(b|h)r[^>]*>@i', "\n", $s);
    $s = preg_replace('@<p[^>]*>@i', "\n\n", $s);
    $s = preg_replace('@<div[^>]*>(.*)</div>@i', "\n" . '$1' . "\n", $s);

    // convert bold and italic
    $s = preg_replace('@<b[^>]*>(.*?)</b>@i', '*$1*', $s);
    $s = preg_replace('@<i[^>]*>(.*?)</i>@i', '/$1/', $s);

    // decode named entities
    $s = strtr($s, array_flip(get_html_translation_table(HTML_ENTITIES)));

    // decode numbered entities such as &#65; (chr() covers single-byte codes only);
    // a callback replaces the /e modifier, which PHP 7 removed
    $s = preg_replace_callback('/&#(\d+);/',
                               function ($m) { return chr((int) $m[1]); },
                               $s);

    // remove any remaining tags
    $s = strip_tags($s);

    return $s;
}
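The two entity-decoding steps in the function can also be handled by PHP's built-in html_entity_decode( ), which decodes named, decimal, and hexadecimal entities in one pass and can emit multibyte characters. The following sketch assumes UTF-8 output is acceptable; the wrapper name decode_entities_utf8( ) is hypothetical and not part of the original example.

```php
<?php
// Hypothetical wrapper (an assumption, not from the original example):
// decode named (&eacute;), decimal (&#8364;), and hexadecimal (&#x41;)
// entities into UTF-8 text in a single call.
function decode_entities_utf8($s) {
    // ENT_QUOTES also decodes &quot; and &#039;;
    // ENT_HTML5 selects the HTML5 named-entity table (PHP 5.4 and later).
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_entities_utf8('Caf&eacute; &amp; bar: &#8364;5, &#x41;'), "\n";
// prints: Café & bar: €5, A
```

Unlike chr( ), this handles code points above 255, at the cost of producing UTF-8 rather than strictly 7-bit ASCII.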