1.11.1 Problem
You need to break apart fixed-width records in strings.
1.11.2 Solution
Use substr( ) to extract each field by its offset and width:
$fp = fopen('fixed-width-records.txt','r') or die ("can't open file");
// compare against false so that a line containing only "0" doesn't end the loop
while (false !== ($s = fgets($fp,1024))) {
    $fields[1] = substr($s,0,10);  // first field: first 10 characters of the line
    $fields[2] = substr($s,10,5);  // second field: next 5 characters of the line
    $fields[3] = substr($s,15,12); // third field: next 12 characters of the line
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");
Alternatively, use unpack( ) with a format string that names each field:
$fp = fopen('fixed-width-records.txt','r') or die ("can't open file");
// compare against false so that a line containing only "0" doesn't end the loop
while (false !== ($s = fgets($fp,1024))) {
    // an associative array with keys "title", "author", and "publication_year"
    $fields = unpack('A25title/A14author/A4publication_year',$s);
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");
1.11.3 Discussion
Data in which each field is allotted a fixed number of
characters per line may look like this list of book titles, authors, and
publication years:
$booklist=<<<END
Elmer Gantry             Sinclair Lewis1927
The Scarlatti InheritanceRobert Ludlum 1971
The Parsifal Mosaic      Robert Ludlum 1982
Sophie's Choice          William Styron1979
END;
In each line, the title occupies the first 25 characters, the
author's name the next 14 characters, and the publication year the next 4
characters. Knowing those field widths, it's straightforward to use
substr( ) to parse the fields into an array:
$books = explode("\n",$booklist);
for($i = 0, $j = count($books); $i < $j; $i++) {
    $book_array[$i]['title'] = substr($books[$i],0,25);
    $book_array[$i]['author'] = substr($books[$i],25,14);
    $book_array[$i]['publication_year'] = substr($books[$i],39,4);
}
Exploding $booklist into an array of lines makes the
looping code the same whether it's operating over a string or a series of lines
read in from a file.
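To feed the same loop from a file rather than a heredoc, the lines can be loaded into an array first. The sketch below assumes the file is small enough to read into memory in one go. file( ) and its flags are standard PHP, while the helper name pc_read_records( ) is hypothetical.

```php
// Hypothetical helper: return the lines of a fixed-width file as an array,
// in the same shape that explode("\n", $booklist) produces.
function pc_read_records($path) {
    // FILE_IGNORE_NEW_LINES strips the line endings; combined with
    // FILE_SKIP_EMPTY_LINES, blank lines are dropped instead of
    // becoming empty records.
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("can't read $path");
    }
    return $lines;
}
```

With that in place, $books = pc_read_records('fixed-width-records.txt') could stand in for the explode( ) call, and the rest of the loop would not need to change.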
The loop can be made more flexible by specifying the field
names and widths in a separate array that can be passed to a parsing function,
as shown in the pc_fixed_width_substr( ) function
in Example 1-3.
Example 1-3. pc_fixed_width_substr( )
function pc_fixed_width_substr($fields,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $line_pos = 0;
        foreach($fields as $field_name => $field_length) {
            $r[$i][$field_name] = rtrim(substr($data[$i],$line_pos,$field_length));
            $line_pos += $field_length;
        }
    }
    return $r;
}
$book_fields = array('title' => 25,
                     'author' => 14,
                     'publication_year' => 4);
$book_array = pc_fixed_width_substr($book_fields,$books);
The variable $line_pos keeps track of the start of
each field, and is advanced by the previous field's width as the code moves
through each line. Use rtrim( ) to remove
trailing whitespace from each field.
You can use unpack( ) as a
substitute for substr( ) to extract fields.
Instead of specifying the field names and widths as an associative array, create
a format string for unpack( ). A fixed-width field extractor using
unpack( ) looks like the pc_fixed_width_unpack( ) function
shown in Example 1-4.
Example 1-4. pc_fixed_width_unpack( )
function pc_fixed_width_unpack($format_string,$data) {
    $r = array();
    for ($i = 0, $j = count($data); $i < $j; $i++) {
        $r[$i] = unpack($format_string,$data[$i]);
    }
    return $r;
}
$book_array = pc_fixed_width_unpack('A25title/A14author/A4publication_year',
                                    $books);
Because the A format to unpack( ) means "space-padded string,"
unpack( ) strips the trailing spaces itself, so there's no
need to rtrim( ) each field.
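If the field widths already live in an associative array such as $book_fields, the format string does not have to be typed by hand. The following is a minimal sketch, assuming every field is a space-padded string and that no field name begins with a digit (a leading digit would be read as part of the width). The helper name pc_fixed_width_format( ) is hypothetical.

```php
// Hypothetical helper: turn array('title' => 25, ...) into
// 'A25title/A14author/...' for use with unpack().
function pc_fixed_width_format($fields) {
    $parts = array();
    foreach ($fields as $field_name => $field_length) {
        // 'A' is unpack()'s space-padded string code; the number is the width.
        $parts[] = 'A' . $field_length . $field_name;
    }
    return implode('/', $parts);
}

$book_fields = array('title' => 25,
                     'author' => 14,
                     'publication_year' => 4);
$format = pc_fixed_width_format($book_fields);
// $format is 'A25title/A14author/A4publication_year'

// Build one 43-character record and unpack it with the generated format.
$line = str_pad('Elmer Gantry', 25) . 'Sinclair Lewis' . '1927';
$record = unpack($format, $line);
// $record['title'] is 'Elmer Gantry', with the padding already removed
```

This keeps a single description of the record layout that either the substr( ) or the unpack( ) extractor can use.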
Once the fields have been parsed into $book_array by
either function, the data can be printed as an HTML table, for example:
$book_array = pc_fixed_width_unpack('A25title/A14author/A4publication_year',
$books);
print "<table>\n";
// print a header row
print '<tr><td>';
print join('</td><td>',array_keys($book_array[0]));
print "</td></tr>\n";
// print each data row
foreach ($book_array as $row) {
    print '<tr><td>';
    print join('</td><td>',array_values($row));
    print "</td></tr>\n";
}
// double quotes, so that \n is printed as a newline rather than literally
print "</table>\n";
Joining data on </td><td> produces a table
row that is missing its first <td> and last </td>.
We produce a complete table row by printing out <tr><td>
before the joined data and </td></tr> after the joined
data.
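The rows are printed as-is, which is fine for this book list but would break the markup if a field contained characters such as < or &. A hedged variant, assuming the output is HTML and the values are strings, escapes each value with htmlspecialchars( ) before joining. The function name pc_table_row( ) and the sample values are invented for illustration.

```php
// Hypothetical helper: build one table row from an array of string values,
// escaping each value so it is safe to place inside a <td>.
function pc_table_row($values) {
    $escaped = array_map('htmlspecialchars', $values);
    return '<tr><td>' . join('</td><td>', $escaped) . "</td></tr>\n";
}

$row = pc_table_row(array('R&D', 'x < y'));
// $row is "<tr><td>R&amp;D</td><td>x &lt; y</td></tr>\n"
```

Returning the row as a string instead of printing it directly also makes the output easier to test or to collect into a larger page.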
substr( ) and unpack( ) are equally capable when the fixed-width
fields are strings, but unpack( ) is the better solution when the
fields hold other kinds of data, such as packed binary integers.
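As an illustration of that last point, consider a record that stores a 10-character name followed by a quantity packed as a 16-bit big-endian integer. The record layout and field names here are invented for the example; the A and n format codes are standard pack( ) and unpack( ) codes.

```php
// Build a sample record: a 10-character space-padded name followed by
// a 16-bit unsigned big-endian integer (format code "n").
$record = pack('A10n', 'widget', 513);
// The record is 12 bytes long: 10 for the name, 2 for the integer.

// substr() could slice out the last two bytes, but they would still need
// decoding. unpack() extracts the string and decodes the integer in one step.
$fields = unpack('A10name/nquantity', $record);
// $fields['name'] is 'widget' and $fields['quantity'] is the integer 513
```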