Inside Simple File Parser

SimpleParse module

This module is the elementary parser in charge of splitting the source file into homogeneous regions (i.e. fragments defined by 'spec's in generic.conf).

`init ($fileh, $tabhint, @blksep)`

init initializes the global variables and builds the detection regexps.

$fileh

a filehandle for the source file
$tabhint

hint for the tab width (defaults to 8 if not defined)

Actual value can be given in an emacs-style comment as the first line of the file.
@blksep

an array of references to hashes defining the different categories for this language (see generic.conf)
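
As an illustration of the $tabhint fallback described above, here is a minimal sketch of how a tab width might be read from an emacs-style comment on the first line. The helper name tab_width_hint and the exact pattern are assumptions made for this example; they are not taken from the SimpleParse source.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of SimpleParse): return the tab width
# announced by an emacs-style "-*- ... -*-" comment on the first line,
# or the supplied default (8 if none is given).
sub tab_width_hint {
    my ($first_line, $default) = @_;
    $default = 8 unless defined $default;
    # Look for "tab-width: N" between two "-*-" markers.
    if (defined $first_line
        && $first_line =~ /-\*-.*?\btab-width:\s*(\d+).*?-\*-/i) {
        return $1;
    }
    return $default;
}

print tab_width_hint("/* -*- mode: C; tab-width: 4 -*- */"), "\n";  # 4
print tab_width_hint("int main(void)"), "\n";                       # 8
```

The value found in the file takes precedence over the hint, which matches the description of $tabhint above.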

`untabify ($line, $tab)`

untabify replaces TAB characters by spaces.

$line

string to untabify
$tab

number of spaces for a TAB (defaults to 8 if not defined)

Returns the line after replacement.

Note that this sub is currently used only by sub markupfile, when no specific parser definition can be found. It makes no attempt to interpret an emacs-style tab specification, so the resulting tab width may be wrong.
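
The replacement performed by untabify can be pictured with the following sketch. It is a stand-in written for this document, not the module's code: the name expand_tabs is hypothetical, and expanding each TAB up to the next tab stop (rather than to a fixed number of spaces) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for untabify: expand each TAB to the next tab stop.
# Assumes $line is a single line, so offsets count from the line start.
sub expand_tabs {
    my ($line, $tab) = @_;
    $tab = 8 unless defined $tab;
    # Replace the first remaining TAB on each pass; $-[0] is its offset,
    # so the number of spaces depends on the current column.
    1 while $line =~ s/\t/' ' x ($tab - $-[0] % $tab)/e;
    return $line;
}

print expand_tabs("a\tb", 4), "\n";   # "a", three spaces, "b"
```

Because earlier TABs are already expanded when a later one is processed, each offset reflects the displayed column.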

`nextfrag ()`

nextfrag returns the next categorized region of the source file.

Returned value is a list: ($btype, $frag).

$btype

a string giving the category name
$frag

a string containing the region

Note that the "region" may span several lines.
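
To make the ($btype, $frag) pairs concrete, here is a toy splitter in the same spirit. It is only a sketch under stated assumptions: the category table, the sub name split_fragments and the 'code' label for uncategorized text are invented for the example and do not reproduce generic.conf or the real nextfrag, which returns one pair per call rather than a full list.

```perl
use strict;
use warnings;

# Toy category table with opening and closing delimiters.
# Names and patterns are illustrative, not taken from generic.conf.
my @spec = (
    { name => 'comment', open => qr{/\*}, close => qr{\*/} },
    { name => 'string',  open => qr{"},   close => qr{"}   },
);

# Split $src into a list of [$btype, $frag] pairs.
# Escaped quotes inside strings are deliberately not handled.
sub split_fragments {
    my ($src) = @_;
    my @out;
    while (length $src) {
        # Find the category whose opening delimiter occurs first.
        my ($best, $pos);
        for my $s (@spec) {
            if ($src =~ $s->{open} && (!defined $pos || $-[0] < $pos)) {
                ($best, $pos) = ($s, $-[0]);
            }
        }
        if (!defined $best) {            # no delimiter left: all code
            push @out, ['code', $src];
            last;
        }
        if ($pos > 0) {                  # plain code before the delimiter
            push @out, ['code', substr($src, 0, $pos, '')];
            next;
        }
        # Delimiter at offset 0: consume through the closing delimiter.
        my ($open, $close) = @{$best}{qw(open close)};
        if ($src =~ /\A$open.*?$close/s) {
            push @out, [$best->{name}, substr($src, 0, $+[0], '')];
        } else {                         # unterminated: take the rest
            push @out, [$best->{name}, $src];
            last;
        }
    }
    return @out;
}

for my $pair (split_fragments('x = 1; /* note */ s = "hi";')) {
    my ($btype, $frag) = @$pair;
    printf "%-8s|%s|\n", $btype, $frag;
}
```

Concatenating the fragments in order gives back the original text, which is the property a caller relies on when marking up each region separately.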

nextfrag implements the LXR parser. It is critical for global performance. Unfortunately, two factors put a heavy penalty on it:

1- Perl is an interpreted language,

2- parsing with regexps is not as efficient as with a finite state automaton (FSA).

Speed is acceptable when displaying a file (since time here is dominated by HTML editing).
Raw parser speed shows during genxref, where the full tree is parsed. It could be worthwhile to replace the parser with a compiled deterministic FSA version.

`requeuefrag ($frag)`

requeuefrag stores a string in the source input buffer for scanning by the next call to nextfrag.

$frag

string to scan next

This sub is useful for rescanning a (tail) part of a fragment when it is discovered it contains a different category or to force parsing of a generated string.

Caveat:

When using this sub, pay special attention to the order of requests so that you do not permute source sequences: the buffer is a stack (LIFO), so the fragment requeued last is scanned first!
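
The ordering constraint can be seen on a toy model of the buffer. The names @pending, requeue and next_pending are hypothetical and stand in for the module's internals, which are not shown here; only the LIFO behaviour is taken from the text above.

```perl
use strict;
use warnings;

# Toy model of the pending-input stack (hypothetical names).
my @pending;
sub requeue      { push @pending, $_[0] }   # last in...
sub next_pending { return pop @pending }    # ...first out

# To have "alpha" scanned before "beta", requeue them in reverse order.
requeue("beta");
requeue("alpha");
my @scanned = (next_pending(), next_pending());
print "@scanned\n";   # alpha beta
```

In other words, when several pieces must be rescanned, requeue the piece that comes last in the source first.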