This module is the elementary parser in charge of splitting the source file into homogeneous regions (i.e. fragments defined by the 'spec' entries in generic.conf).
init ($fileh, $tabhint, @blksep)
init initializes the global variables and builds the detection regexps.
$fileh
a filehandle for the source file
$tabhint
hint for the tab width (defaults to 8 if not defined)
The actual value can be given in an emacs-style comment on the first line of the file.
@blksep
an array of references to hashes defining the different categories for this language (see generic.conf)
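The tab-width rule described for $tabhint can be illustrated with a small Python sketch (the module itself is Perl). It assumes the emacs-style comment takes the usual "-*- ... tab-width: N ... -*-" form; the function name resolve_tab_width and the exact pattern are hypothetical, not the module's code.

```python
import re

# Hypothetical pattern for an emacs-style first-line specification,
# e.g. "/* -*- mode: c; tab-width: 4 -*- */". This form is an assumption.
_MODELINE = re.compile(r"-\*-.*?\btab-width:\s*(\d+).*?-\*-")


def resolve_tab_width(first_line, tabhint=None):
    """Return the tab width to use for a file (hypothetical helper).

    A tab-width found in an emacs-style comment on the first line is used
    first; otherwise the caller's hint; otherwise the default of 8.
    """
    m = _MODELINE.search(first_line)
    if m:
        return int(m.group(1))
    return tabhint if tabhint is not None else 8
```

For example, a file starting with `/* -*- mode: c; tab-width: 4 -*- */` yields 4 whatever hint is passed, while a file without such a comment falls back to the hint, then to 8.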
untabify ($line, $tab)
untabify replaces TAB characters with spaces.
$line
string to untabify
$tab
number of spaces for a TAB (defaults to 8 if not defined)
Returns the line after replacement.
Note that this sub is presently used only by sub markupfile when no specific parser definition could be found. No attempt is made to interpret an emacs-style tab specification; consequently, the tab width may be wrong.
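As an illustration of what untabify does, here is a minimal Python sketch. It assumes the usual meaning of a tab, i.e. padding to the next tab stop placed every $tab columns, rather than substituting a fixed number of spaces; the source does not say which behaviour the Perl sub implements.

```python
def untabify(line, tab=None):
    """Replace each TAB with spaces up to the next tab stop (sketch).

    Assumes tab stops every `tab` columns, counted from column 0;
    `tab` defaults to 8 as in the description above.
    """
    if tab is None:
        tab = 8
    out = []
    col = 0  # current output column
    for ch in line:
        if ch == "\t":
            pad = tab - (col % tab)  # spaces needed to reach the next stop
            out.append(" " * pad)
            col += pad
        elif ch == "\n":
            out.append(ch)
            col = 0  # a newline restarts column counting
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```

Under this assumption the result matches Python's built-in `str.expandtabs(tab)` for ordinary text; the explicit loop only makes the column arithmetic visible.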
nextfrag ()
nextfrag returns the next categorized region of the source file.
The returned value is a list: ($btype, $frag).
$btype
a string giving the category name
$frag
a string containing the region
Note that the "region" may span several lines.
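The contract of nextfrag can be sketched in Python as a scanner that finds the earliest opening delimiter with one combined alternation, returns the plain text before it, then returns the delimited region up to its closing delimiter. Everything here is an assumption made for illustration: the (name, open_regex, close_regex) triples are a simplified stand-in for the generic.conf specs, the label "code" for uncategorized text is hypothetical, escapes inside strings are ignored, and the whole text is held in memory instead of being read from the filehandle on demand.

```python
import re


class SimpleScanner:
    """Hypothetical sketch of a nextfrag-style region scanner."""

    def __init__(self, text, categories):
        # categories: non-empty list of (name, open_regex, close_regex)
        self.buf = text
        self.cats = [(name, re.compile(o), re.compile(c))
                     for name, o, c in categories]
        # One alternation with a named group per category finds the
        # earliest opening delimiter in a single search.
        self.opener = re.compile("|".join(
            f"(?P<g{i}>{o})" for i, (_, o, _) in enumerate(categories)))

    def nextfrag(self):
        """Return (btype, frag), or (None, None) when input is exhausted."""
        if not self.buf:
            return None, None
        m = self.opener.search(self.buf)
        if m is None:  # no more delimited regions: the rest is plain text
            frag, self.buf = self.buf, ""
            return "code", frag
        if m.start() > 0:  # plain text before the next opening delimiter
            frag, self.buf = self.buf[:m.start()], self.buf[m.start():]
            return "code", frag
        # Which category's opener matched?
        idx = next(i for i in range(len(self.cats))
                   if m.group(f"g{i}") is not None)
        name, _, close = self.cats[idx]
        end = close.search(self.buf, m.end())
        stop = end.end() if end else len(self.buf)  # unterminated: take all
        frag, self.buf = self.buf[:stop], self.buf[stop:]
        return name, frag
```

Scanning `x = 1; /* hi */ s = "a";` with a comment and a string category yields the code, comment, code, string and code fragments in order; a comment containing a newline comes back as a single fragment, which is the multi-line case noted above.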
nextfrag implements the LXR parser. Its speed is critical for global performance. Unfortunately, two factors put a heavy penalty on it:
1- Perl is an interpreted language,
2- regexp-based parsing is not as efficient as a finite state automaton (FSA).
Speed is acceptable when displaying a file, since the time there is dominated by HTML generation.
The raw parser speed shows up during genxref, where the full tree is parsed. It could be worthwhile to replace the parser with a compiled deterministic FSA version.
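To make the FSA alternative concrete, the following Python sketch hand-codes a small deterministic automaton that splits text into code and C-style /* ... */ comment regions. It is only an illustration of the structure (exactly one state transition per input character, no backtracking), restricted to a single comment syntax; it is not part of LXR, and the actual speed gain would come from compiling such a table into native code, for example with a lexer generator such as flex, not from running it in an interpreted language.

```python
# States of a small deterministic automaton for /* ... */ comments.
CODE, SLASH, COMMENT, STAR = range(4)


def split_comments(text):
    """Split text into ("code" | "comment", fragment) pairs in one pass."""
    regions = []
    state = CODE
    start = 0  # start index of the region being accumulated
    for i, ch in enumerate(text):
        if state == CODE:
            if ch == "/":
                state = SLASH  # possible comment opener
        elif state == SLASH:
            if ch == "*":
                # "/*" opens a comment: flush the code before the "/"
                if i - 1 > start:
                    regions.append(("code", text[start:i - 1]))
                start = i - 1
                state = COMMENT
            elif ch != "/":
                state = CODE  # a lone "/" was ordinary code
        elif state == COMMENT:
            if ch == "*":
                state = STAR  # possible comment closer
        elif state == STAR:
            if ch == "/":
                regions.append(("comment", text[start:i + 1]))
                start = i + 1
                state = CODE
            elif ch != "*":
                state = COMMENT
    if start < len(text):  # trailing region (possibly an unterminated comment)
        kind = "comment" if state in (COMMENT, STAR) else "code"
        regions.append((kind, text[start:]))
    return regions
```

For instance, `split_comments("a/*b*/c")` gives a code fragment "a", a comment fragment "/*b*/" and a code fragment "c"; each character is examined once, whatever the input.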
requeuefrag ($frag)
requeuefrag stores a string in the source input buffer, to be scanned by the next call to nextfrag.
$frag
string to scan next
This sub is useful for rescanning the (tail) part of a fragment when it turns out to contain a different category, or for forcing the parsing of a generated string.
Caveat:
When using this sub, pay special attention to the order of requests so that you do not permute source sequences: the buffer is a stack (LIFO)!
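The LIFO caveat can be illustrated with a minimal Python model of the input buffer. The class FragBuffer and its methods are hypothetical; the only behaviour taken from the description is that requeued strings are consumed before fresh source input, most recently requeued first.

```python
class FragBuffer:
    """Hypothetical model of the buffer shared by nextfrag and requeuefrag."""

    def __init__(self, lines):
        self._lines = iter(lines)  # stands in for reading from $fileh
        self._stack = []           # requeued strings, last in first out

    def requeue(self, frag):
        """Push a string to be scanned before anything else."""
        self._stack.append(frag)

    def next_chunk(self):
        """Return the next string to scan, or None at end of input."""
        if self._stack:
            return self._stack.pop()
        return next(self._lines, None)
```

To have "tail1" scanned before "tail2", they must be requeued in reverse order: `requeue("tail2")` first, then `requeue("tail1")`. Requeuing them in reading order would swap the two pieces of source.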