SimpleParse module

This module is the elementary parser in charge of splitting the source file into homogeneous regions (i.e. fragments defined by 'spec's in generic.conf).

init ($fileh, $tabhint, @blksep)

init initializes the global variables and builds the detection regexps.

  1. $fileh

    a filehandle for the source file

  2. $tabhint

    hint for the tab width (defaults to 8 if not defined)

    Actual value can be given in an emacs-style comment as the first line of the file.

  3. @blksep

    an array of references to hashes defining the different categories for this language (see generic.conf)
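The interplay between $tabhint and an emacs-style first line can be sketched as follows. This is only an illustration: the helper name tab_width_sketch and its regexp are assumptions, not the code used by init, which may recognise the comment differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SimpleParse: take the tab width from an
# emacs-style "-*- ... tab-width: N ... -*-" first line when present,
# otherwise fall back to the hint, and finally to 8.
sub tab_width_sketch {
    my ($firstline, $tabhint) = @_;
    if (defined $firstline
        && $firstline =~ /-\*-.*?\btab-width:\s*(\d+).*?-\*-/) {
        return $1;
    }
    return defined $tabhint ? $tabhint : 8;
}

print tab_width_sketch("/* -*- mode: c; tab-width: 4 -*- */", undef), "\n";
print tab_width_sketch("int main(void)", undef), "\n";
```

In this sketch a value found in the file takes precedence over the hint, which matches the description of $tabhint as a hint rather than a setting.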

untabify ($line, $tab)

untabify replaces TAB characters by spaces.

  1. $line

    string to untabify

  2. $tab

    number of spaces for a TAB (defaults to 8 if not defined)

Returns the line after replacement.

Note that this sub is presently only used by sub markupfile when no specific parser definition could be found. No attempt is made to interpret an emacs-style tab specification. Consequently, the tab width may be wrong for files that declare their own.
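A minimal sketch of such a replacement is given below. It assumes that each TAB advances to the next multiple of $tab columns (tab-stop alignment); the name untabify_sketch is hypothetical and the actual untabify may be written differently.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration; not the LXR source.
sub untabify_sketch {
    my ($line, $tab) = @_;
    $tab = 8 unless defined $tab;    # default tab width
    # Replace the leftmost TAB with enough spaces to reach the next tab
    # stop, and repeat until no TAB is left.
    1 while $line =~ s/^([^\t]*)\t/$1 . ' ' x ($tab - length($1) % $tab)/e;
    return $line;
}

print untabify_sketch("a\tb"), "\n";       # default width of 8
print untabify_sketch("ab\tc", 4), "\n";   # explicit width of 4
```

Handling one TAB at a time keeps the column arithmetic simple, since the text before the TAB never contains another TAB.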

nextfrag ()

nextfrag returns the next categorized region of the source file.

Returned value is a list: ($btype, $frag).

  1. $btype

    a string giving the category name

  2. $frag

    a string containing the region

    Note that the "region" may span several lines.

nextfrag implements the LXR parser. It is critical for global performance. Unfortunately, two factors put a heavy penalty on it:

1- Perl is an interpreted language,

2- parsing with regexps is not as efficient as with a finite state automaton (FSA).
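To make the regexp-based approach concrete, here is a toy splitter in the same spirit. It is not the LXR implementation: the names toy_init and toy_nextfrag, the hash keys 'name', 'open' and 'close', and the use of 'code' as the category for unclassified text are all assumptions made for this sketch.

```perl
use strict;
use warnings;

# Toy category table; the keys 'name', 'open' and 'close' are assumptions.
my @cats = (
    { name => 'comment', open => qr{/\*}, close => qr{\*/} },
    { name => 'string',  open => qr{"},   close => qr{"}   },
);
my $buffer = '';

sub toy_init { $buffer = shift }

sub toy_nextfrag {
    return () if $buffer eq '';
    # Find the category whose opening delimiter occurs first in the buffer.
    my ($best, $pos);
    for my $c (@cats) {
        if ($buffer =~ $c->{open}) {
            ($best, $pos) = ($c, $-[0]) if !defined $pos || $-[0] < $pos;
        }
    }
    # No delimiter left: the rest is unclassified text.
    if (!defined $best) {
        my $frag = $buffer;
        $buffer = '';
        return ('code', $frag);
    }
    # Text in front of the delimiter is returned first.
    return ('code', substr($buffer, 0, $pos, '')) if $pos > 0;
    # Buffer starts with an opening delimiter: consume up to the closing one.
    if ($buffer =~ /\A($best->{open}.*?$best->{close})/s) {
        return ($best->{name}, substr($buffer, 0, length $1, ''));
    }
    # Unterminated region: return everything that is left.
    my $frag = $buffer;
    $buffer = '';
    return ($best->{name}, $frag);
}

toy_init(qq{x = "a"; /* c */ y});
while (my ($btype, $frag) = toy_nextfrag()) {
    print "$btype: [$frag]\n";
}
```

Every call rescans the buffer with each category's regexp, which illustrates why this style costs more than a single pass through a finite state automaton.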

requeuefrag ($frag)

requeuefrag stores a string in the source input buffer for scanning by the next call to nextfrag.

  1. $frag

    string to scan next

This sub is useful for rescanning the tail of a fragment when it turns out to contain a different category, or for forcing the parsing of a generated string.
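The idea can be illustrated with a self-contained toy. It assumes that requeuing means placing the string in front of the unread input; toy_requeuefrag and toy_nextword are hypothetical stand-ins, not the SimpleParse subs.

```perl
use strict;
use warnings;

my $buffer = "rest of input";    # stand-in for the source input buffer

# Hypothetical counterpart of requeuefrag: put text back in front of the
# unread input so that the next scan sees it first.
sub toy_requeuefrag {
    my ($frag) = @_;
    $buffer = $frag . $buffer;
}

# Trivial stand-in scanner: returns one whitespace-delimited word per call,
# or undef when the buffer holds no more words.
sub toy_nextword {
    return $buffer =~ s/\A\s*(\S+)// ? $1 : undef;
}

my $first = toy_nextword();      # consumes "rest"
toy_requeuefrag("tail");         # force "tail" to be scanned next
print toy_nextword(), "\n";
print toy_nextword(), "\n";
```

After the requeue, the scanner yields the requeued word before resuming with the remaining input.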

Caveat: