This module is a generic language highlighting engine. It is driven by specifications read from file generic.conf.
Since it is a class, it can be derived to serve custom needs, such as speed optimisation on specific languages.
new ($writeDB, $pathname, $releaseid, $lang)
Method new
creates a new language object.
$writeDB
a boolean integer requesting to store language properties (human-readable type description) into the database
$pathname
a string containing the name of the file to parse
$releaseid
a string containing the release (version) of the file to parse
$lang
a string which is the key for the specification hash 'langmap'
in file generic.conf
This method is called by Lang
's new
method, but the third argument is different! The returned object will be Lang
's value.
The full unfiltered content of generic.conf is stored in the object structure.
To make sure identifiers can be recognised, a default pattern (covering at least C/C++ and Perl) is copied into the language specification if none is found.
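The fallback described above can be sketched as follows. This is a minimal illustration only: the key name 'identdef' and the default pattern are assumptions, not taken from the actual source.

```perl
use strict;
use warnings;

# Hypothetical default pattern covering C/C++ and Perl style identifiers.
my $default_ident = '[a-zA-Z_]\w*';

sub ensure_identifier_pattern {
    my ($spec) = @_;
    # Copy the default only when the specification defines no pattern.
    # 'identdef' is an assumed key name used for illustration.
    $spec->{'identdef'} = $default_ident
        unless exists $spec->{'identdef'};
    return $spec;
}

my $c_like = ensure_identifier_pattern({});
my $custom = ensure_identifier_pattern({ 'identdef' => '[a-z][a-z0-9-]*' });
print "$c_like->{'identdef'}\n";
print "$custom->{'identdef'}\n";
```

A specification that already carries its own pattern is left untouched.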
read_config ($writeDB)
Internal function (not method!) read_config
reads in language descriptions from configuration file.
$writeDB
a boolean integer requesting to store language properties (human-readable type description) into the database
Stores in global variable $config_contents
a reference to a hash equivalent to the configuration file. The differences are:
Keywords are uppercased if language is case-insensitive.
Keywords are stored in a hash instead of an array to speed up later retrieval (a hash lookup replaces a linear search through the array for every word checked)
Human-readable text for type is replaced by a record-id in the database where text is recorded.
Loading the file and transforming it is only executed once, saving the overhead of processing the config file each time.
However, the mapping between ctags tags and their human-readable counterparts is stored in every database for every language. The mapping, as a table index in the DB, is kept in a new hash 'typeid'
.
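The keyword transformation can be sketched as below. The language entry is a simplified, hypothetical one, and the flag name 'case_insensitive' is an assumption; the database step for 'typeid' is left out.

```perl
use strict;
use warnings;

# Hypothetical, simplified language entry; real entries hold more fields.
my %langmap = (
    'Pascal' => {
        'flags'    => ['case_insensitive'],
        'reserved' => ['begin', 'end', 'program'],
    },
);

sub index_reserved {
    my ($lang) = @_;
    # Count of matching flags; non-zero means the language ignores case.
    my $insensitive = grep { $_ eq 'case_insensitive' } @{ $lang->{'flags'} || [] };
    my %lookup;
    for my $word (@{ $lang->{'reserved'} }) {
        # Normalise once at load time so later lookups need no extra work.
        $lookup{ $insensitive ? uc($word) : $word } = 1;
    }
    # Replace the array by the hash.
    $lang->{'reserved'} = \%lookup;
    return $lang;
}

index_reserved($langmap{'Pascal'});
print join(',', sort keys %{ $langmap{'Pascal'}{'reserved'} }), "\n";
```

Doing this once at load time means every later reserved-word test is a single hash lookup.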
indexfile ($name, $path, $fileid, $index, $config)
Method indexfile
is invoked during genxref to parse and collect the definitions in a file.
$name
a string containing the LXR file name
$path
a string containing the OS file name
When files are stored in VCSes, $path
is the name of a temporary file.
$fileid
an integer containing the internal DB id for the file/revision
$index
a reference to the index (DB) object
$config
a reference to the configuration object
The effective job is done by ctags. This method is only a wrapper around ctags to retrieve its results and store them in the database.
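To give an idea of what such a wrapper has to do with the ctags results, here is a minimal sketch that decodes lines in the tab-separated tags layout (name, file, address, kind). It does not run ctags; the sample lines, the %typemap table and the record layout are assumptions made for illustration, and the database call is replaced by an in-memory list.

```perl
use strict;
use warnings;

# Sample lines in tags-file layout: name, file, address;" and kind.
my @ctags_output = (
    "main\tdemo.c\t12;\"\tf",
    "MAX_LEN\tdemo.c\t3;\"\td",
);

# Hypothetical mapping from ctags kind letters to readable type text.
my %typemap = ('f' => 'function definition', 'd' => 'macro (un)definition');

my @definitions;
for my $line (@ctags_output) {
    my ($symbol, $file, $address, $kind) = split /\t/, $line;
    # Strip the trailing ;" to keep only the line number.
    (my $lineno = $address) =~ s/;"$//;
    push @definitions, {
        symbol => $symbol,
        line   => $lineno,
        type   => $typemap{$kind} // 'unknown',
    };
}
printf "%s at line %d: %s\n", @{$_}{qw(symbol line type)} for @definitions;
```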
referencefile ($name, $path, $fileid, $index, $config)
Method referencefile
is invoked during genxref to parse and collect the references in a file.
$name
a string containing the LXR file name
$path
a string containing the OS file name
When files are stored in VCSes, $path
is the name of a temporary file.
$fileid
an integer containing the internal DB id for the file/revision
$index
a reference to the index (DB) object
$config
a reference to the configuration object
Using SimpleParse's nextfrag
, it focuses on "untyped" fragments (a.k.a. code fragments) from which symbols are extracted. User symbols, if already declared, are entered in the reference database.
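The filtering can be sketched as follows. The fragment list and the %declared set are hypothetical stand-ins for what the parser and the database would provide; only the selection logic is illustrated.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: fragments as [type, text] pairs, and a set of
# symbols already declared during the definition pass.
my @fragments = (
    ['comment', '/* call helper */'],
    ['',        'total = helper(count);'],
    ['string',  '"helper"'],
);
my %declared = (helper => 1, total => 1);

my @references;
for my $frag (@fragments) {
    my ($type, $text) = @$frag;
    next if $type ne '';          # only untyped (code) fragments are scanned
    while ($text =~ /([a-zA-Z_]\w*)/g) {
        # Undeclared words (here: count) are silently skipped.
        push @references, $1 if $declared{$1};
    }
}
print "@references\n";
```

The occurrences of helper inside the comment and the string are not recorded, since those fragments are typed.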
parsespec ()
Method parsespec
returns the list of category specifications for this language.
The language specification is a list of hashes describing the delimiters for the different categories, such as code, string, include, comment etc.
Each category is defined by a set of 2 or 3 regexps describing the delimiters: opening, ending and, optionally, locking delimiters.
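A possible shape for such a specification is sketched below. The field names (category, open, close, lock) are hypothetical, and the locking regexp is assumed here to protect escaped characters inside a string; the helper only reports which category opens first in a piece of text.

```perl
use strict;
use warnings;

# Hypothetical field names; each category has an opening and an ending
# regexp, plus an optional locking regexp.
my @spec = (
    { category => 'comment', open => qr{/\*}, close => qr{\*/} },
    { category => 'string',  open => qr{"},   close => qr{"}, lock => qr{\\.} },
);

# Return the category whose opening delimiter appears first in $text,
# or 'code' when no opening delimiter is found.
sub first_category {
    my ($text) = @_;
    my ($best, $best_pos);
    for my $cat (@spec) {
        next unless $text =~ $cat->{open};
        # $-[0] is the offset where the last successful match started.
        if (!defined $best_pos || $-[0] < $best_pos) {
            ($best, $best_pos) = ($cat->{category}, $-[0]);
        }
    }
    return $best // 'code';
}

print first_category('x = "a /* b";'), "\n";
print first_category('/* "quoted" */'), "\n";
print first_category('x = 1;'), "\n";
```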
flagged ($flag)
Method flagged
returns true (1) if the designated flag is present in the language-specific hash 'flags'
.
$flag
a string containing the flag name
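A minimal sketch of such a test, assuming 'flags' holds a plain list of flag names. The value returned when the flag is absent (0 here) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical language description holding a 'flags' list.
my $lang = { 'flags' => ['case_insensitive'] };

sub flagged {
    my ($self, $flag) = @_;
    # grep in scalar context counts the matching entries.
    my $found = grep { $_ eq $flag } @{ $self->{'flags'} || [] };
    return $found ? 1 : 0;
}

print flagged($lang, 'case_insensitive'), "\n";
print flagged($lang, 'some_other_flag'), "\n";
```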
processinclude ($frag, $dir)
Method processinclude
is invoked to process a generic include directive.
$frag
a string containing the directive
$dir
an optional string containing a preferred directory for the included file
Since it is generic, the process is driven by language-specific parameters taken in hash 'include'
from the configuration file.
CAUTION! Remember that the include fragment has already been isolated by the parser through subhash 'include'
of 'spec'
. This 'include'
is a different hash, not a sub-hash.
We first make use of 'directive'
which is a regular expression that splits the include instruction or directive into 5 components:
directive name
spacer
left delimiter (may be void for some languages)
included object
right delimiter (may be void for some languages)
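As an illustration of the five components, here is a sketch using a hypothetical 'directive' pattern for C-style includes. The pattern is an assumption written for this example, not the one shipped in the configuration file.

```perl
use strict;
use warnings;

# Hypothetical 'directive' pattern for C-style includes; the five capture
# groups follow the order listed above.
my $directive = qr/^(\s*#\s*include)(\s+)(["<])([^">]+)([">])/;

my $frag = '#include <sys/types.h>';
my ($name, $spacer, $left, $object, $right) = $frag =~ $directive
    or die "not an include directive\n";
print "name=[$name] object=[$object] delimiters=[$left$right]\n";
```

For a language without delimiters around the included object, groups 3 and 5 would simply match the empty string.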
To have something useful with LXR, the included object designation has to be transformed into a file name. This is done by 'pre'
, 'global'
, 'separator'
and 'post'
optional rewrite rules. They are applied, respectively, once at the beginning, repeatedly for as long as they match (on the name or only on the language-specific separator) and once at the end.
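The order of application can be sketched as follows for a Java-like import. The rules are written as code references purely for illustration (how the configuration file actually encodes them is not shown here), and the 'separator' rule is omitted for brevity.

```perl
use strict;
use warnings;

# Hypothetical rules; each one edits its argument in place. The 'global'
# rule returns true only when it changed something.
my %include = (
    'pre'    => sub { $_[0] =~ s/^\s+|\s+$//g },  # trim surrounding blanks
    'global' => sub { $_[0] =~ s/\./\// },        # one separator per pass
    'post'   => sub { $_[0] .= '.java' },         # append the file extension
);

sub to_file_name {
    my ($object) = @_;
    $include{'pre'}->($object) if $include{'pre'};          # once, first
    if ($include{'global'}) {
        1 while $include{'global'}->($object);              # until no change
    }
    $include{'post'}->($object) if $include{'post'};        # once, last
    return $object;
}

print to_file_name('java.util.List'), "\n";
```

Running 'post' last matters here: the dot of the appended extension must not be seen by the 'global' rule.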
Do not be too smart with these rewrite rules. They only aim at transforming language syntax into file designation. Elaborate path processing is available with 'incprefix'
, 'ignoredirs'
and 'maps'
processed by the link builder.
When done, <A>
links to the file and all intermediate directories are built.
Note:
If no 'include'
hash is defined for this language, an internal 'directive'
matching C/C++ and Perl syntax is used.
processcode ($code)
Method processcode
is invoked to process the fragment as generic code.
$code
a string to mark
Basically, look for anything that looks like an identifier and, if it is one, make it a hyperlink, unless it is a reserved word in this language.
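A minimal sketch of this marking step, assuming a C-like identifier pattern and a hypothetical link format. The URL layout and the reserved list are invented for the example, and HTML escaping is left out.

```perl
use strict;
use warnings;

# Hypothetical reserved words for the example.
my %reserved = map { $_ => 1 } qw(if else return while);

# Hypothetical link format; the real URL layout is not shown here.
sub mark_code {
    my ($code) = @_;
    # Replace every identifier-looking word, leaving reserved words as-is.
    $code =~ s{([a-zA-Z_]\w*)}
              { $reserved{$1} ? $1 : qq(<a href="ident?i=$1">$1</a>) }ge;
    return $code;
}

print mark_code('if (count) return total;'), "\n";
```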
isreserved ($frag)
Method isreserved
returns true (1) if the word is present in the language-specific 'reserved'
list.
$frag
a string containing the word to check
In the case of a case-insensitive language, comparisons are made between uppercase versions of the words.
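The comparison can be sketched as below, assuming the reserved words were already stored uppercased as hash keys when the configuration was loaded. The object layout is hypothetical, as is the value returned for a non-reserved word (0 here).

```perl
use strict;
use warnings;

# Hypothetical object layout: reserved words stored as hash keys,
# uppercased because the language is marked case-insensitive.
my $lang = {
    'case_insensitive' => 1,
    'reserved'         => { 'BEGIN' => 1, 'END' => 1 },
};

sub isreserved {
    my ($self, $frag) = @_;
    # Only the candidate word needs uppercasing at lookup time.
    $frag = uc($frag) if $self->{'case_insensitive'};
    return exists($self->{'reserved'}{$frag}) ? 1 : 0;
}

print isreserved($lang, 'Begin'), "\n";
print isreserved($lang, 'counter'), "\n";
```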
language ()
Method language
is a shorthand notation for $lang->{'language'}
.
langinfo ($item)
Method langinfo
is a shorthand notation to extract sub-hashes from language description {'langmap'}{'language'}
.
$item
a string containing the name of the sub-hash to look up