The generic parser uses a very elementary algorithm to tokenise files. All that is needed is to break the file into homogeneous regions, such as a string, a comment or code. Some of these regions undergo second-level processing to extract identifiers, which are then looked up in a dictionary.
Consequently, LXR does not need all the complication of a real compiler parser.
In the parser configuration file, every language description is a curly brace-enclosed, comma-separated list of key/value pairs:
The most important parameter is 'spec', which tells how to break the file into regions.
'spec' must define the qualified regions as 'comment', 'string' or 'include'. Text outside these regions is treated as 'code'.
Examples:
/* ... */: comment in C
//: comment in C++ (\$ stands for the end of line)
String in C: we must stay inside the string when we meet escaped characters (otherwise we may detect the end of the string too early and get out of sync with the source).
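The region splitting described above can be sketched in a few lines of Python. This is only an illustration of the idea, not LXR's code: the `SPEC` table layout, the function name `split_regions` and the patterns are assumptions made for the example.

```python
import re

# Hypothetical region table: (region type, start regex, end regex).
# Only the region types come from the text; the layout is an assumption.
SPEC = [
    ("comment", r"/\*", r"\*/"),
    ("comment", r"//", r"$"),
    ("string", r'"', r'"'),
]

def split_regions(text):
    """Return (region_type, fragment) pairs that cover the whole text."""
    opener = re.compile(
        "|".join(f"(?P<r{i}>{start})" for i, (_, start, _) in enumerate(SPEC))
    )
    regions, pos = [], 0
    while pos < len(text):
        m = opener.search(text, pos)
        if m is None:
            regions.append(("code", text[pos:]))
            break
        if m.start() > pos:
            # Everything before a qualified region is plain code.
            regions.append(("code", text[pos:m.start()]))
        kind, _, end = SPEC[int(m.lastgroup[1:])]
        if kind == "string":
            # Step over escaped characters so that \" does not close the string.
            closer = re.compile(r'(?:\\.|[^"\\])*' + end, re.S)
            e = closer.match(text, m.end())
        else:
            e = re.compile(end, re.M).search(text, m.end())
        # An unterminated region simply runs to the end of the file.
        stop = e.end() if e else len(text)
        regions.append((kind, text[m.start():stop]))
        pos = stop
    return regions
```

Calling `split_regions` on a C fragment yields alternating 'code', 'comment' and 'string' fragments whose concatenation is the original text.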
The second important parameter is 'identdef', used inside code regions to find identifiers and keywords:
Example:
Catchall for many languages (covers identifiers and special C preprocessor keywords)
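To show what such a pattern does inside a code region, here is a small Python sketch. The regular expression is an assumed catch-all (a letter, underscore or '#', followed by word characters), not the value shipped with LXR, and `find_identifiers` is a hypothetical helper.

```python
import re

# Assumed catch-all pattern; the leading '#' lets preprocessor keywords
# such as #define be picked up as a single token.
IDENTDEF = re.compile(r"[A-Za-z_#][A-Za-z0-9_]*")

def find_identifiers(code):
    """Return (offset, identifier) pairs found in a code region."""
    return [(m.start(), m.group()) for m in IDENTDEF.finditer(code)]
```

Each candidate returned this way would then be checked against the reserved list and looked up in the dictionary.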
Next, you give the list of reserved keywords which will not be considered for lookup:
'case_insensitive' is provided:
Example:
Part of C table
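The effect of the reserved list, with and without case folding, can be illustrated as follows. This is a sketch under assumptions: the keyword subset, the function name `lookup_candidates` and its parameters are invented for the example.

```python
# A small, illustrative subset of C keywords.
RESERVED = {"if", "else", "while", "return", "int"}

def lookup_candidates(identifiers, reserved=RESERVED, case_insensitive=False):
    """Drop reserved keywords; fold case first when the language ignores it."""
    if case_insensitive:
        folded = {word.lower() for word in reserved}
        return [name for name in identifiers if name.lower() not in folded]
    return [name for name in identifiers if name not in reserved]
```

For a case-sensitive language such as C, `RETURN` remains a candidate for lookup; with `case_insensitive=True` it is filtered out like `return`.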
Finally, you give a plain-text explanation of the ctags flags so that the cross-reference listings can label the identifiers with human-readable descriptions. Refer to the ctags man page for the complete list applicable to a given language.
Example:
Part of C table
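As an illustration of such a table, the following Python sketch maps a few of the one-letter kinds that ctags reports for C to readable labels. The label wording and the `describe` helper are assumptions; the ctags man page remains the authority on the letters themselves.

```python
# Subset of the one-letter kinds ctags uses for C, with readable labels.
C_KINDS = {
    "d": "macro definition",
    "e": "enumerator",
    "f": "function definition",
    "g": "enumeration name",
    "m": "struct or union member",
    "s": "structure name",
    "t": "typedef",
    "u": "union name",
    "v": "variable definition",
}

def describe(kind, table=C_KINDS):
    """Return the label for a ctags kind letter, or a fallback text."""
    return table.get(kind, "unknown kind '%s'" % kind)
```

A listing would then show "function definition" next to an identifier that ctags tagged with `f`.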
If the language can "import" sub-files (don't worry about C/C++, whose rules are built in), you give LXR rules to transform the language-form file designation into an OS-form file reference, so that a clickable link to the sub-file can be inserted:
'pre' and 'post' were respectively named 'first' and 'last'. 'separator' appeared in 1.2.
'directive' defines a reg-exp to split the statement into 5 components, namely:
Example for Perl use or require:
'separator' optionally defines the language-specific path separator in filenames. It is replaced by the OS separator before trying to access the file.
'pre', 'global' and 'post' are optional substitution rules (target is a pattern and replacement is substituted in case of a match).
'pre' is applied only once at the beginning.
'global' is then repeatedly applied until there is no more match.
'separator' is replaced by the OS separator.
'post' is applied only once after the other rules.
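The order of these rules can be sketched in Python as below. This is not LXR's implementation: the function `to_os_path`, its parameters (`global_` is spelled with an underscore because `global` is a Python keyword), the `(pattern, replacement)` tuple form and the iteration cap are all assumptions for the sake of illustration.

```python
import os
import re

def to_os_path(name, separator=None, pre=None, global_=None, post=None,
               os_sep=os.sep, max_rounds=1000):
    """Turn a language-form file designation into an OS-form path.

    Each of pre, global_ and post is an optional (pattern, replacement) pair.
    """
    if pre:
        # Applied only once, at the beginning.
        name = re.sub(pre[0], pre[1], name, count=1)
    if global_:
        # Applied repeatedly until nothing changes (capped as a safety net).
        for _ in range(max_rounds):
            changed = re.sub(global_[0], global_[1], name, count=1)
            if changed == name:
                break
            name = changed
    if separator:
        # The language-specific separator becomes the OS separator.
        name = name.replace(separator, os_sep)
    if post:
        # Applied only once, after the other rules.
        name = re.sub(post[0], post[1], name, count=1)
    return name
```

For a Perl module name, `to_os_path("LXR::Lang::Generic", separator="::", post=(r"$", ".pm"), os_sep="/")` gives `LXR/Lang/Generic.pm`, which is the usual module-to-file mapping in Perl.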
Example for Perl use or require:
It is more efficient than the equivalent:
'include'.