LXR Bugs & Limitations

Known bugs

Vulnerability JVN#72589538 - CVE-2018-0545

In March 2018, the Japan Computer Emergency Response Team (JPCERT) reported a vulnerability discovered in summer 2017 by Touma Hatano (波多野冬馬氏). This vulnerability allows arbitrary commands to be executed through a specially crafted string submitted via the General Search form.

This vulnerability affects all LXR versions since release 1.0.0 in which free-text search is enabled.

The fix in release 2.3.0 was incomplete and left the vulnerability exploitable. The fix in release 2.3.1 is correct.

Users with LXR servers visible from the Internet are advised to update to protect themselves against this vulnerability.

Users who cannot or do not want to upgrade should disable the General Search feature. See this tip.

Utilities-related bugs

UTF-8 management
Newer Perl interpreters are increasingly strict about UTF-8 sanity (valid multibyte sequences). Sometimes, LXR needs to examine a file's contents before deciding whether to display it. If the file does not contain text (graphics or other binary data), there is a high probability that some runs of bytes do not form valid UTF-8 sequences. In this case, Perl issues a fatal error.
Release 2.1.0 tries to address this issue by filtering the file with File::MMagic when "extension screening" has failed. Note that extension screening is given priority and will give an erroneous result if the file extension does not match the actual content. This priority was adopted because File::MMagic involves I/O, which would otherwise impact performance on big trees such as the Linux kernel.
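The screening order described above can be sketched as follows. This is a minimal Python illustration, not LXR's Perl code: the extension sets and the function names `is_valid_utf8` and `looks_like_text` are hypothetical, and a strict UTF-8 decode stands in for the content inspection that File::MMagic performs.

```python
import os

# Hypothetical extension tables, for illustration only.
BINARY_EXTENSIONS = {".png", ".jpg", ".gif", ".o", ".gz"}
TEXT_EXTENSIONS = {".c", ".h", ".pl", ".py", ".txt"}


def is_valid_utf8(data: bytes) -> bool:
    """Return True when the whole byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def looks_like_text(filename: str, data: bytes) -> bool:
    """Screen by extension first; inspect the bytes only when that is inconclusive."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in BINARY_EXTENSIONS:
        return False
    if ext in TEXT_EXTENSIONS:
        # The content is never inspected here, so a mislabelled file slips through.
        return True
    return is_valid_utf8(data)
```

Under these assumptions, a binary file named `fake.c` is still reported as text, because the extension check wins before any byte is examined. That is the erroneous result mentioned above.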
Apache web server
As of 2018, many Linux distributions consider the event MPM module stable enough to enable it by default. LXR knew only of prefork and worker, so event broke LXR initialisation for Apache, causing internal error 500 or, at best, display of LXR scripts as text.

Additionally, event seems to be incompatible with mod_perl, which requires falling back to standard CGI processing.
Fixed in release 2.3.3
Debian-based distributions, including Ubuntu
Ubuntu comes with its own default configuration which prevents LXR from running smoothly out of the box. This is not a bug per se, but it requires specific settings. They are described in this page.
- Current working directory
  Debian does not seem to set the working directory properly before launching a CGI script. As a consequence, relative file designations which rely on the working directory being the LXR root directory fail, and a "file not found" error is displayed.
  The configuration wizard has been fixed in release 2.0.3 to always use OS-absolute paths. NOTE: This bug may be Apache-specific, since it has only been reported against Apache.
- Apache web server
  A configuration file depends on the mod_version Apache module, but Ubuntu's Apache (reported at least for Ubuntu 12.04 LTS) does not load that module by default. To enable it, run the following command and restart the server:
    $ sudo a2enmod version
- Templates not found (weird screen)
  When LXR scripts are launched, the current working directory is not set to the LXR root directory. OS-relative file paths in lxr.conf cannot be correctly resolved.
  
  To fix the problem, edit the "HTML subsection" of custom.d/lxr.conf and add the OS-absolute path of the LXR root directory in front of all HTML template designations. When done, copy the modified file to its final destination as usual.
  DO NOT CHANGE the parameters 'stylesheet' and 'alternate_stylesheet', which are not file paths but HTML references.
  Fixed in release 2.0.0. This bug may also be Apache-specific. Starting with 2.0.3, the fix has been merged with Debian handling.
MySQL
- Performance issue
  A misconception of TRUNCATE TABLE in MySQL causes a very slow global database purge (with option --reindexall) which becomes noticeable on big trees like a Linux kernel.
  Worked around in release 2.2.0.
- Default storage engine
  As of MySQL 5.5.5, the default storage engine changed from MyISAM to InnoDB. Despite an improvement in transaction reliability, performance has drastically degraded: indexing by genxref now takes twice as long.
  Release 1.0 explicitly requests the MyISAM storage engine.
PostgreSQL
- Connection method
  PostgreSQL offers several methods to connect to a database. The default one uses Unix-domain sockets. Under some circumstances, the Perl DB library may fail to access the LXR database. The work-around is to explicitly request the TCP access method in the 'dbname' configuration parameter by adding:
  ;host=localhost
  Release 1.0 was changed to use the TCP access method. If Unix-domain sockets work for you, remove the host= sub-parameter.
Supported languages
Many language descriptions are faulty or, at best, incomplete. Only C, C++ and Perl can be considered reliable.
The present maintainer does not master all languages. Do not hesitate to submit bug reports.
Identifiers not highlighted
This is not really an LXR bug. ctags uses elementary parsers for secondary languages (read: neither C nor C++). These parsers do not implement the complete language grammar.
Git support
- ~~Not working at all. The CPAN library module is broken.~~ Fixed in release 1.0
- In some distributions, may error out when trying to display a file.
- Faulty 'range' function in lxr.conf loses release tags. Both fixed in release 2.0.1

Release bugs

Releases 2.3.3 and 2.3.4 - glimpse indexing in genxref
The fix in 2.3.3 does not handle all possible cases, resulting in indexing failures under some circumstances.
Fixed in release 2.3.5
Releases prior to 2.3.4 - endless loop when trying to hyperlink malformed path in include-like statements
Though OSes tolerate paths with multiple consecutive fragment separators, LXR expects a single separator and a non-empty directory name between separators.
Fixed in release 2.3.4, which tolerates this notation
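To make the malformed-path case concrete, here is a minimal Python sketch. It is not the code used in release 2.3.4. The helper names `split_fragments` and `collapse_separators` are hypothetical, and collapsing repeated separators is only one possible way of tolerating such paths.

```python
import re


def split_fragments(path: str) -> list:
    """Naive split on '/': consecutive separators yield empty fragments."""
    return path.split("/")


def collapse_separators(path: str) -> str:
    """Replace every run of two or more '/' with a single '/'."""
    return re.sub(r"/{2,}", "/", path)
```

With these helpers, `split_fragments("include//linux///types.h")` contains empty strings, which are the "void" directory names mentioned above, whereas `collapse_separators` returns `include/linux/types.h`.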
Releases 2.2.0 to 2.3.2 - glimpse indexing in genxref
Incorrect management of offline indexing (to allow uninterrupted service on huge trees) results in duplicate hierarchical inclusion of glimpse databases in the indexing directory, making it grow each time the source tree is reindexed incrementally.
Fixed in release 2.3.3
To get rid of the superfluous DBs, run genxref with option --reindexall.
Releases 1.2.0 to 2.3.1 - include statement management
Under certain circumstances, the included file path (in statements like #include in C or import and from in Python) was not correctly edited for hyperlinks, resulting in an infinite loop.
Fixed in release 2.3.2; upgrade highly recommended for Python users, for whom the error is more likely to occur
Release 2.2.0
- configure-lxr.pl configuration wizard
  An unfortunate confusion between variables prevents the script from being compiled.
  Fixed in release 2.2.1; upgrade required
- Apache
  Recent Ubuntu releases (problem detected in 16.10 Server) use the Apache MPM Event module, whereas the LXR configuration file .htaccess tests only for Prefork and Worker. This module choice is probably not Ubuntu-specific and may also have been adopted in other distributions, at least the Debian-based ones.
  
  Since only Prefork needs specific initialisation, and considering the lack of detailed documentation on Event, it has been decided to initialise all MPM modules identically, except for Prefork.
  Fixed in release 2.2.1, but do not hesitate to report problems if the fix is not satisfactory
Release 1.1.0
- Java parser incompletely handles import statement
  If your Java code contains import static statements, the parser fails on the static keyword and enters an endless loop which freezes the screen on the previous line.
  Fixed in release 1.2.0.
- HTML parser may loop endlessly (bug also present in release 1.0.0)
  When trying to hyperlink <A> or <IMG> elements, the generic-based HTML parser may fail and not recover, leading to an endless loop. It is also not protected against "external" URIs (such as http://…), where the double slash is taken for an empty file path.
  Fixed in release 1.2.0 through implementation of a dedicated parser.
- Depending on Perl interpreter, identifier search may error out
  Two lines (419 and 485) in ident use relaxed Perl syntax instead of "bullet-proof" code. Comment for bug #233 contains a patch.
  Fixed in release 1.2.0.
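The "external" URI problem in the HTML parser item above can be illustrated with a short Python sketch. It is not the dedicated parser introduced in release 1.2.0. The function names `naive_fragments` and `is_external_reference` are hypothetical, and the check relies on the standard `urllib.parse.urlsplit` function.

```python
from urllib.parse import urlsplit


def naive_fragments(ref: str) -> list:
    """Treat the reference as a file path and split it on '/'."""
    return ref.split("/")


def is_external_reference(ref: str) -> bool:
    """Return True when the reference carries a scheme or a network location."""
    parts = urlsplit(ref)
    return bool(parts.scheme) or bool(parts.netloc)
```

Splitting `http://example.org/index.html` naively yields an empty fragment after `http:`, which is the double slash being mistaken for an empty path component. Testing the reference with `is_external_reference` first is one way to keep such references away from path handling.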

Limitations

Error management
Prior to release 2.0.0, error messages are only sent to the web server error log. Retrieving the information usually requires administrator privileges.
templates/ directory
Starting with release 0.11, this directory is set to "read-only" mode to discourage people from making changes. It thus becomes a reliable reference for stored files. Any LXR customisation must be done in the custom.d/ directory (lxrconf.d/ prior to release 1.0).
Consequence: any file extracted from templates/ is tagged "read-only" and its permissions must be reset before making changes.
$ cp templates/some_file custom.d/
$ chmod u+w custom.d/some_file
Version control systems (CVS, Git, …)
Free-text search is unavailable due to the specific format of the VCS database (read: they do not use plain files).
Directories in CVS
CVS does not manage versions for directories. If you change the version to something other than HEAD, you get the infamous "file does not exist" error message.
Mercurial support
- Files may erroneously appear empty in some revisions
- Directory listing extremely slow
  Approximately 0.5 second per line (on a 3.4 GHz processor!). Do not blame Mercurial, though: something is wrong in the LXR code interfacing with the system.
  Please help improve performance.
Language parsers
They are not compiler-grade (and very far from that). They are limited to lexically extracting runs of characters looking like identifiers (independently of their semantics).
- Generic parser (presently used for all languages)
  Ignores any sigil when assembling identifiers. As a result, symbols with different roles are conflated under a single highlighting.
  Sigils are used in Perl to differentiate functions, scalars, arrays and hashes. Similarly, in PHP, variables are prefixed with $ while functions have no prefix.
  Ignores identifier classification since it does not manage semantics. This too conflates symbols with different roles under a single highlighting.
  In HTML, tags and attributes with the same name have the same highlighting.
- HTML
  URI references are hyperlinked only if they point to another document in the source tree.
  
  The parser is easily confused if the content part (outside HTML tags) contains single or double quotes. This causes an out-of-sync situation.
- Perl
  Due to a ctags limitation, only subs are indexed, not variables. Variables are accidentally highlighted if their name, without sigil, is the same as that of a function.
  
  Occurrences of characters such as #, ' or " in patterns are likely to cause out-of-sync situations.
- PHP
  include or require will hyperlink to the target file only if the name is a string relative to source-tree root. Also, the name cannot contain quotes (') or double quotes (").
- Python
  Multiple includes from a single import or from statement are not handled.
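The sigil limitation of the generic parser described above can be illustrated with a small Python sketch. It is only a model of "lexically extracting runs of characters looking like identifiers", not LXR's parser. The regular expression and the function name `extract_identifiers` are assumptions made for the example.

```python
import re

# A run of characters that looks like an identifier; sigils are not part of it.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_identifiers(line: str) -> list:
    """Return every identifier-like run in the line, ignoring any sigil."""
    return IDENTIFIER.findall(line)
```

Applied to the Perl line `my @list = list($list{key}, $list[0]);`, the array `@list`, the function `list`, the hash element `$list{key}` and the array element `$list[0]` all come out as the same string `list`, so they would all receive the same highlighting.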
Difference markup
- Link interface
  You can compare two files differing in only one variable.
- Buttons-and-menus interface
  You can compare two files differing in any number of variables provided you set all required variables before clicking on the "Change" button.
- Changing variable value while displaying differences
  The new differences will be computed between the most recently selected version and the newly requested one, not between the base version (the one active when you clicked on the "diff markup" link) and the newly requested version.
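The behaviour described in the last item can be modelled with a few lines of Python. This is a hypothetical model of which two versions end up being compared, not LXR code. The class name `DiffSession` and its attributes are invented for the illustration.

```python
class DiffSession:
    """Hypothetical model of the version pair used for difference markup."""

    def __init__(self, base_version: str):
        self.base = base_version    # version active when "diff markup" was clicked
        self.latest = base_version  # most recently selected version

    def select(self, new_version: str) -> tuple:
        """Return the (old, new) pair that gets compared, then remember new."""
        pair = (self.latest, new_version)
        self.latest = new_version
        return pair
```

Starting from base version `v1.0`, selecting `v2.0` compares `v1.0` with `v2.0`. Selecting `v3.0` afterwards compares `v2.0` with `v3.0`, not `v1.0` with `v3.0`.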