Inside Database Access

Index module

This module defines the abstract access methods to the database. If needed, the methods are overridden in the specific modules.

`new ($dbname)`

new is the Index object constructor. It dispatches to the specific constructor based on its argument.

$dbname

a string containing the configuration parameter 'dbname', which describes the engine and the characteristics of the DB

Note:

There used to be a second @args argument which passed file open-attributes (such as O_RDWR or O_CREAT) when the DB was made of a set of files. This is no longer used with DB engines.

The specific constructor is responsible for creating hash elements in $self containing "cooked" queries (meaning they have been processed by the DBD prepare method).

These queries are listed in the Requires paragraphs of the following method descriptions.

`read_open ()`

read_open "prepares" the transactions which read from the database.

The separation between read and write transactions serves a two-fold purpose.

First, it ensures that faulty code will not corrupt the database when the write transactions have not been enabled. Second, it improves initialisation speed and decreases memory footprint when only browsing the tree.

`write_open ()`

write_open "prepares" the transactions which write into the database. They are only used by the indexing utility.

As explained for read_open, separating read and write transactions serves a two-fold purpose.

`write_close ()`

write_close removes the write-enabled transactions.

`uniquecountersinit ($prefix)`

uniquecountersinit initialises the unique counters for file, symbol and type ids.

This is a new extension method intended for use by derived objects.

$prefix

a string containing the database table prefix

Several database engines perform better when fields with unique attributes use cached counters instead of the built-in features. The gain comes from the fact that the incremented value is not written back to disk immediately, which means fewer commits.

This trick is valid because we write to the DB only at genxref time and DB loading is single-threaded.

CAUTION!

Don't forget to write the final values to the DB before disconnecting. See uniquecounterssave.
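The cached-counter trick can be sketched in Python with the standard sqlite3 module (the module itself is Perl/DBI). This is a minimal illustration under assumptions: the `counters` table, its columns and the `UniqueCounter` class are hypothetical, not the module's actual schema or API. The starting value is read once, ids are handed out from memory, and the value is persisted only by an explicit save, as uniquecounterssave does.

```python
import sqlite3


class UniqueCounter:
    """Hypothetical cached counter for one unique-id column."""

    def __init__(self, conn, name):
        self.conn = conn
        self.name = name
        # Read the last saved value once (0 if the counter was never saved).
        row = conn.execute(
            "SELECT value FROM counters WHERE name = ?", (name,)
        ).fetchone()
        self.value = row[0] if row else 0

    def next_id(self):
        # Increment in memory only: no write, hence no commit, per new id.
        self.value += 1
        return self.value

    def save(self):
        # Counterpart of uniquecounterssave: persist the last value used.
        self.conn.execute(
            "INSERT OR REPLACE INTO counters (name, value) VALUES (?, ?)",
            (self.name, self.value),
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, value INTEGER)")

files = UniqueCounter(conn, "fileid")
ids = [files.next_id() for _ in range(3)]  # 1, 2, 3
files.save()

# A later session resumes from the saved value instead of restarting at 1.
again = UniqueCounter(conn, "fileid")
```

Because ids are allocated in memory, this is only safe with a single writer, which is the condition stated above for genxref.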

`fileid ($filename, $revision)`

`fileidifexists ($filename, $revision)`

fileid returns a unique id for a file with a given revision, creating it if it does not exist.

fileidifexists is similar, but returns undef if the given revision is unknown, which can happen if the revision was created after the latest genxref indexation.

$filename

a string containing the path relative to 'sourceroot'
$revision

the revision for the file

CAUTION: this is not a release id! It is computed by method filerev in the Files classes.

The result is used as the key by which the different DB tables refer to the file.

Requires:

files_select
files_insert
status_insert (fileid only)
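Here is a minimal get-or-create sketch of fileid and fileidifexists, in Python with sqlite3. The `files` and `status` table layouts are assumptions made for the illustration. SQLite's INTEGER PRIMARY KEY assigns the id here instead of a cached counter, and None plays the role of Perl's undef.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (
        fileid   INTEGER PRIMARY KEY,
        filename TEXT NOT NULL,
        revision TEXT NOT NULL,
        UNIQUE (filename, revision)
    );
    CREATE TABLE status (
        fileid INTEGER PRIMARY KEY REFERENCES files (fileid),
        status INTEGER NOT NULL DEFAULT 0
    );
""")


def fileidifexists(filename, revision):
    # files_select: look up the (path, revision) pair; None stands for undef.
    row = conn.execute(
        "SELECT fileid FROM files WHERE filename = ? AND revision = ?",
        (filename, revision),
    ).fetchone()
    return row[0] if row else None


def fileid(filename, revision):
    fid = fileidifexists(filename, revision)
    if fid is None:
        # files_insert, then status_insert with an initial "nothing done" state.
        fid = conn.execute(
            "INSERT INTO files (filename, revision) VALUES (?, ?)",
            (filename, revision),
        ).lastrowid
        conn.execute("INSERT INTO status (fileid) VALUES (?)", (fid,))
    return fid
```

Calling fileid twice with the same path and revision returns the same id. fileidifexists never creates a record.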

`getallfilesinit ($releaseid)`

getallfilesinit prepares the iteration performed by nextfile.

$releaseid

the release (or version) for which all recorded files should be returned

The subroutine executes the allfiles_select transaction. Results are retrieved one by one through nextfile.

Requires:

allfiles_select

`nextfile ()`

nextfile is an iterator running over all files making up a version of the source tree, as known from the database.

A file description is returned by each call until undef is returned, after which nextfile must no longer be called.

Requires:

Previous initialisation by getallfilesinit
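The getallfilesinit/nextfile pair is a classic iterator: the query is executed once, and each call fetches one row. The Python sketch below assumes hypothetical `files` and `releases` tables. It returns (filename, revision) tuples as the "file description", since the real description format is not specified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (fileid INTEGER PRIMARY KEY, filename TEXT, revision TEXT);
    CREATE TABLE releases (fileid INTEGER, releaseid TEXT,
                           PRIMARY KEY (fileid, releaseid));
    INSERT INTO files VALUES (1, 'a.c', '1.1'), (2, 'b.c', '1.3');
    INSERT INTO releases VALUES (1, 'v1'), (2, 'v1'), (2, 'v2');
""")


class AllFiles:
    def __init__(self, conn):
        self.conn = conn
        self.cursor = None

    def getallfilesinit(self, releaseid):
        # Execute the allfiles_select query once; rows are fetched lazily.
        self.cursor = self.conn.execute(
            "SELECT f.filename, f.revision FROM files f"
            " JOIN releases r ON r.fileid = f.fileid"
            " WHERE r.releaseid = ? ORDER BY f.filename",
            (releaseid,),
        )

    def nextfile(self):
        # One file description per call, then None (undef) when exhausted.
        row = self.cursor.fetchone()
        if row is None:
            self.cursor = None  # further calls fail until re-initialised
        return row


it = AllFiles(conn)
it.getallfilesinit("v1")
seen = []
while (f := it.nextfile()) is not None:
    seen.append(f)
```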

`setfilerelease ($fileid, $releaseid)`

setfilerelease marks the file referred to by $fileid as part of $releaseid.

$fileid

an integer representing a file in the DB
$releaseid

the release (or version) containing the file

Requires:

releases_select
releases_insert

The final result is as many records in the releases table as there are versions containing this file state. All these records point to the same item in the files table.

The releaseid is any tag under which the file in this state is known by the VCS. The revision, stored in the files table, is a canonical identification of the file state. The file state will be parsed and cross-referenced only once, thus reducing genxref processing time, but the result may still be referenced by any tag.
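The following small Python/sqlite3 sketch of setfilerelease illustrates the layout of one file state shared by many releases. The table layout is an assumption. Repeated tags add rows to releases only, while the files row (one per revision) is shared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (fileid INTEGER PRIMARY KEY, filename TEXT, revision TEXT);
    CREATE TABLE releases (fileid INTEGER REFERENCES files (fileid),
                           releaseid TEXT,
                           PRIMARY KEY (fileid, releaseid));
    INSERT INTO files VALUES (7, 'include/list.h', '1.12');
""")


def setfilerelease(fileid, releaseid):
    # releases_select: nothing to do if the pair is already recorded.
    known = conn.execute(
        "SELECT 1 FROM releases WHERE fileid = ? AND releaseid = ?",
        (fileid, releaseid),
    ).fetchone()
    if known is None:
        # releases_insert
        conn.execute("INSERT INTO releases VALUES (?, ?)", (fileid, releaseid))


# The same file state (revision 1.12) is tagged in three releases;
# the duplicate "v2.1" call is ignored.
for tag in ("v2.0", "v2.1", "v2.1", "v2.2"):
    setfilerelease(7, tag)

rows = conn.execute(
    "SELECT releaseid FROM releases WHERE fileid = 7 ORDER BY releaseid"
).fetchall()
```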

`removerelease ($fid, $releaseid)`

removerelease deletes one release from the set associated with a base revision.

$fid

the unique id for a base revision file
$releaseid

the release (or version) containing the file

Requires:

delete_one_release

`fileindexed ($fileid)`

fileindexed returns true if the file referred to by $fileid has already been indexed; otherwise, it returns false.

$fileid

an integer representing a file in the DB

Requires:

status_select

`setfileindexed ($fileid)`

setfileindexed marks the file referred to by $fileid as being indexed.

Since indexing (i.e. collecting symbol definitions) is usually done outside LXR, the indexing time is not updated.

$fileid

an integer representing a file in the DB

Requires:

status_select
status_insert
status_update

`filereferenced ($fileid)`

filereferenced returns true if the file referred to by $fileid has already been parsed for references; otherwise, it returns false.

$fileid

an integer representing a file in the DB

Requires:

status_select

`setfilereferenced ($fileid)`

setfilereferenced marks the file referred to by $fileid as having been parsed for references.

Indexing time is updated for user information.

$fileid

an integer representing a file in the DB

Note:

A file must always be indexed before being parsed for references.

Requires:

status_select
status_insert
status_update
status_update_timestamp
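The status bookkeeping can be sketched as a small bit set per file. The encoding below (bit 1 for indexed, bit 2 for referenced, a `reftime` column for the timestamp) is purely hypothetical. The check that raises an error when a file is referenced before being indexed is an addition of this sketch that makes the ordering rule above explicit.

```python
import sqlite3
import time

INDEXED = 1     # hypothetical bit: symbol definitions collected
REFERENCED = 2  # hypothetical bit: file parsed for references

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE status (fileid INTEGER PRIMARY KEY,"
    " status INTEGER NOT NULL DEFAULT 0, reftime REAL)"
)


def _status(fileid):
    # status_select; None when the file has no status record yet.
    row = conn.execute(
        "SELECT status FROM status WHERE fileid = ?", (fileid,)
    ).fetchone()
    return None if row is None else row[0]


def fileindexed(fileid):
    return bool((_status(fileid) or 0) & INDEXED)


def filereferenced(fileid):
    return bool((_status(fileid) or 0) & REFERENCED)


def _setbit(fileid, bit):
    current = _status(fileid)
    if current is None:  # status_insert
        conn.execute(
            "INSERT INTO status (fileid, status) VALUES (?, ?)", (fileid, bit)
        )
    else:  # status_update
        conn.execute(
            "UPDATE status SET status = ? WHERE fileid = ?", (current | bit, fileid)
        )


def setfileindexed(fileid):
    _setbit(fileid, INDEXED)  # indexing time is deliberately not recorded


def setfilereferenced(fileid):
    if not fileindexed(fileid):
        raise RuntimeError("file must be indexed before being referenced")
    _setbit(fileid, REFERENCED)
    # status_update_timestamp
    conn.execute(
        "UPDATE status SET reftime = ? WHERE fileid = ?", (time.time(), fileid)
    )
```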

`filetimestamp ($fileid)`

filetimestamp retrieves the time when the file was parsed for references.

$filename

a string containing the path relative to 'sourceroot'
$revision

the revision for the file

CAUTION: this is not a release id! It is computed by method filerev in the Files classes.

Requires:

status_timestamp

`symdeclarations ($symname, $releaseid)`

symdeclarations returns an array containing the set of declarations for the symbol in this release.

$symname

the symbol name
$releaseid

the release (or version) containing the file

Requires:

definitions_select

`setsymdeclaration ($symname, $fileid, $line, $langid, $type, $relsym)`

setsymdeclaration records a declaration in the DB.

$symname

the symbol name
$fileid

the unique id which identifies a file AND a release
$line

the line number of the declaration
$langid

an integer key for the language
$type

the type of the symbol
$relsym

an optional relation to some other symbol

Requires:

definitions_insert

`symreferences ($symname, $releaseid)`

symreferences returns an array containing the set of references to the symbol in this release.

$symname

the symbol name
$releaseid

the release (or version) containing the file

Requires:

usages_select

`setsymreference ($symname, $fileid, $line)`

setsymreference records a reference in the database if the symbol is already present (as a declaration).

$symname

the symbol name
$fileid

the unique id which identifies a file AND a release
$line

the line number of the reference

Requires:

symbols_byname
usages_insert

Since release 1.0, setsymreference includes part of issymbol, so that the latter is no longer needed when referencing files and MUST NOT be used in referencefile functions.

`issymbol ($symname, $releaseid)`

issymbol returns true (1) for an existing symbol in a given release according to the DB, false (0) otherwise.

$symname

the symbol name
$releaseid

the release (or version) containing the file

Requires:

symbols_byname

This function is used during browsing to decide whether the symbol should be highlighted or not.

Since release 1.0, this function is no longer used during the usage collecting pass. It can now have its own independent cache strategy, but it MUST NOT be called outside the browsing pass.

`symid ($symname)`

symid returns a unique id for a symbol.

If the symbol is unknown, it is inserted into the DB with a zero reference count. The reference count is adjusted by the methods which add definitions or usages. Decrementing the reference count is only done when purging the database.

$symname

the symbol name

Requires:

symbols_byname
symbols_insert

`symname ($symid)`

symname returns the symbol name from a symbol id.

$symid

the unique id for a symbol

Requires:

symbols_byid
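The following is a sketch of symid and symname in Python with sqlite3. It assumes a hypothetical `symbols` table with a `symcount` reference-count column. New symbols are inserted with a zero count, as described above, and caching is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (symid INTEGER PRIMARY KEY,"
    " symname TEXT UNIQUE NOT NULL, symcount INTEGER NOT NULL)"
)


def symid(symname):
    # symbols_byname
    row = conn.execute(
        "SELECT symid FROM symbols WHERE symname = ?", (symname,)
    ).fetchone()
    if row is not None:
        return row[0]
    # symbols_insert: a new symbol starts with a zero reference count;
    # adding definitions or usages increments it later.
    return conn.execute(
        "INSERT INTO symbols (symname, symcount) VALUES (?, 0)", (symname,)
    ).lastrowid


def symname(sid):
    # symbols_byid: the reverse lookup.
    row = conn.execute(
        "SELECT symname FROM symbols WHERE symid = ?", (sid,)
    ).fetchone()
    return None if row is None else row[0]
```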

`decid ($writeflag, $lang, $string)`

decid retrieves a unique id for a declaration type in a given language. If this declaration is not yet in the DB, it is recorded only when the write flag is set.

$writeflag

if set, an unknown declaration is recorded in the DB
$lang

the unique id for the language
$string

the text for the declaration (from {'typemap'}{letter} in a generic.conf language description)

Requires:

langtypes_select
langtypes_insert

These records are in fact the text for the language types.

The text retrieval function is not implemented because it is implicitly done in the symdeclarations query.

CAVEAT!

This implementation is valid for DB engines with auto-incrementing fields. It must be overridden when the auto-incrementation feature is missing (e.g. PostgreSQL and SQLite).
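The write-flag logic of decid can be sketched as follows (Python, sqlite3, hypothetical `langtypes` layout). Because the caveat concerns engines without auto-increment, the sketch allocates the new id explicitly with a MAX+1 query. This is one possible override strategy, safe only with a single writer. It is not necessarily what the specific drivers do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE langtypes (typeid INTEGER PRIMARY KEY,"
    " langid INTEGER NOT NULL, declaration TEXT NOT NULL,"
    " UNIQUE (langid, declaration))"
)


def decid(writeflag, lang, string):
    # langtypes_select
    row = conn.execute(
        "SELECT typeid FROM langtypes WHERE langid = ? AND declaration = ?",
        (lang, string),
    ).fetchone()
    if row is not None:
        return row[0]
    if not writeflag:
        return None  # e.g. while browsing: never create records
    # langtypes_insert with an explicitly computed id (hypothetical
    # MAX+1 strategy for an engine without auto-increment).
    (new_id,) = conn.execute(
        "SELECT COALESCE(MAX(typeid), 0) + 1 FROM langtypes"
    ).fetchone()
    conn.execute(
        "INSERT INTO langtypes (typeid, langid, declaration) VALUES (?, ?, ?)",
        (new_id, lang, string),
    )
    return new_id
```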

`commit ()`

Commit the last set of operations and start a new transaction.

If transactions are not supported, it's OK for this to be a no-op.

`forcecommit ()`

Commit the database now, even if auto-commit mode is in effect.

This method should not be overridden in specific drivers.

`emptycache ()`

emptycache empties the internal symbol cache.

This function should be called before parsing each new file. If this is not done, too much memory will be used and processing will become very slow.

Note:

With the implementation of flushcache, this function is no longer necessary since the cache is also emptied in that subroutine.

`flushcache ($full)`

flushcache flushes the internal symbol cache.

$full

optional argument to force 0-count write back

(When creating the database, reference counts are incremented. Consequently, if the final count is still zero, the symbol has not been referenced and there is no need to overwrite the record. Conversely, when purging the database, reference counts may decrement to zero and it is then mandatory to update the record so that it can later be purged or correctly updated.)

This function should be called at the end of file processing. It writes the cached symbol reference count into the appropriate symbol records of the DB.

To minimize I/O, reference counts are negated when entered into the cache. The counts are turned back positive when they need to be incremented. Thus strictly positive values show which symbols have been referenced. Only these are flushed to the DB.

The cache is then emptied.

Requires:

symbols_setref
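The negated-count cache used by flushcache can be illustrated with a dictionary. In this Python sketch, the `symbols` table is hypothetical, and `cached` and `addref` are illustrative helpers rather than module methods. A count read from the DB is stored negated and flipped back to positive on its first modification. Only strictly positive counts are written back, plus zero counts when full is set. Selecting the 0-count write-back this way is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE symbols (symid INTEGER PRIMARY KEY, symname TEXT UNIQUE,
                          symcount INTEGER NOT NULL);
    INSERT INTO symbols VALUES (1, 'printk', 5), (2, 'kfree', 2), (3, 'unused', 0);
""")

# symname -> [symid, count]; a negative count means "read, not yet modified".
cache = {}


def cached(symname):
    # Load the symbol on first use (assumed to exist; symid would create it).
    if symname not in cache:
        sid, count = conn.execute(
            "SELECT symid, symcount FROM symbols WHERE symname = ?", (symname,)
        ).fetchone()
        cache[symname] = [sid, -count]  # negated on entry into the cache
    return cache[symname]


def addref(symname, delta=1):
    entry = cached(symname)
    if entry[1] < 0:
        entry[1] = -entry[1]  # turned back positive on first modification
    entry[1] += delta  # +1 when indexing, -1 when purging


def flushcache(full=False):
    for sid, count in cache.values():
        # Positive counts were modified; zero counts only matter when purging.
        if count > 0 or (full and count == 0):
            conn.execute(
                "UPDATE symbols SET symcount = ? WHERE symid = ?", (count, sid)
            )
    cache.clear()  # the cache is then emptied
```

An untouched symbol whose stored count is already zero is also cached as zero. A full flush therefore rewrites it with the same value, which is harmless.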

`purgefile ($fid, $releaseid)`

purgefile deletes data related to an obsoleted file in the DB.

Data associated with the designated file are erased from the tables.

$fid

the unique id for a base revision file
$releaseid

the release (or version) containing the file

Requires:

related_symbols_select
delete_file_definitions
delete_file_usages

The reference counts of "relation" symbols (from definitions) must be decremented first. After that, the order of definition and usage deletion is irrelevant.

Symbols are not deleted when their reference count decrements to zero because the same file (in a more recent version) is supposed to be indexed soon: a majority of the symbols will be reentered again in the database.

Release erasure is done in another sub since this erasure can occur also when no definition/usage deletion is necessary. The relevant code is thus written only once.

`purge ($releaseid)`

purge selectively deletes data in the DB.

Data associated with a release are erased from the tables.

Order of erasure is critical to comply with foreign key constraints between the different tables and to guarantee correctness of resulting database structure.

Once the base version files to be deleted are known, the definitions and usages in these files are erased, which decrements the symbol reference counts. The symbols with zero references are then deleted.

After this step, no definition or usage is left pointing to the candidate files. Releases are deleted, decrementing the reference counts in status. Status records with zero references are then deleted (files cannot be deleted first because of the foreign key constraint between files and status). Files are implicitly deleted by a trigger on status deletion.

$releaseid

the target release (or version)

Requires:

delete_definitions
delete_usages
delete_symbols
delete_releases
delete_unused_status (which should also delete the matching records in the files table)

Note:

DBD commit() is called explicitly to bypass a possible disabling caused by a private overriding commit method.

Todo:

Manage the relid relationship in definitions
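The erasure order for purge can be checked against a toy schema with foreign keys enabled. In this Python/sqlite3 sketch, the tables, the `relcount` column and the `status_delete` trigger are assumptions modeled on the purge description. Relation-symbol handling (see the Todo) is left out. Deleting files before status would violate the foreign key, so files are removed only through the trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE files (fileid INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE status (fileid INTEGER PRIMARY KEY REFERENCES files (fileid),
                         relcount INTEGER NOT NULL);
    CREATE TABLE releases (fileid INTEGER REFERENCES status (fileid),
                           releaseid TEXT, PRIMARY KEY (fileid, releaseid));
    CREATE TABLE symbols (symid INTEGER PRIMARY KEY, symname TEXT,
                          symcount INTEGER NOT NULL);
    CREATE TABLE definitions (symid INTEGER REFERENCES symbols (symid),
                              fileid INTEGER REFERENCES files (fileid), line INTEGER);
    CREATE TABLE usages (symid INTEGER REFERENCES symbols (symid),
                         fileid INTEGER REFERENCES files (fileid), line INTEGER);
    -- Deleting a status record implicitly deletes the matching files record.
    CREATE TRIGGER status_delete AFTER DELETE ON status
    BEGIN DELETE FROM files WHERE fileid = OLD.fileid; END;
""")


def purge(releaseid):
    # Base-version files belonging to this release only will disappear.
    doomed = [fid for (fid,) in conn.execute(
        "SELECT r.fileid FROM releases r JOIN status s ON s.fileid = r.fileid"
        " WHERE r.releaseid = ? AND s.relcount = 1", (releaseid,))]
    if doomed:
        marks = ",".join("?" * len(doomed))
        # 1. Definitions and usages in these files, decrementing symbol counts.
        for table in ("definitions", "usages"):
            conn.execute(
                "UPDATE symbols SET symcount = symcount - (SELECT COUNT(*)"
                f" FROM {table} t WHERE t.symid = symbols.symid"
                f" AND t.fileid IN ({marks}))", doomed)
            conn.execute(f"DELETE FROM {table} WHERE fileid IN ({marks})", doomed)
        # 2. Symbols whose reference count dropped to zero.
        conn.execute("DELETE FROM symbols WHERE symcount <= 0")
    # 3. Releases, decrementing the reference counts in status.
    conn.execute(
        "UPDATE status SET relcount = relcount - 1 WHERE fileid IN"
        " (SELECT fileid FROM releases WHERE releaseid = ?)", (releaseid,))
    conn.execute("DELETE FROM releases WHERE releaseid = ?", (releaseid,))
    # 4. Unused status records; the trigger then removes the files rows.
    conn.execute("DELETE FROM status WHERE relcount = 0")
    conn.commit()
```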

`purgeall ()`

purgeall deletes all data in the DB.

This is a brutal way of erasing everything, e.g. for --reindexall --allversions. It is much more efficient than a sequence of purge calls on every version.

Requires:

purge_all

`uniquecountersreset ($force)`

uniquecountersreset restarts the counters from 0.

$force

an integer used to force the $xxxini variables

If different from 0, this forces uniquecounterssave to write the reset values to the DB if immediately called after this method.

It is better to call the method a second time with argument 0 to avoid any unforeseen side-effects, though there should be none.

`uniquecounterssave ()`

uniquecounterssave stores in the DB the current values of the file, symbol and type counters for later sessions.

`dropuniversalqueries ()`

dropuniversalqueries deactivates all "universal" query statements to prevent annoying "Disconnect invalidates xx active statement handles ..." messages from disturbing the end user. Derived instances are responsible for deactivating their own queries.

Most of these deactivations are probably overkill, since execute or fetchrow_array may already have deactivated the statement.

Must be called before final_cleanup, which disconnects from the database.

`saveperformance ($releaseid, @wtimes)`

saveperformance writes genxref's milestone times to the DB.

$releaseid

the release (or version) for which performance data should be saved
$reindex

full reindex flag
$step

a single-character string identifying the step
$starttime

the starting time of the step (in seconds)
$endtime

the completion time of the step (in seconds)

Note:

This is for informational purposes only. It allows genxref step performance to be analysed later.

Requires:

times_select
times_insert
times_update

`getperformance ($releaseid)`

getperformance retrieves genxref's milestone times from the DB.

$releaseid

the release (or version) for which performance data should be returned

Requires:

times_select
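saveperformance relies on a select, then update-or-insert pattern. The Python sketch below assumes a hypothetical `times` table keyed by release and step. It takes the individual values listed as parameters rather than the @wtimes list, whose exact layout is not specified. getperformance simply returns each step's duration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE times (releaseid TEXT, reindex INTEGER, step TEXT,"
    " starttime REAL, endtime REAL, PRIMARY KEY (releaseid, step))"
)


def saveperformance(releaseid, reindex, step, starttime, endtime):
    # times_select: is there already a record for this step?
    exists = conn.execute(
        "SELECT 1 FROM times WHERE releaseid = ? AND step = ?", (releaseid, step)
    ).fetchone()
    if exists:
        # times_update: a later run overwrites the previous measurement.
        conn.execute(
            "UPDATE times SET reindex = ?, starttime = ?, endtime = ?"
            " WHERE releaseid = ? AND step = ?",
            (reindex, starttime, endtime, releaseid, step),
        )
    else:
        # times_insert
        conn.execute(
            "INSERT INTO times VALUES (?, ?, ?, ?, ?)",
            (releaseid, reindex, step, starttime, endtime),
        )


def getperformance(releaseid):
    # times_select: every step of a release with its duration in seconds.
    return conn.execute(
        "SELECT step, endtime - starttime FROM times"
        " WHERE releaseid = ? ORDER BY step",
        (releaseid,),
    ).fetchall()
```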

`final_cleanup ()`

final_cleanup executes last-minute actions on the database and then disconnects.

Must be called before the Index object disappears.

`post_processing ()`

post_processing executes maintenance actions on the database at end of genxref processing.

Must be the last action called before the Index object disappears.