NAME

Litigation::Database - Extensible object-oriented database with a web interface

VERSION

This document describes Litigation::Database version 0.01

SYNOPSIS

    use Litigation::Database;

    my $db = Litigation::Database::Connection->login_new
      ('01', 'jsmith', 'mysecretpassword');
    my $list = Litigation::Database::Object::List->new();
    $list->set_t("A list of fruit");
    $list->save({db => $db});
    my $apple = Litigation::Database::Object->new();
    $apple->set_t("Apple");
    my $orange = Litigation::Database::Object->new();
    $orange->set_t("Orange");
    $list->add_Member_Of(
                          {obj => $apple},
                          {obj => $orange},
                        );
    $list->save_new_relns();
    my $top = $list->get_top_object();
    $top->add_Item_Of({obj => $list});
    $top->save_new_relns();

DESCRIPTION

Litigation::Database is an extensible object-oriented database system with a web-based front end, running on JavaScript, Apache, and mod_perl. Currently, it is distributed only in the form of a Debian package, and only works with PostgreSQL for the back end databases, but it could be adapted for other operating systems and SQL platforms.

Although originally created as a web application for law firms to manage documents, witnesses, and facts in complex litigation cases, Litigation::Database can be used for many other purposes.

The idea is that end-users will use the web interface to upload and type in information in the form of objects of various types (e.g., documents, companies, individuals), where each object has properties (e.g., name, description) as well as relationships to other objects. Then, once this information is entered in a structured format, end-users will be able to run commands on objects, using a pull-down menu, in order to perform various information processing tasks.

For example, suppose users need to keep track of companies, the people who work for them, and related documents. In Litigation::Database, "Corporation," "Individual" and "Document" are all pre-defined object types. A "Corporation" has various relationships, like "President Of", "Employee Of," and "Articles Of Incorporation For." A user can create a "Corporation" called ABC, Inc. and set the "President Of" the Corporation to John Smith, who is an "Individual." The user can specify that Mary Jones and Fred Cooper are "Employees Of" ABC, Inc. The user can also upload a PDF file and set it to be the "Articles Of Incorporation For" ABC, Inc. Perhaps John Smith, in addition to being the president of the company, was also the author of the articles of incorporation. The user could set the "Author Of" the articles of incorporation to John Smith, the same object of type "Individual" that is the "President Of" ABC, Inc.

Moreover, each relationship is itself an object, so it too can have properties and relationships. For example, if Mary Jones was an employee of ABC, Inc. for a limited period of time, the "Employee Of" relationship can be edited to specify a date range.

Litigation::Database is intended to allow end users to enter information in a structured fashion using terms that are meaningful in the real world, while allowing programmers to process that information by writing "literate" code.

For example, a programmer could write a method that summarizes information about a company:

  sub summarize {
    my $company = shift;
    my $output = "The company is called " . $company->get_t() . "\n";
    foreach my $employee (@{$company->get_Employees_Of()}){
      $output .= $employee->{obj}->get_t() . " is an employee of "
        . $company->get_t() . "\n";
    }
    my ($articles) = @{$company->get_Articles_Of_Incorporation_Of()};
    if ($articles){
      # Each list element is a hash reference; the object is under {obj}.
      my ($document) = @{$articles->{obj}->get_Content_Of()};
      if ($document){
        use PDF;
        my $pdf = PDF->new($document->{obj}->get_pdf_filename());
        $output .= "The company's articles of incorporation are " . $pdf->Pages . " pages long.\n";
      }
    }
    $company->set_desc($output);
    $company->save();
    return;
  }

Applied to ABC, Inc., this method would rewrite the "Description" property of ABC, Inc. as:

  The company is called ABC, Inc.
  Mary Jones is an employee of ABC, Inc.
  Fred Cooper is an employee of ABC, Inc.
  The company's articles of incorporation are 23 pages long.

The object types benefit from inheritance. From the perspective of Perl, the object types mentioned in the above example include:

For example, the "Corporation" type inherits the properties and relationships of:

Relationships are similar. For example, the "President Of" relationship in Perl is a Litigation::Database::Object::Person::LegalEntity::ExecutiveOf::PresidentOf, which inherits the properties and relationships of:

Note that the "::" structure does not always imply inheritance. The Litigation::Database::Object does not inherit from Litigation::Database. Relationships (like Litigation::Database::Object::Person::LegalEntity::ExecutiveOf) do not inherit from their corresponding objects (like Litigation::Database::Object::Person::LegalEntity).
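
The point that package nesting and inheritance are independent can be seen in plain Perl. The following is a minimal sketch using hypothetical Demo:: packages (they are not part of Litigation::Database); in Perl, inheritance comes only from @ISA, never from the "::" in a package name.

```perl
use strict;
use warnings;

# Hypothetical demo packages, for illustration only.
package Demo::Base;
sub new { my ($class) = @_; return bless {}, $class; }

package Demo::Base::Nested;    # nested name, but no @ISA: not a subclass
sub new { my ($class) = @_; return bless {}, $class; }

package Demo::Elsewhere::Child;    # unrelated name, but declares a parent
our @ISA = ('Demo::Base');

package main;

my $nested = Demo::Base::Nested->new();
my $child  = Demo::Elsewhere::Child->new();    # new() is inherited

print "Nested isa Demo::Base? ", ( $nested->isa('Demo::Base') ? "yes" : "no" ), "\n";
print "Child  isa Demo::Base? ", ( $child->isa('Demo::Base')  ? "yes" : "no" ), "\n";
```

When in doubt about a particular class, calling isa on an object is a quick way to check what it actually inherits from.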

Using the same web interface that they use to upload documents and enter data, users can define their own object types, and specify what the available properties and relationships of those custom objects should be. Users can also modify and add subclasses of pre-existing object types.

Users who know how to program Perl can write methods for the new object types, and make these methods available to the end users as commands on a pull-down menu. The Perl code can be typed into the web interface (using EditArea), uploaded, or saved as a Perl module file on the server's file system.

User-defined object definitions and methods can be bundled into LD Packages, which can be exported and then imported into other Litigation::Database systems.

To get started using the Litigation::Database system, see the project web site. When you are ready to start writing methods, the most important object class to learn is Litigation::Database::Object. The Litigation::Database class does not often need to be used.

INSTALLATION

Instructions for getting the Litigation::Database database and web application up and running on a Debian system are available at http://litigationdatabase.org.

INTERFACE

The methods of the Litigation::Database object are often called through objects of one of its subclasses:

Litigation::Database::Node

A "Node" is the low-level representation of a Litigation::Database::Object.

Litigation::Database::Relationship

A "Relationship" is a low-level representation of a Litigation::Database::Object::Relationship.

Litigation::Database::Job

A "Job" is a long-running task that is queued in the database and executed by ld-daemon(8).

Litigation::Database->new( \%argument_list )

The Litigation::Database constructor expects a hash reference as its only argument.

For example:

  my $db = Litigation::Database::Connection->login_new('01', 'jsmith', 'mysecretpassword');
  my $database = Litigation::Database->new({db => $db});

or

  my $database = Litigation::Database->new();
  $database->set_db($db);

The keys of the argument hash are:

db

If you have already made a connection to a database, supply the Litigation::Database::Connection object here. This has the same effect as calling the set_db method.

database

Supply a database ID (e.g., '00', '01', etc.) to make a connection to a database.

username
password
hostname

Optionally used in conjunction with database. These arguments specify the username, password, and hostname for accessing the System Database. (The System Database, which is created by ld-makenewdb(1), contains the information necessary for accessing one or more Object Databases.)

If your script can access the System Database without a username and password (e.g., through PostgreSQL peer authentication), you can leave these fields alone. You can ignore hostname if the System Database runs on localhost.
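
As a sketch of the second calling style, the constructor could be given the database key together with System Database credentials. All of the values below are placeholders, and the surrounding eval is only there so that the snippet degrades gracefully on a machine where the module or the database is unavailable.

```perl
use strict;
use warnings;

# All values are placeholders, not defaults of the module.
my %args = (
    database => '01',                   # two-digit ID of the Object Database
    username => 'ld_sys_user',          # System Database username (hypothetical)
    password => 'ld_sys_password',      # System Database password (hypothetical)
    hostname => 'sysdb.example.com',    # omit if the System Database is local
);

my $database;
if ( eval { require Litigation::Database; 1 } ) {
    # Errors from the constructor are trapped rather than allowed to propagate.
    $database = eval { Litigation::Database->new( \%args ) };
    print "Could not connect: $@\n" unless $database;
}
else {
    print "Litigation::Database is not installed; skipping the connection\n";
}
```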

$database->connect_db( @args )

Creates a new Litigation::Database::Connection using the arguments @args and sets it using set_db.

$database->get_db()

$database->set_db( $db )

Gets/sets the Litigation::Database::Connection for accessing a Litigation::Database database.

$database->get_sys_username()

$database->set_sys_username( $username )

$database->get_sys_password()

$database->set_sys_password( $password )

$database->get_sys_hostname()

$database->set_sys_hostname( $hostname )

These accessors control the username, password, and hostname that are used to access the System Database. See the discussion of the constructor new, above, for an explanation of the significance of this database.

$database->package_name_of_id( $id )

Every object type defined in the Definitions Database

DATABASES

Litigation::Database stores information in at least three separate PostgreSQL databases.

System Database

The System Database, a PostgreSQL database by the name of docdb-sys, contains information needed to access all of the Object Databases available to the host.

This database is created and updated by ld-makenewdb(1). Its contents can be edited manually if it becomes necessary to scale Litigation::Database beyond a single host machine. See SCALABILITY.

The System Database can be hosted anywhere. See the methods set_sys_username, set_sys_password, set_sys_hostname for more information. Access information can also be configured using the configuration variables sys_db_username, sys_db_password, and sys_db_hostname.

It contains two tables:

this_host_config

This table contains the non-secret information about each Object Database that the host can access.

database

The two-digit hexadecimal identifier for the Object Database. E.g., 00, 01 . . . 09, 0a, 0b . . . 0f, 10 . . . ff.

connect_string

The DBI connection string to use for the Object Database. E.g., dbi:Pg:dbname=docdb01.

fs_root

The path to the upload directory for the Object Database. E.g., /usr/share/litigation-database/docfsroot/01.

topnode

The integer ID of the origin object of the Object Database. The default start page of the web application displays this object and its children.

Objects that do not descend from the topnode object or the worknode object will be deleted by the garbage collector (see the collect_garbage method).

worknode

The integer ID of the workspace object of the Object Database. The default start page of the web application displays this object and its children.

Objects that do not descend from the topnode object or the worknode object will be deleted by the garbage collector (see the collect_garbage method).
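
The reachability rule can be illustrated with a small graph walk. This is only a sketch of the idea, not the implementation of collect_garbage: the object graph is a hypothetical hash mapping each object ID to its child IDs, with 1 standing in for the topnode and 2 for the worknode.

```perl
use strict;
use warnings;

# Return the IDs that cannot be reached from any of the root IDs.
# $children maps an object ID to an array reference of child IDs.
sub unreachable_ids {
    my ( $children, @roots ) = @_;
    my %seen;
    my @stack = @roots;
    while (@stack) {
        my $id = pop @stack;
        next if $seen{$id}++;    # already visited
        push @stack, @{ $children->{$id} || [] };
    }
    return sort { $a <=> $b } grep { !$seen{$_} } keys %$children;
}

# Hypothetical object graph: 1 is the topnode, 2 is the worknode.
my %children = (
    1 => [ 3, 4 ],
    2 => [5],
    3 => [],
    4 => [3],
    5 => [],
    6 => [7],    # 6 and 7 do not descend from 1 or 2
    7 => [],
);

my @garbage = unreachable_ids( \%children, 1, 2 );
print "Would be collected: @garbage\n";
```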

passwd

This table contains the secret information about each Object Database that the host can access.

For security, it is important that the web server child processes cannot read this table. In the web application, mod_perl reads the information in this table while Apache is still running as root. Then Apache forks into child processes that run as www-data. The usernames, passwords, and hostnames for accessing the Object Databases are stored in inside-out objects within the Litigation::Database::Connection class and are only provided to authorized code.

database

The two-digit hexadecimal identifier for the Object Database. E.g., 00, 01 . . . 09, 0a, 0b . . . 0f, 10 . . . ff.

username

The username for PostgreSQL authentication with the Object Database.

password

The password for PostgreSQL authentication with the Object Database.

hostname

The hostname of the Object Database. Set to 127.0.0.1 by default for databases stored locally.

dbname

The name of the PostgreSQL database for the Object Database. (The default name is docdb01 for database 01.)

Definitions Database

The Definitions Database is an Object Database just like any other, so it can be accessed and edited through the web interface. It is special because it is used for defining the objects and their inheritance structure.

The ld-makeobj(1) program reads the information from this database and writes the Perl module files for Litigation::Database::Object and its subclasses.

Every Object Database depends on the definitions in a unique Definitions Database. Every object type has a unique integer ID, which comes from the Definitions Database. So, for example, Litigation::Database::Object::Fish might have ID number 4320 in one organization's implementation of Litigation::Database, but ID number 24041 in another organization's implementation. See SCALABILITY for more information.

Object Databases

Each server running Litigation::Database can access up to 256 distinct Object Databases, all of which share a common Definitions Database. The server knows how to access these databases from the information (username, password, and hostname) stored in the System Database.

Object Databases are identified by a two-digit hexadecimal ID (00, 01 . . . 09, 0a, 0b . . . 0f, 10 . . . ff). To create a new database, run ld-makenewdb(1). To access a database by its two-digit ID, use a URL like https://yourcompany.com/LD/02 (for database 02). For more friendly URLs, see Litigation::Database::Object::LDStartPage.
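
The ID scheme is easy to reproduce in a script. The helper names below are hypothetical; the sketch only relies on the conventions stated in this document (two lowercase hexadecimal digits, a default database name of docdb followed by the ID, and a connect_string of the form dbi:Pg:dbname=docdb01).

```perl
use strict;
use warnings;

# Format an integer from 0 to 255 as a two-digit lowercase hexadecimal ID.
sub database_id {
    my ($n) = @_;
    die "Database number must be an integer from 0 to 255\n"
        unless defined $n && $n =~ /\A[0-9]+\z/ && $n <= 255;
    return sprintf '%02x', $n;
}

# Default PostgreSQL database name for an ID, e.g. docdb01 for 01.
sub default_dbname { my ($id) = @_; return 'docdb' . $id; }

# DBI connection string in the form shown for connect_string.
sub connect_string { my ($id) = @_; return 'dbi:Pg:dbname=' . default_dbname($id); }

my @ids = map { database_id($_) } 0 .. 255;
print scalar(@ids), " IDs, from $ids[0] to $ids[-1]\n";
print connect_string( database_id(10) ), "\n";
```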

Each Object Database is entirely separate from other Object Databases on the same server. The permissions are different (different LD User objects, different passwords). Relationships cannot be created across databases.

On the back end, an Object Database is a PostgreSQL database, typically with a name like docdb01.

WEB APPLICATION ARCHITECTURE

To get started using the Litigation::Database system, follow the installation instructions on the project web site. This section explains how the web application works, for those who are interested.

Apache

The Apache configuration is included into a site's VirtualHost configuration:

  Include /etc/litigation-database/apache.conf

When Apache starts, it loads /usr/share/litigation-database/startup.pl while it is still running as root and before it has forked into several processes.

The startup script uses Litigation::Database, which in turn loads all of the objects under it.

The script next runs Litigation::Database::Includes::initialize(), which loads user-defined modules located under Litigation::Database::Custom and Litigation::Database::Packages. It also initializes utility functions.

Then it runs Litigation::Database::Connection::load_databases(), which connects to the System Database and copies information from its tables into inside-out objects. The script is able to connect to the System Database through one of two mechanisms:

  1. Peer authentication (where "root" is a "role" in PostgreSQL that can connect to the System Database and can SELECT the contents of its tables).
  2. MD5 authentication using the configuration variables sys_db_username, sys_db_password, and sys_db_hostname, stored in the secret configuration file, /etc/litigation-database/litigation-database-private.conf, which should be readable by root only. See Litigation::Database::Config.
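
The difference between the two mechanisms shows up in the arguments a script would pass to DBI->connect. The following is a sketch, not the code used by load_databases(): the function name and the example credentials are hypothetical, while the database name docdb-sys and the three configuration variable names come from this document. With no host in the connection string, DBD::Pg connects over the local Unix-domain socket, which is the only kind of connection for which PostgreSQL offers peer authentication.

```perl
use strict;
use warnings;

# Build the argument list for DBI->connect() to the System Database.
sub sys_db_connect_args {
    my (%conf) = @_;
    my $dsn  = 'dbi:Pg:dbname=docdb-sys';
    my $host = $conf{sys_db_hostname};
    $dsn .= ";host=$host" if defined $host && length $host;
    # An undefined username and password leave the choice to DBD::Pg.
    return ( $dsn, $conf{sys_db_username}, $conf{sys_db_password} );
}

# 1. Peer authentication: no credentials and no host.
my @peer_args = sys_db_connect_args();

# 2. Password authentication with placeholder credentials.
my @password_args = sys_db_connect_args(
    sys_db_username => 'ld_sys_user',
    sys_db_password => 'ld_sys_password',
    sys_db_hostname => 'sysdb.example.com',
);

print "Peer authentication DSN:     $peer_args[0]\n";
print "Password authentication DSN: $password_args[0]\n";

# A real script would continue with something like:
#   my $dbh = DBI->connect( @password_args, { RaiseError => 1 } );
```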

Then, the script initializes its connection to memcached(1). The memcached server is necessary for coordination among the Apache processes that serve web requests. See Litigation::Database::Connection::initialize_memd_counters().

Next, the script updates each of the Object Databases with information from the Definitions Database. See Litigation::Database::Connection::update_it_tables().

Next, the script erases from memory any usernames and passwords it read from the secret configuration file.

Apache then forks into processes that run as www-data, which wait for requests.

Two mod_perl modules handle requests for Litigation::Database:

Litigation::Database::Dispatch

This module, tied to the relative URL /LD-Dispatch, dispatches page load requests to several other modules, including:

Litigation::Database::Help

Serves help pages for objects.

Litigation::Database::RevPage

Serves web pages for reviewing and marking up the pages of documents.

Litigation::Database::ServeContent

Serves dynamically created content.

Litigation::Database::ServeFile

Serves static files stored in the upload directory.

Litigation::Database::ServePage

Used by Litigation::Database::RevPage to serve page images as PNG files.

Litigation::Database::Start

The initial page that gets loaded when users access Litigation::Database through a URL like https://mycompany.com/LD/01 or https://mycompany.com/LD/ABC-investigation.

Litigation::Database::TransRev

Serves web pages for reviewing and marking up transcripts.

Litigation::Database::UploadMany

Handles uploads of files through the included Java upload applet.

Litigation::Database::Upload

Handles uploads of files through the web browser.

Litigation::Database::RequestHandler

This module responds to AJAX requests from web browsers. It authenticates requests using a session ID that is stored as a cookie. The web browser sends JSON-encoded queries and commands as POST requests to /LD-R, and the server responds with JavaScript code that gets executed.

ld-daemon

Since it is inconvenient to have long-running processes execute in response to web server requests, the daemon ld-daemon(8) runs in the background, waiting for a command to come through on its socket. When a command comes in, it forks a process to handle the command. Each Object Database has a job queue, and ld-daemon(8) will check the queue and execute each job.
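
The fork-per-command pattern can be sketched in a few lines of Perl. This is an illustration under simplifying assumptions, not the code of ld-daemon(8): the socket is left out, and the job queue is a hypothetical in-memory array rather than a table in an Object Database.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical job queue; each job is a name and a code reference.
my @queue = (
    { name => 'index',  code => sub { return 1 } },
    { name => 'export', code => sub { die "disk full\n" } },
);

my %name_of_pid;
for my $job (@queue) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: run the job, then exit immediately with a status code.
        my $ok = eval { $job->{code}->(); 1 };
        POSIX::_exit( $ok ? 0 : 1 );
    }
    $name_of_pid{$pid} = $job->{name};    # parent: remember the child
}

# Parent: collect the exit status of each child as it finishes.
my %exit_code_of;
while ( ( my $pid = wait() ) != -1 ) {
    $exit_code_of{ $name_of_pid{$pid} } = $? >> 8;
}

print "$_ exited with $exit_code_of{$_}\n" for sort keys %exit_code_of;
```

Running each job in its own process means that a job that dies or runs for a long time does not block the parent from accepting further work.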

Sphinx

For full-text searching, Litigation::Database uses the Sphinx Open Source Search Server. It runs Sphinx independently of any other Sphinx configuration you may have on your server. The configuration file is located at /etc/litigation-database/sphinx.conf, and is created by ld-sphinxconfig, which is run automatically by ld-makenewdb.

Sphinx consists of two commands:

indexer

The indexer(1) command runs from cron(8). Every five minutes, indexer(1) updates the "delta" indices (e.g., index docdb01delta for database 01), and every 24 hours, it does a full index of each database (e.g., index docdb01 for database 01).

searchd

The searchd(1) daemon runs in the background and waits for search requests. Litigation::Database uses Sphinx::Search to communicate with searchd(1).

SECURITY

Web access security

Access to the web interface is password-protected. Plain text passwords are neither stored in the database nor transmitted over the network.

Security is enhanced by using HTTPS (port 443) instead of HTTP (port 80) for the web application.

Object-level security

Litigation::Database provides a layer of access control that mimics the Unix permissions system. Each object and relationship stored in the database has an owner ID, a group ID, and permissions flags (read, write, and execute) in octal format. Owner IDs correspond to Litigation::Database::Object::LDUser objects. Group IDs correspond to Litigation::Database::Object::LDGroup objects.

In the web application, there is a pull-down option called "Security" for every object, which allows users to change the group and the read, write, and execute permissions for the owner, group, and everyone else.

This way, users can create private objects that only they or a specific group of people can see. Users can create read-only objects to prevent the unauthorized changing of information in an object.
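
A Unix-style permission check of this kind might look like the following. This is a sketch under stated assumptions: the field names (owner, group, mode, uid, groups) are hypothetical, and the rule that the owner's bits take precedence over the group's bits is borrowed from Unix rather than confirmed for Litigation::Database.

```perl
use strict;
use warnings;

use constant { READ => 4, WRITE => 2, EXECUTE => 1 };

# Decide whether a user may perform an action on an object.
sub is_allowed {
    my ( $user, $object, $wanted ) = @_;
    my $mode = $object->{mode};
    my $bits;
    if ( $user->{uid} == $object->{owner} ) {
        $bits = ( $mode >> 6 ) & 7;    # owner permissions
    }
    elsif ( grep { $_ == $object->{group} } @{ $user->{groups} } ) {
        $bits = ( $mode >> 3 ) & 7;    # group permissions
    }
    else {
        $bits = $mode & 7;             # everyone else
    }
    return ( $bits & $wanted ) ? 1 : 0;
}

# An object that its owner can read and write, its group can read,
# and nobody else can see (octal 0640).
my $memo  = { owner => 10, group => 20, mode => 0640 };
my $alice = { uid   => 10, groups => [20] };    # the owner
my $bob   = { uid   => 11, groups => [20] };    # a group member
my $carol = { uid   => 12, groups => [30] };    # everyone else

printf "%-5s read=%d write=%d\n", $_->[0],
    is_allowed( $_->[1], $memo, READ ), is_allowed( $_->[1], $memo, WRITE )
    for ( [ alice => $alice ], [ bob => $bob ], [ carol => $carol ] );
```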

Users can also set "Security Defaults" for their session from any Litigation::Database::Object::LDDashboard object, so that any objects they create during the session will have the specified group and permissions.

When you design your own objects, you can specify "Default Permissions" for an object type, so that any object of that type that is created will have those permissions by default.

When you create an Object Database using ld-makenewdb, a Litigation::Database::Object::LDUser named "root" is created. This user is designated a superuser. Other superusers can be created with ld-makepasswd. When you log into the web interface as a superuser, you will be able to create new users from any Litigation::Database::Object::LDDashboard object.

Security against bad Perl code

Litigation::Database allows users with access to the Definitions Database to write their own methods. It also allows users with superuser access to an Object Database to write their own modules using Litigation::Database::Object::PerlModule objects, the source code for which is stored as a subclass of Litigation::Database::Custom.

This code is loaded into Perl whenever Apache restarts. Obviously, this poses a number of security risks. There is not much that can be done about malicious programmers who have access to code. If you don't trust the people who have permissions to write code for Litigation::Database, you should not give them those permissions.

That said, Litigation::Database is structured to discourage programming practices that might have unintended adverse effects. User-defined methods cannot access the passwords to PostgreSQL Object Databases or the DBI connections to those databases (at least not without being very sneaky). Litigation::Database uses inside-out objects, which forces programmers to use accessors.
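
For readers unfamiliar with the technique, the following is a minimal sketch of an inside-out object. The class Demo::Secret is hypothetical and is not how Litigation::Database::Connection is written; it only shows why code outside the class cannot reach the attribute by dereferencing the object.

```perl
use strict;
use warnings;

{
    package Demo::Secret;    # hypothetical class, for illustration only
    use Scalar::Util qw(refaddr);

    # Attribute storage lives in a lexical hash keyed by object address,
    # visible only inside this block, rather than inside the object.
    my %password_of;

    sub new {
        my ( $class, $password ) = @_;
        my $self = bless \do { my $anon }, $class;
        $password_of{ refaddr($self) } = $password;
        return $self;
    }

    # Outside code can only ask questions through methods.
    sub password_matches {
        my ( $self, $candidate ) = @_;
        return $password_of{ refaddr($self) } eq $candidate ? 1 : 0;
    }

    # Remove the stored attribute when the object goes away.
    sub DESTROY {
        my ($self) = @_;
        delete $password_of{ refaddr($self) };
        return;
    }
}

my $conn = Demo::Secret->new('mysecretpassword');
print "Object contents: ", ( defined $$conn ? $$conn : '(empty)' ), "\n";
print "Password matches: ", $conn->password_matches('mysecretpassword'), "\n";
```

The object itself is an empty scalar reference, so dumping or dereferencing it reveals nothing; the class decides which methods, if any, expose the data.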

CONFIGURATION AND ENVIRONMENT

See litigationdatabase.conf(5).

SCALABILITY

The resource load of a Litigation::Database installation can be spread out among multiple machines.

The default configuration is to host the following functions on a single machine:

PostgreSQL server for the System Database
PostgreSQL server for the Definitions Database
PostgreSQL server for each Object Database
File server for the uploaded files of each Object Database
memcached(1) server
Apache processes (mod_perl)
ld-daemon(8) background process
sphinxsearch indexer, launched periodically by cron(8)
sphinxsearch search daemon

Each of these functions can be farmed out to a different machine, and each of the PostgreSQL Object Databases could be hosted on a different machine. There must be a single centralized memcached(1) server, and no more than one server for each PostgreSQL database, but there can be multiple servers simultaneously running Apache to handle requests from web browsers, multiple servers running background jobs with ld-daemon(8), multiple servers doing sphinxsearch indexing, and multiple servers running the sphinxsearch search daemon searchd(1). A load balancer like haproxy can be set up to distribute requests for Apache (port 443), ld-daemon(8) (port 52152 by default), and searchd(1) (port 52154 by default). Each Object Database has a single file storage directory, but this can be placed on a network file system accessible to all the machines running Apache, ld-daemon(8), the sphinxsearch indexer, and the sphinxsearch search daemon searchd(1).

See Litigation::Database::Config for the configuration variables you would need to change to split these functions across multiple machines. See also the data table definitions in System Database; you can edit these tables manually if necessary, for example if you wish to move PostgreSQL databases from one server to several servers.

Note that every Object Database depends upon a unique Definitions Database. For Perl code to access an Object Database correctly, it must load Litigation::Database::Object Perl modules that were created by ld-makeobj(1) from the Definitions Database associated with the Object Database. This is because every object in an Object Database has an integer object type, and the integer corresponds to an index in a table in a particular Definitions Database. Two organizations may both use the object type Litigation::Database::Object::Fish, but in the first organization's system it will have the integer ID 1402, while in the second organization's system it will have the integer ID 4278. The two PostgreSQL Object Databases could not simply be moved to the same server. If you need to move data between servers with different Definitions Databases, see Litigation::Database::Object::Exporter and Litigation::Database::Object::Importer.
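
The mismatch can be made concrete with two lookup tables. This sketch does not describe how the Exporter and Importer classes work internally; it only illustrates why a raw integer type ID is meaningless outside its own Definitions Database and why any translation has to go through the package name. The tables and the function name are hypothetical, and only the two Fish IDs come from the text above.

```perl
use strict;
use warnings;

# Hypothetical type tables for two organizations.
my %package_of_id_a = (
    1402 => 'Litigation::Database::Object::Fish',
    1403 => 'Litigation::Database::Object::Boat',    # exists only in system A
);
my %id_of_package_b = (
    'Litigation::Database::Object::Fish' => 4278,
);

# Translate a type ID from system A to system B by way of the package name.
# Returns undef when either system does not know the type.
sub translate_type_id {
    my ( $id, $package_of_id, $id_of_package ) = @_;
    my $package = $package_of_id->{$id};
    return undef unless defined $package;
    return $id_of_package->{$package};
}

for my $id ( 1402, 1403 ) {
    my $new_id = translate_type_id( $id, \%package_of_id_a, \%id_of_package_b );
    print "Type $id in system A is ",
        ( defined $new_id ? "type $new_id" : "undefined" ), " in system B\n";
}
```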

Considerations for scaling

Litigation::Database does not make heavy use of memcached(1). The mod_perl handler calls the memcached server on every request, but only to read an integer.

If multiple machines are used to implement the components of Litigation::Database, firewalls should be set up to allow necessary communication among the components. Between the browser and the server, only port 443 (for HTTPS) or 80 (for HTTP) needs to be open. Apache needs to communicate with postgres(1) on port 5432, memcached(1) on port 11211, ld-daemon(8) on port 52152, and searchd(1) through port 52154. In addition, ld-daemon(8) needs to be able to communicate with postgres(1), memcached(1) and searchd(1).

The major bottleneck in Litigation::Database's scalability is the requirement of a single central server for each individual PostgreSQL Object Database. A heavily used Object Database might receive SELECT, UPDATE, and INSERT requests at a rate high enough to cause slow-downs. However, technology for PostgreSQL replication, clustering, and connection pooling is improving, and if you can get this technology working to replicate and load-balance the PostgreSQL database of an Object Database, you may see a speed improvement.

AUTHOR

Jonathan Pyle <jpyle@litigationdatabase.org>

LICENCE AND COPYRIGHT

Copyright (c) 2007-2014, Jonathan Pyle <jpyle@litigationdatabase.org>. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.