Article

Perl for the DBA

Anthony Taylor

Since I discovered Perl, I enjoy being a DBA. Before I discovered Perl, my reports were convoluted tangles of SQL, with subselects peppering the list of FROM tables, and interrupting the logical flow, with strange GROUP BYs, and confusing WHEREs. In many cases, my reports were constrained by the strictly linear flow of SQL. I dreaded a user would request a change to a report I'd thought finished. Now, everything is different. With the help of Perl, my SQL is broken into logical and simple blocks, all glued together with my favorite scripting language. Now my output is beautiful and clean, with the power of Perl's report formatting.

Perl cannot access databases by itself. It needs a little help from the Perl modules DBI (Database Interface) and DBD (Database Driver). While DBI provides database access, DBI packages up common database functions into one convenient bundle. There are many DBI drivers, each used to access a different database. So, before we dive into the DBI functions and how to use them with Perl, let's look at acquiring and installing both the BDI and DBD modules.

Installation

The DBI module and DBD drivers are in the CPAN module repository at http://www.perl.com/CPAN/CPAN.html. You will need the most recent DBI module and the DBD driver for your database. You can install multiple DBD drivers, so fetch as many as you need. As of this writing, there are DBD drivers for Adabas, Altera, DB2, Fulcrum, Illustra, Informix, Ingres, ODBC, Postgres (Pg), Qbase, Solid, Sybase, XBase, msql, pNET, and MySQL. There are almost certainly others.

For the examples in this article, I used DBI version 1.06 and an Oracle DBD. I downloaded DBI-1.06.tar.gz and DBD-Oracle-0.59.tar.gz. To unpack the code, I used gzip:

% gzip -dc DBI-1.06.tar.gz | tar xf -
% cd DBI-1.06

Once the source code is unpacked, building the module is simple.

% perl Makefile.PL       # This generates the Makefile

During the generation of the Makefile, the Perl script, Makefile.PL, will warn you of any possible incompatibilities. If the Makefile is successfully generated, the actual build of the module should go smoothly.

% make                   # This builds the module

This module should build with no problems. If the build fails, consult the README file included in the top directory of the source tree. After the module is built, test the module:

% make test

This will inform you of any problems with the finished module. The output of the tests will indicate how to correct any problems. If there are no errors, install the module. As root, type:

% make install
% cd ..
% rm -rf DBI-1.06

With the DBI module made correctly and installed, unpack and build all the DBD drivers.

% gzip -dc DBD-Oracle-0.59.tar.gz | tar xf -
% cd DBD-Oracle-0.59

Building the DBD is a bit trickier than the DBI. For the Oracle driver, you must be in an environment that will compile Pro*C programs; other environments will require special preparation also. Generally, you should be able to compile the driver if you are logged in as a database programmer or as the dba. Consult the README file in the top directory of the source tree for specific information.

% perl Makefile.PL

This process will build the Makefile, as with the DBI; also, any error messages should provide remedial actions as well. Once the Makefile is built, you can be confident your environment is ready. Continue building the driver as before:

% make
% make test

The tests will require access to the database. Check the README for additional directions on testing the driver. Once the tests are successful, run the install as root:

% make install

Install as many drivers as you need - the DBI can handle multiple drivers, and loads the appropriate drivers dynamically.

Architecture

The DBI module does not access databases itself. Instead, it relies on database-specific drivers to provide the database services - the DBD drivers. This presents a unified API for Perl scripts, while allowing access to a wide variety of databases. In fact, the DBI can access multiple databases simultaneously by loading multiple DBD drivers (see Figure 1).

The DBI proper consists of a Perl API and a switch. A program creates a connection to a database with the connect procedure, which specifies the data source; the DBI selects and loads the proper driver, and returns a "handle" to the database. The DBI routes any instructions accessing that handle directly to the proper DBD driver.

$dbh = DBI->connect('dbi:Oracle', $user, $passwd);

From this point on, $dbh always refers to a connection to an Oracle database, authorized as <user>/<password>. Every request to that database is addressed by this handle, either directly or by another handle. For example, the instructions:

$table_handle = $dbh->prepare(q{ SELECT * FROM all_tables });
$return_value = $table_handle->execute;
@row_array = $table_handle->fetchrow_array;

sends a query to the database (but does not execute it immediately), returns a handle to that query, uses the handle to execute the query, and fetches the first row into the Perl array @row_array. In this process, the DBI loads the Oracle driver, assigns handles, and passes requests back and forth from Perl to the DBD driver. The lion's share of the work is done by the DBD driver, and of course, the database.

In the example above, note the two types of values returned by the DBI functions, both scalar ($) and array (@). In reality, there are three types of values. The first scalar, called $table_handle in the example above, is a handle. This handle is used for further commands to the database; its actual value is meaningless. The second type of scalar is a simple return value (which I cleverly called $return_value above) used for detecting error conditions or indicating the number of rows affected by a database statement.

The third value, the array @row_array, is the first row returned by the database query; each table column is returned as an element in the array. Each subsequent call to $table_handle->fetchrow_array returns the next row; so a structure like:

while (@row_array = $table_handle->fetchrow_array) {
        &do_stuff;
}

fetches every row, one at a time, and does stuff for each row.

Commands

The list of DBI commands is somewhat daunting, so I've included them in a sidebar for reference (see "Commands").

Synthesis

Armed with the commands shown in the sidebar, we can tackle an actual project. Since Perl was originally designed as a report tool for a bug tracking system, it's only appropriate to create a bug tracking/work order database as our example Perl/DBI program. Building a full-fledged work order tracking system is beyond the scope of this article, but we can explore the basic data structures and study how DBI allows us to manipulate the database using the strength of Perl.

The simple work order database will require three tables: one for tracking people, one for tracking systems, and one for tracking work orders. A real work order system would have many more tables; but for the sake of simplicity, these three are sufficient.

The first rule of database management is "script everything". So, although database creation is a one-time process, we will automate table creation with the script shown in Listing 1.

Loading the DBI module with use dbi; is only the first step. After that, we must authenticate into the database, which gives us a database handle ($dbh). We then use that handle to execute the simplest of the DBI commands, do. The do command simply sends a statement to the database for immediate execution, and returns true for successful execution or false for failure. (Actually, it returns the number of rows affected, or undef if the statement failed. Statements that do not affect rows return -1 when successful and undef otherwise.)

Now that we have successfully created the tables, we can populate them with information. We'll forego the application logic and concentrate on self-contained subroutines that update specific tables. We will need a subroutine to insert a person into the database. This assumes the application has verified the person is not yet in the database, and has collected the pertinent information from the application user. See Listing 2 for the code to accomplish this.

Note the use of the Perl pass-by-reference feature to pass in the associative array, with the * operator. Since the associative arrays are very similar to a single row from a database table, they are a natural fit for most database functions. Also, they make clean and easily readable code. Inserting information into the other tables works the same way; the routines will look almost identical to the create_person subroutine above.

Now that we are able to put information into the database, how do we get information back out? For example, we might need a report of outstanding work orders, as shown in Listing 3.

Note the order of the array elements in the "lvalue" of the $sth->fetchrow_array command within the while conditional. The order matches the order of rows in the SELECT statement. Writing queries in this format keeps the code clean and very readable.

Using these simple techniques, grand database applications are possible. Perl provides the infrastructure, and the database provides the data organization and storage.

I use Perl's great CGI capabilities to provide Web-based database applications. I write all of my financial database reports in Perl, using the DBI and Perl's natural formatting capabilities. I script all of my data imports in Perl. The possibilities are fairly boundless; since the database carries the load, it's even possible to design data warehousing applications with Perl as the driving language. Remember, though, this article barely scratches the surface of the DBI's abilities. There are many functions not listed here. For further information, check these Perldoc resources:

% perldoc DBI
% perldoc DBD::Oracle     # (or Pg, Informix, etc)

Here are some other resources.

Some DBI tutorials:

http://tegan.deltanet.com/~phlip/DBUIdoc.html
http://eskimo.tamu.edu/~jbaker/dbi-examples.html
http://www.inlink.com/~perlguy/simple/
http://www.symbolstone.org/technology/perl/DBI/doc/tpj5/index.html

Generic Perl references:

http://reference.perl.com/

About the Author

Anthony Taylor is a systems administrator for Southeast Alaska Regional Health Consortium, in Sitka, Alaska. He is also their senior DBA. Any life he might have is squandered programming free software for the Gnome project.