Article

mar2002.tar

Web-Enabled Filesystem-Based Databases

Leo Liberti

Constructing a Web-enabled database is advantageous for many reasons. Virtually all techniques for implementing product catalogs, shopping carts, and even portal sites rely on a database containing the information and some form of dynamic Web page construction, be it ASP, PHP, or CGIs. The downside to having a Web-enabled database is that it is difficult to set up and administer. The usual options are:

Buy a commercial all-in-one product (e.g., Oracle).
Contact a specialized computer services firm that will take care of the job.
Dowload, configure, install, and administer freely available software packages (e.g., Apache, PHP, MySQL), and make them work together seamlessly.

The first solution circumvents having to know all the nitty-gritty details of systems administration, but it still requires quite a lot of technical involvement (using Oracle or other commercial software, however "graphical" the interfaces may appear, is usually not an easy task). Furthermore, commercial software costs money. The second solution simplifies administration, but it is usually very expensive. The third solution costs no money at all, apart from the cost of the hardware. However, it is not easy to set up such a system from scratch. Although there are plenty of resources (on the Web and elsewhere) that teach how to do this, admins may not have the time or the inclination to spend hours or days reading technical manuals.

Considerations

The worst problem in setting up a Web-enabled database exclusively based on free software is that it requires a broad range of technical skills:

A systems administrator to set up and administer the operating system (e.g., Linux) and make the Web server, database server, and server-side scripting modules work together.
A Webmaster to properly compile, configure, and maintain the Web server (e.g., Apache).
A DBA (database administrator) to properly compile, configure, and maintain the database server (e.g., MySQL).
A programmer to write server-side scripts (e.g., PHP) or CGI programs.
An HTML designer to write the HTML pages.

Because small firms can rarely afford to employ all of the professionals described above, the pressure will be put on the systems administrator to tackle all of the tasks. Because operating systems (especially UNIX-like ones) represent data in the form of files, systems administrators are usually very skilled at manipulating files and directories. However, when manipulating databases (creation, data input, searches, output data representation), junior administrators may encounter more difficulties. Furthermore, there are cases when a full-fledged database is either overkill or just not the best way to encapsulate the data. This is the case with product catalog files, for example, where each product needs an image and a description.

With these issues in mind, I propose a simple filesystem-based database model with the necessary PHP scripts to generate the dynamic HTML pages on the fly. One disadvantage to using files and directories instead of a proper database system is the unsuitability for dealing with large quantities of records, especially where each record field is small. The main advantage is that inputting data into the system can be as simple as copying or moving files around.

Overview and Requirements

The filesystem-based database I describe here is based on the principle that a directory represents a record and a file represents a field. This is a good data representation for databases where each record contains fields with variable (and large) sizes. Because a product catalog, for example, usually contains product images and product descriptions for each listed product, it fits the requirements well. Thus, our sample implementation will be a product database. For this example, we assume that a company that sells old computer hardware and systems is setting up the database. The database records simply consist of product ID, name, price, image, and description. The software requirements are:

1. Linux OS (or any other OS where Apache and PHP can be installed)

2. Apache Web server

3. PHP Web-scripting engine

(Installation and configuration of Linux, Apache, and PHP are outside the scope of this article.)

Filesystem DB Administration

We'll start by creating the database:

cd /home/www
mkdir products

Note that the parent directory of products must be in the document paths of the Web server. Assume that the Apache document root directory (corresponding to the URL http://127.0.0.1/) has absolute path /home/www, and the products directory has absolute path /home/www/products.

Next, we must input data relative to a product. Our example company just collected and refurbished an old SparcStation 1 and wants to insert it in its product catalog. A picture has been saved in ??/images/sparcstation1.jpg, a thumbnail image has been created in ??/images/sparcstation1.gif, and a brief description of the product has been written in the (ASCII) text file ??/descriptions/sparcstation1.txt. To find the next available product ID (which is assumed to be a number starting from 0 and incremented each time a product record is added to the database), we save the following lines to a shell script getnewID and make it executable. Typing ./getnewID will then return a new product ID:

  #!/bin/sh
  # getnewID
  cd products
  [ "`echo *`" == "*" ] && echo 0 || echo * | wc -w | sed -e 's/ //g'

We can now insert a new (empty) record by creating a subdirectory named sparcstation1-ID (where ID is the new product ID found by getnewID) under the products directory. This can be automated by running the following shell script with argument sparcstation1:

  #!/bin/sh
  # newrecord
  if [ "$1" == "" ] ; then
    echo "error: syntax is 'newrecord recordname'"
  else
    cd products
    NEWID=`../getnewID`
    DIR="$1-$NEWID"
    mkdir $DIR >/dev/null 2>&1 || ( echo "error: record exists" )
    [ ! -f $RECORDNAME/name ] && echo $1 > $RECORDNAME/name
    [ ! -f $RECORDNAME/id ] && echo $NEWID > $RECORDNAME/id
  fi

Because in this example the database had just been created, the product ID referring to the product sparcstation1 is 0. Thus, the directory is sparcstation1-0.

The product image, thumbnail, and description files are inserted simply by copying them:

  cp ~/images/sparcstation1.jpg products/sparcstation1-0/
  cp ~/images/sparcstation1.gif products/sparcstation1-0/
  cp ~/descriptions/sparcstation1.txt products/sparcstation1-0/

The price can be set (e.g., to 100 pounds) manually by typing:

echo 100 > sparcstation1-0/price

Finally (and optionally), we can make an HTML page with additional information for each product by creating an index.html file with appropriate content within each directory. The whole process can be automated by modifying the above script (newrecord) to include the copying of files and price setting.

Database Display by PHP

Next, I will present an HTML page containing a very simple PHP script that displays the whole product database. Here is the HTML code for the page design:

 <html>
  <head>
    <title>Old Computers and Hardware</title>
  </head>

  <body bgcolor="#000040">

    <table width="100%" border="0" cellspacing="2" cellpadding="2">

      <tr>
        <th bgcolor="#5000BB" colspan="3" align="left">
          <font color="#9090FF" size="6">
            <i>Products</i>
          </font>
        </th>
      </tr>

Here is the PHP script (commented):

1. initialization;

2.
3.     <?php
4.       $pdir = "/home/www/products";
5.       $d = dir($pdir);
6.       $i = 0;

7. cycle over the directory entries:

8.
9.       while($entry = $d->read()) {

10. skip entries . and ..

11.
12.         if(strcmp($entry, ".") != 0 && strcmp($entry, "..") != 0) {

13. display three products on each table row:

14.
15.            if ($i % 3 == 0) {
16.                echo "<tr>\n";
17.            }
18.            // cell
19.            echo "<td bgcolor=\"white\">";

20. find product name and ID from directory name:

21.
22.            $id = strrchr($entry, "-");           
23.            $name = substr(strrev(strchr(strrev($entry), "-")), 0, -1);

24. display product name:

25.
26.            echo "<h3>";
27.            echo "<a href=\"".$pdir."/".$entry."/\">".$name."</a>";
28.            echo "</h3>";

29. display thumbnail image:

30.
31.            echo "<a href=\"".$pdir."/".$entry."/".$name.".jpg\">";
32.            echo "<img src=\"".$pdir."/".$entry."/".$name.".gif\">";
33.            echo "</a><br>\n";

34. display description:

35.
36.            readfile($pdir."/".$entry."/".$name."-briefdesc.txt");

37. end the row every three products:

38.
39.            echo "</td>\n";          
40.            if ($i % 3 == 2) {
41.                echo "</tr>\n";
42.            }
43.            $i++;
44.         }
45.       }

46. close the directory for reading:

47.
48.       $d->close();

49. end the PHP script:

50.
51.     ?>

52. end the HTML page:

53.
54.     </table>
55.     
56.     <address>
57.       <a href="mailto:liberti@iris-tech.net">Leo Liberti</a>
58.     </address>
59.   </body>
60. </html>

Database Query

To perform a query on the filesystem-based database, just run the grep command with a few bells and whistles. The following shell script (search.sh) looks for a regular expression in a database directory and returns an HTML-formatted list:

#!/bin/sh
# search.sh: perform a regexp search in a database directory
if [ "$1" == "" -o "$2" == "" ]; then
    echo "error: syntax is $0 search_regexp db_directory"
    exit 2
fi

cd $2

echo "<ul>"
for i in 'find . -exec grep -isl $1 \{\} \;' ; do
    DIRFILE=`echo $i | cut -b 3-`
    DIR=`echo $DIRFILE | cut -d '/' -f 1`
    FILE=`basename $i`
    if [ $FILE == name ] ; then
        URL=$2/$DIR/
        ITEM=`cat $i`
    else
        URL=$2/$DIRFILE
        ITEM=$DIRFILE
    fi
    echo "<li><a href=\"$URL\">$ITEM</a></li>"
done
echo "</ul>"

Next, prepare a simple HTML form (search.html) that asks the user for a search expression. The form passes the input data to the PHP script search.php in the variable $s:

<html>
<head>
<title>catalogue search</title>
</head>
  <body bgcolor="#000040">
    <table width="100%" border="0" cellspacing="2" cellpadding="2">
      <tr>
        <th bgcolor="#5000BB" align="left">
          <font color="#9090FF" size="6">
            <i>Catalogue Search</i>
          </font>
        </th>
      </tr>
      <tr>	
        <td bgcolor="#BBBBBB">
          <form method="get" action="search.php">
            Input search term: <input name="s" size=50 maxlength=200 value="">
            <input type="submit" value="Search">
          </form>
        </td>
      </tr>
    </table>
    <address>
      <a href="mailto:liberti@iris-tech.net">Leo Liberti</a>
    </address>
  </body>
</html>

Finally, run the shell script search.sh above from within the PHP script search.php:

<html>
  <head>
    <title>products catalogue search</title>
  </head>
  <body bgcolor="#000040">
    <table width="100%" border="0" cellspacing="2" cellpadding="2">
      <tr>
        <th bgcolor="#5000BB" align="left">
          <font color="#9090FF" size="6">
            <i>Database Search for term: <? echo $s ?></i>
          </font>
        </th>
      </tr>
      <tr>
         <td bgcolor="white">
<?php
      /* search the filesystem-based DB for term in $s */
      $pdir = "products";
      $cmd = "./search.sh " . $s . " " . $pdir;
      $ret = 0;
      System($cmd, $ret);
?>
         </td>
       </tr>
    </table>   
    <address>
      <a href="mailto:liberti@iris-tech.net">Leo Liberti</a>
    </address>
  </body>
</html>

Conclusion

I have presented a scheme for constructing a Web-enabled database that circumvents using a true database engine, thus simplifying the administration. The implementation is based on the principle that a record is represented by a directory and a field is represented by a file. This makes the scheme very useful in cases where the fields contain large amounts of variable-sized data, such as images or text descriptions. I have illustrated a simple PHP script, which creates a Web page displaying the contents of the whole database. There are countless improvements to this simple scheme, and all or most of them should be easy to implement for sys admins with a good command of shell scripting and even a modest knowledge of PHP (or other CGI-like) Web scripting. A few other options include:

Implementing shell scripts to query the database according to some logical test.
Implementing a Web-based database search with PHP.
Implementing a security scheme based on the UNIX file permissions so that only certain users have access to changing the records. (Users accessing the database through the Web can be accounted for by having Apache running under a special user, such as wwwrun). Visit http://www.ipnos.co.uk/products.php for a working implementation of this scheme.

Leo Liberti has graduated in Mathematics from Imperial College, London, in 1992. He holds an M.Sc. in Mathematics from Turin University, Italy. He is now a research assistant and part-time systems administrator at Imperial College; he is also the Technical Director at IrisTech, Como, Italy, an Italian firm that supplies customers with Web-based and electronic services.