Article

Sybase Database Administration: Part 3

William Genosa

In the Nov./Dec. issue, part two of this series discussed database tables and supplied two programs that generate the sql required to re-create all the user tables for a given Sybase database. In this issue I focus on indexes and provide two more tools which together generate the sql required to re-create all the indexes that exist within a user database.

Database Indexing

Relational databases store data within objects known as tables. Indexes are also database objects and all database objects are placed on segments. Segments are logical partitions defined on physical disk devices (raw partitions).

A book contains an index to speed location of information within the book. A database administrator may build an index on a table to improve the performance of queries by decreasing the amount of disk I/O required to locate rows within a table.

There are various ways of building an index. Sybase uses the B-tree method, which consists of multiple index levels forming a tree-shaped structure, with the root being the beginning of the index. Index and data information are stored in pages of 2,048 bytes. When an index page becomes full, the next entry will split that index page into two pages, each half full. Page splitting can slow performance significantly since many other index pages may require changes as a result of the split page. To reduce page splitting, Sybase allows the database administrator to specify a fillfactor when an index is created; the fillfactor determines how much free space to leave in each page for growth.

Clustered and Non-Clustered Indexes

There are two types of Sybase B-tree indexes, the clustered index and the non-clustered index. Figure 1 illustrates a table with no index: data has been added to the table with no specific order. Figure 2 illustrates the same table after the creation of a clustered index. Figure 3 illustrates the table when only a non-clustered index were created on it.

A clustered index physically alters the table it is created on, so there can be only one clustered index per table. A non-clustered index does not alter the table it is created on, and so there can be more than one non-clustered index per table. A clustered index has pointers to data pages, where the data is stored in the physical order of the indexed attributes. Once the data page is located, it must be scanned to find a particular row. A non-clustered index contains pointers to each individual row of a table. A clustered index must reside on the same segment as the table it is created on. A non-clustered index can reside on a different segment than the table it was created on.

Planning an Index

Consider the matter of your application before creating indexes. If you will be performing queries on a table that will retrieve a range of data, you may want to create a clustered index either on the attributes you will be retrieving, or on the search augments of the where clause of your query. This improves performance, because after the first row is found, remaining rows are sequentially stored on the data page. Similarly, if your application does a lot of inserts and deletes, a clustered index can improve performance because the inserts will be distributed across many data pages. Without a clustered index, all inserts would be placed on the last data page (at the end of the heap). Because Sybase uses page-level locking, the last page would become a "hot spot" and cause performance degradation.

Because a clustered index must be located on the same segment as the table it is created on, you can move a table to a specific segment by specifying the segment in the index creation statement. If the index is not needed, it can be dropped after the table has been moved to the new segment.

Index creation also helps maintain uniqueness. Relational theory does not allow duplicate rows (tuples) within a table (relation). Each table should have an attribute or collection of attributes (composite key) which uniquely identifies every row. That unique attribute is known as the primary key. To ensure that the primary key is unique, a unique index can be created on the field or collection of fields which comprise the primary key. Even where a primary key may require a unique index, the decision as to whether that index should be clustered or non-clustered should be based on the nature of the transactions to be made against the table.

Normalization and Logical Design

Primary keys are cornerstones of normalization. Normalization is the process of logical database design which eliminates redundant data and ensures the data accurately represents its real-world entities. There are five rules of normalization, but most database designers only follow the first three. I present those three here along with two other laws required for a sound logical design.

First Normal Form

First Normal Form (1NF) requires that attributes be atomic. Since an attribute is a subset of a domain, and since Sybase domains are atomic (cannot be de-composed), all Sybase tables are in first normal form by definition.

Second Normal Form

Second Normal Form (2NF) requires that all attributes for a given relation (table) contain only information specifically about the primary key of that table (functional dependence).

Third Normal Form

Third Normal Form (3NF) requires that all attributes which are not part of the primary key be mutually independent. This means that non-key attributes cannot have any functional dependence on each other.

Entity Integrity

Entity Integrity states that a primary key cannot contain NULL values. A Null represents the absence of a value.

Referential Integrity

Referential Integrity states that a foreign key must be an existing value of a primary key of another relation. Any attribute of a relation which is a primary key in some other relation would be considered a foreign key.

The Index Creation Tools

The create.index program (Listing 1) will generate the sql required to re-create all the indexes for a table named in the first augment. The program is broken into five sections.

The first section of create.index sets up environment variables. Because this program is part of a suite of tools, it allows variables to be assigned via a configuration file executed with the dot command. If several programs in the suite are to be executed, only one configuration file would require editing. (Other tools from this suite include create.diskinit, in Sys Admin Sept./Oct. 1994, and create.table, in Sys Admin Nov./Dec. 1994.)

Section two simply checks for the existence of an augment. If no table name is specified, a message is displayed on the terminal describing the usage and the program terminates.

Section three retrieves each index name and type for the table named in augment one. Table names are stored in the system table, called SYSOBJECTS, of every database. The index name and type would be stored in SYSINDEXES. The index type is determined by the attribute indid of SYSINDEXES. The query in section three joins the two tables SYSOBJECTS and SYSINDEXES on the common attribute id, which is the primary key of SYSOBJECTS and a foreign key of SYSINDEXES. Output of the query is piped into a while loop which will remove any rows returned with an indid of zero. This is because a table with no clustered index will have an indid of zero. A table with a clustered index will have an indid of one. A non-clustered index will have an indid of two or greater. Because this process is only concerned with indexes, tables (indid of 0) are removed. Note that SYSINDEXES also has an attribute called segment. Any table with only a clustered index will have only one row in SYSINDEXES. The index and the table both occupy the same data pages and must therefore reside on the same segment. A non-clustered index would have its own entry in SYSINDEXES because it does not share data pages with the table. Non-clustered indexes can reside on a segment other than the segment where the table resides.

Section four of create.index is where most of the work is accomplished. For each index named in section three, a while loop queries the database for information on various index creation options, as well as the segment the index is located on. Unions provide the output of each option in a suitable format for processing by awk. Awk generates the sql to re-create each index with all the proper options and segment placement. SCCS key-word identifiers are also inserted in every header for software and version control purposes. In Sybase an index can contain up to sixteen attributes. The sql query in section four also has a while loop which executes the index_col function. This function returns each attribute of the indexes queried.

Section five of create.index removes the temporary work file created in section three.

create.index creates the indexes only for a single table. A second program, create.index.all, calls create.index for every user table in a given database. This allows the database administrator to generate the indexes for all tables in a user database without losing the ability to generate sql for index creation on a single table. create.index.all simply executes a query to retrieve all user table names, then runs create.index on each table name returned. These tools, along with those presented in the earlier articles in this series, give Sybase database administrators a means of automating some major management tasks.

About the Author

Bill Genosa is a UNIX Systems/Database Administrator with Harman Consumer Group, located in Woodbury, New York. He works with RS6000 computers on systems that serve some 150 users across the US and in Denmark and Singapore. You can reach Bill at 186 Bryant Avenue, Floral Park, NY 11001, or via email as wgenosa@attmail.com.