Sybase Database Administration: Part 3
William Genosa
In the Nov./Dec. issue, part two of this series discussed
database
tables and supplied two programs that generate the sql
required to
re-create all the user tables for a given Sybase database.
In this
issue I focus on indexes and provide two more tools
which together
generate the sql required to re-create all the indexes
that exist
within a user database.
Database Indexing
Relational databases store data within objects known
as tables. Indexes
are also database objects and all database objects are
placed on segments.
Segments are logical partitions defined on physical
disk devices (raw
partitions).
A book contains an index to speed location of information
within the
book. A database administrator may build an index on
a table to improve
the performance of queries by decreasing the amount
of disk I/O required
to locate rows within a table.
There are various ways of building an index. Sybase
uses the B-tree
method, which consists of multiple index levels forming
a tree-shaped
structure, with the root being the beginning of the
index. Index and
data information are stored in pages of 2,048 bytes.
When an index
page becomes full, the next entry will split that index
page into
two pages, each half full. Page splitting can slow performance
significantly
since many other index pages may require changes as
a result of the
split page. To reduce page splitting, Sybase allows
the database administrator
to specify a fillfactor when an index is created; the
fillfactor determines
how much free space to leave in each page for growth.
Clustered and Non-Clustered Indexes
There are two types of Sybase B-tree indexes, the clustered
index
and the non-clustered index. Figure 1 illustrates a
table with no
index: data has been added to the table with no specific
order. Figure 2
illustrates the same table after the creation of a
clustered index.
Figure 3 illustrates the table when only a non-clustered
index were
created on it.
A clustered index physically alters the table it is
created on, so
there can be only one clustered index per table. A non-clustered
index
does not alter the table it is created on, and so there
can be more
than one non-clustered index per table. A clustered
index has pointers
to data pages, where the data is stored in the physical
order of the
indexed attributes. Once the data page is located, it
must be scanned
to find a particular row. A non-clustered index contains
pointers
to each individual row of a table. A clustered index
must reside on
the same segment as the table it is created on. A non-clustered
index
can reside on a different segment than the table it
was created on.
Planning an Index
Consider the matter of your application before creating
indexes. If
you will be performing queries on a table that will
retrieve a range
of data, you may want to create a clustered index either
on the attributes
you will be retrieving, or on the search augments of
the where
clause of your query. This improves performance, because
after the
first row is found, remaining rows are sequentially
stored on the
data page. Similarly, if your application does a lot
of inserts and
deletes, a clustered index can improve performance because
the inserts
will be distributed across many data pages. Without
a clustered index,
all inserts would be placed on the last data page (at
the end of the
heap). Because Sybase uses page-level locking, the last
page would
become a "hot spot" and cause performance
degradation.
Because a clustered index must be located on the same
segment as the
table it is created on, you can move a table to a specific
segment
by specifying the segment in the index creation statement.
If the
index is not needed, it can be dropped after the table
has been moved
to the new segment.
Index creation also helps maintain uniqueness. Relational
theory does
not allow duplicate rows (tuples) within a table (relation).
Each
table should have an attribute or collection of attributes
(composite
key) which uniquely identifies every row. That unique
attribute is
known as the primary key. To ensure that the primary
key is unique,
a unique index can be created on the field or collection
of fields
which comprise the primary key. Even where a primary
key may require
a unique index, the decision as to whether that index
should be clustered
or non-clustered should be based on the nature of the
transactions
to be made against the table.
Normalization and Logical Design
Primary keys are cornerstones of normalization. Normalization
is the
process of logical database design which eliminates
redundant data
and ensures the data accurately represents its real-world
entities.
There are five rules of normalization, but most database
designers
only follow the first three. I present those three here
along with
two other laws required for a sound logical design.
First Normal Form
First Normal Form (1NF) requires that attributes be
atomic. Since
an attribute is a subset of a domain, and since Sybase
domains are
atomic (cannot be de-composed), all Sybase tables are
in first normal
form by definition.
Second Normal Form
Second Normal Form (2NF) requires that all attributes
for a given
relation (table) contain only information specifically
about the primary
key of that table (functional dependence).
Third Normal Form
Third Normal Form (3NF) requires that all attributes
which are not
part of the primary key be mutually independent. This
means that non-key
attributes cannot have any functional dependence on
each other.
Entity Integrity
Entity Integrity states that a primary key cannot contain
NULL values.
A Null represents the absence of a value.
Referential Integrity
Referential Integrity states that a foreign key must
be an existing
value of a primary key of another relation. Any attribute
of a relation
which is a primary key in some other relation would
be considered
a foreign key.
The Index Creation Tools
The create.index program (Listing 1) will generate the
sql
required to re-create all the indexes for a table named
in the first
augment. The program is broken into five sections.
The first section of create.index sets up environment
variables.
Because this program is part of a suite of tools, it
allows variables
to be assigned via a configuration file executed with
the dot
command. If several programs in the suite are to be
executed, only
one configuration file would require editing. (Other
tools from this
suite include create.diskinit, in Sys Admin Sept./Oct.
1994, and create.table, in Sys Admin Nov./Dec. 1994.)
Section two simply checks for the existence of an augment.
If no table
name is specified, a message is displayed on the terminal
describing
the usage and the program terminates.
Section three retrieves each index name and type for
the table named
in augment one. Table names are stored in the system
table, called
SYSOBJECTS, of every database. The index name and type
would be stored
in SYSINDEXES. The index type is determined by the attribute
indid
of SYSINDEXES. The query in section three joins the
two tables SYSOBJECTS
and SYSINDEXES on the common attribute id, which is
the primary
key of SYSOBJECTS and a foreign key of SYSINDEXES. Output
of the query
is piped into a while loop which will remove any rows
returned
with an indid of zero. This is because a table with
no clustered
index will have an indid of zero. A table with a clustered
index will have an indid of one. A non-clustered index
will
have an indid of two or greater. Because this process
is only
concerned with indexes, tables (indid of 0) are removed.
Note
that SYSINDEXES also has an attribute called segment.
Any
table with only a clustered index will have only one
row in SYSINDEXES.
The index and the table both occupy the same data pages
and must therefore
reside on the same segment. A non-clustered index would
have its own
entry in SYSINDEXES because it does not share data pages
with the
table. Non-clustered indexes can reside on a segment
other than the
segment where the table resides.
Section four of create.index is where most of the work
is
accomplished. For each index named in section three,
a while
loop queries the database for information on various
index creation
options, as well as the segment the index is located
on. Unions provide
the output of each option in a suitable format for processing
by awk.
Awk generates the sql to re-create each index with all
the proper
options and segment placement. SCCS key-word identifiers
are also
inserted in every header for software and version control
purposes.
In Sybase an index can contain up to sixteen attributes.
The sql query
in section four also has a while loop which executes
the index_col
function. This function returns each attribute of the
indexes queried.
Section five of create.index removes the temporary work
file
created in section three.
create.index creates the indexes only for a single table.
A second program, create.index.all, calls create.index
for every user table in a given database. This allows
the database
administrator to generate the indexes for all tables
in a user database
without losing the ability to generate sql for index
creation on a
single table. create.index.all simply executes a query
to
retrieve all user table names, then runs create.index
on each
table name returned. These tools, along with those presented
in the
earlier articles in this series, give Sybase database
administrators
a means of automating some major management tasks.
About the Author
Bill Genosa is a UNIX Systems/Database Administrator
with Harman
Consumer Group, located in Woodbury, New York. He works
with RS6000
computers on systems that serve some 150 users across
the US and in
Denmark and Singapore. You can reach Bill at 186 Bryant
Avenue, Floral
Park, NY 11001, or via email as wgenosa@attmail.com.
|