Cover V06, I04

Implementing a CD-ROM Jukebox Solution

David T. Caruso

Shared data access and data storage solutions have evolved from the era of tape and microfiche to powerful magnetic and optical solutions. Although support for networked storage within the UNIX environment is not new, the emergence of media such as CD-ROM and CD-R now makes it possible to provide on-line and near-line access to hundreds of files across the network within seconds. When evaluating networked storage solutions, you will need to consider the performance and integration of each system, its bandwidth requirements, and its upgrade path. First, however, you should determine the needs of your users, understand the storage hierarchy, and use it to determine the type of storage required by your organization.

Determining User Needs

The first step in building a complementary storage combination for the UNIX network is understanding the end users' applications. By understanding these applications, you can better identify which combination of technologies (on-line, near-line, and/or off-line) will be optimal for your organization. For example, users who need to access many small- to medium-sized files regularly throughout the day will likely require a different solution than users running mission-critical applications that must have information queued and ready to go.

Understanding and Building a Storage Hierarchy

A storage hierarchy is a way to view complementary storage solutions by looking at storage in terms of on-line, near-line, and off-line. In many environments, for example, up to 85% of all data on the network can be archival, with the remaining 15% split between on-line and near-line. As such, the storage hierarchy provides a means to mix and match storage solutions to build a system that optimizes resources and data delivery and minimizes cost outlays. Generally, storage products are evaluated in terms of how quickly they deliver data, how much storage capacity is required, and whether the storage device is on-line, near-line, or off-line. You should also determine how often the data is accessed and whether it is static or dynamic in nature. Static data is stored on secure media, where it is read but not altered. Dynamic data is stored on systems that provide read/write access and possibly even redundancy.

Storage Types and Options

Off-line storage provides a mechanism for reading and writing archived data that is retrieved infrequently. Paper storage and microfiche were once considered the only viable options for infrequently accessed data. Today, many IS managers rely on tape and WORM technology to archive and back up data. Yet tape backup does not give the user community access to stored data. And, although WORM and 5 1/4-inch MO WORM technology is an excellent medium for storing vast quantities of data, it is expensive and largely proprietary. IS teams adopting this technology risk being locked into a single vendor's hardware/software solution and/or a limited upgrade path. With the advancements in CD technology, including the secure nature of the media itself and standards (ISO 9660, Rock Ridge extensions, etc.), increasing amounts of off-line data are now stored on CDs.

CD storage provides efficient data retrieval times and does not suffer from the volatility problems of tape or magnetic hard drives. In fact, it is considered one of the most resilient storage media available.

At the opposite extreme, on-line storage solutions provide real-time access to data that is accessed frequently, by multiple people, and is updated and changed regularly. On-line storage primarily consists of two storage types: hard drives and dedicated optical drives such as CD-ROM or WORM/MO drives. Archiving data on a hard drive provides exceptional retrieval speed to the user but is expensive, costing approximately $0.40 per megabyte of storage. Additionally, if the hard drive fails and is not part of a redundant array, the integrity of the stored data is jeopardized.

One example of an on-line optical solution is CD-ROM drives arranged in towers or racks where each drive holds only one CD. With such a solution, however, scalability can be a problem as the sheer number of CD-ROM drives quickly outstrips available SCSI IDs.

Near-line storage provides access to information that is accessed frequently but does not need to be queued in real time, and near-line optical solutions such as CD jukeboxes can alleviate the scalability issues that exist with tower systems. CD jukeboxes are fully scalable and highly reliable and, most importantly, deliver data across the network in 6-8 seconds. These systems use both CD-ROM and CD-R media and can offer simultaneous reading and writing, storage, and high-speed access.

Network Storage within the UNIX Environment

Many of today's networking concerns focus on storage-related issues: the need for more data storage, the need for faster data storage, and the need for archival and backup storage that is both seamless and secure. CD-ROM and CD-R provide fast data delivery at a low cost with inherent benefits such as standard interchange formats, high capacity, low media and hardware costs, random access convenience, and media stability. The cost advantage of building a CD storage solution is significant: hundreds of gigabytes of data, at a cost of less than $0.02/MB, can be accessed from virtually anywhere on a UNIX network.

CD Information Systems

The popularity and widespread adoption of CD-ROM is largely due to the speed at which the technology is advancing. Read drive speed has increased from 1x (150 kilobytes/second) to 12x (1800 kilobytes/second) over the past 4 years. CD-R technology has followed a similar evolution as writable drives have increased in speed from 1x to 6x. The rapid growth of this technology has driven the price of hardware and media down to unprecedented levels. Additionally, ISO has defined a standard CD-ROM file system (ISO 9660), which allows virtually every hardware platform to access data stored on CD.

For the purpose of evaluation, I chose to explore a near-line solution, the jukebox. The CD jukebox is a server-based technology that takes requests from clients and serves these requests over the network via the UNIX host. As with any client/server model, there are always two parts to the equation, the client application program or task, and the server hardware/software that fulfills the request of the client. In this model, the client resides on the user's desktop, and the server resides on the UNIX host. The jukebox, therefore, serves the CD data through the UNIX host and appears to the client as just another drive on the network.

Setting up the CD Information System

Because I'm most familiar with the Mercury 40 Jukebox from NSM, these examples are based on this model running CD management software from iXOS JUKEMAN on a SUN UNIX server. Please note, however, that the functionality, configuration, and performance of alternative CD jukebox products, software, and server systems may be similar.

To configure this example, which is a SUN UNIX workstation host with the Solaris operating system 2.3, 2.4, or 2.5 for use with the NSM Jukebox and its integrated management software, you simply connect the CD jukebox (Mercury 40) via SCSI and serial connections and install the CD management software (iXOS JUKEMAN). I'll first discuss the main hardware and software components, and then provide the details of a typical SUN Solaris workstation hardware and software installation.

Hardware Components

CD jukeboxes are available from several manufacturers, each providing systems with varying configurations and with capacities ranging from as few as 6 CDs to more than 500 CDs. CDs are usually stored in fixed and removable CD racks and CD magazines, and are mechanically shuffled, via robotics, from a storage location or a mail slot to the CD drive. Jukeboxes are controlled and their data accessed through SCSI interfaces, with some vendors providing additional interfaces such as serial, parallel, and Ethernet. The use of multiple interfaces, such as dual SCSI interfaces, allows the data and the control (robotic and drive) traffic each to have its own path.

The Mercury jukebox in this example has a capacity of 150 CDs (three 50-CD magazines) per jukebox, with a SCSI-2 interface for data, a separate SCSI-2 port for the recorder, and an RS232 (optionally SCSI-2) interface for the robotics. For scalability, up to 7 jukeboxes can be connected to each SCSI port (software-dependent), and up to 3 jukeboxes can be cascaded from each RS232 port. Each jukebox can be configured with up to 4 CD drives. Note that the jukebox is also available in an Ethernet interface configuration, which eliminates the need for SCSI and serial connections.

SCSI Devices

Depending on your system, you may have several SCSI controllers for several SCSI buses. The SCSI bus has 8 IDs (0-7), with each device on a SCSI bus requiring a SCSI ID. Each SCSI ID can be split up into 8 Logical Units (LUNs). Most CD jukeboxes will require a unique ID per CD drive, and one for the CD changer. The jukebox in this example has 4 drives, and therefore requires 4 SCSI IDs. Because the CD management software I used was written to integrate easily with the jukebox, installing the software creates the necessary directories per SCSI bus. If LUNs are used, the software will also create the appropriate LUN number. Because most CD management software is quite sophisticated, alternative jukebox/management software combinations should be equally easy to install, although additional steps may be necessary.
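
The ID budget described above can be checked with a few lines of Python. This is only a rough sketch under stated assumptions: a narrow SCSI bus with 8 IDs, a host adapter that occupies one of them, and no use of LUNs. The function name is hypothetical and is not part of any vendor's software.

```python
SCSI_IDS_PER_BUS = 8  # IDs 0-7 on the bus, as described in the text


def free_ids_after(drives_per_jukebox, jukeboxes, changer_on_scsi=False,
                   host_adapter_ids=1):
    """Return how many SCSI IDs remain on one bus (negative = over budget).

    Assumptions: the host adapter occupies one ID, each CD drive needs its
    own ID, and the changer needs one more only when it is driven over SCSI
    rather than over the serial line. LUNs are not considered.
    """
    per_jukebox = drives_per_jukebox + (1 if changer_on_scsi else 0)
    return SCSI_IDS_PER_BUS - host_adapter_ids - per_jukebox * jukeboxes


# One 4-drive jukebox with a serial-controlled changer: 8 - 1 - 4 IDs remain.
print(free_ids_after(4, 1))
# A second 4-drive jukebox on the same bus goes over budget without LUNs.
print(free_ids_after(4, 2))
```

A negative result is the signal to move the extra jukebox to another SCSI bus or, where the software supports it, to address drives by LUN.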

Serial Devices

The jukebox's robotics changer and disc selection, insertion, and removal are controlled through a serial line. Using a serial line saves SCSI IDs and, with the jukebox, allows the management software to fully exploit the features of the jukebox and its ability to execute several movements in different states simultaneously. The jukebox can satisfy 14 client requests for different CDs per minute.

Software Components

The software configuration consists of two major components, the CD server (cdnfsd daemon) and the CD writer (cdglow daemon).

The CD Server

The server, or file system server, interfaces with the hardware, controlling the CD jukeboxes, CD drives, and the CD discs. Generic SCSI drivers are used with (platform) portable software sections sending device-specific SCSI commands to the CD jukeboxes.

The file system server supports the native file system so that clients can view the CDs in the file structure they prefer. The server presents a hierarchical file system to both the host and clients, with a single file system in which all CDs are subdirectories of the root. UNIX (Network File System; NFS) and PC clients (pcNFS) access the server's file system using NFS, with the server accepting requests and replies compliant with the NFS protocol standards.

The server actually optimizes CD access via caching and optimized disc movements. The file and directory cache saves time in the tenths of a second; whereas optimizing CD disc movements saves 10 or more seconds. The server provides optimized CD disc movements by using parallel access to all CD devices and queues jukebox requests to optimize request scheduling, such as waiting for requests for a disc that is already mounted in a CD drive. For high-performance demands, replicated CDs can exist over several jukeboxes.
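
To make the scheduling idea concrete, the following Python sketch reorders a request queue so that requests for discs already mounted in a drive are served before any disc exchange is started. It illustrates the principle only; it is not the algorithm cdnfsd actually uses, and the function and disc names are hypothetical.

```python
def schedule(requests, mounted):
    """Order disc requests so that already-mounted discs are served first.

    requests: list of disc names in arrival order.
    mounted:  set of disc names currently loaded in a drive.
    Arrival order is preserved within each group because sorted() is stable.
    """
    return sorted(requests, key=lambda disc: disc not in mounted)


queue = ["manuals", "patches", "manuals", "archive", "patches"]
print(schedule(queue, mounted={"patches"}))
```

Here both requests for the mounted disc are answered from the drive immediately, and the robot is needed only for the two remaining discs.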

CD Writer

The CD writer allows users to create ISO 9660 file systems or raw data on recordable CDs. The writer accepts data from a premastered file, a raw partition, a CD-ROM driver, or a pipe, which means the CD file system can be generated without any additional on-line disk space. Additionally, the CD writer (cdglow) offers a preview mode to test the writing process and a verification option that checks data integrity immediately after the write process completes.

Hardware Installation

Most jukebox systems arrive in one carton and, with the accompanying installation instructions, require only minor assembly. After the jukebox is assembled, which involves removing transportation fixtures, checking and setting the line voltage jumpers (some are not factory set to 110 Volt), and some general installation procedures, the unit can be powered up. The jukebox in this installation arrived preset for SCSI IDs 3-6 (4 SCSI IDs) and RS232 ID of 00. The next steps involve verification of available SCSI IDs, and then connection of the SCSI and RS232 cables. Halt the SUN workstation, and at the "ok" prompt (hitting the Stop(L1)-a keys), identify the devices attached to the SCSI bus using the command: probe-scsi.

All SCSI devices currently attached to the SUN workstation will be shown. If the jukebox SCSI IDs are in use, most jukebox manufacturers allow for reconfiguration of SCSI IDs by resetting jumpers or toggling DIP switches. If the SCSI IDs are not in use, then simply connect the proper SCSI and RS232 (9-pin D-SUB type) cables to the workstation and to the jukebox. Restart the SUN workstation and log in as root to continue with the software installation. The only remaining physical work is loading the discs according to the manufacturer's instructions.

Software Installation

The software comes on CD-ROM and is also available via the company's ftp server. To install the software from CD-ROM, stop the volume manager, mount the CD-ROM, create an installation area, then copy, restore, and uncompress the software, remove unnecessary files, and restart the volume manager:

# /etc/init.d/volmgt stop
# mkdir /cdrom
# mount -F hsfs -o ro \
/dev/dsk/c0t6d0s0 /cdrom
# mkdir /opt/jukeman
# cp /cdrom/jukeman.tar /opt/jukeman
# umount /cdrom
# cd /opt/jukeman
# tar -xvf jukeman.tar
# uncompress *.Z
# tar -xvf os_solaris.tar
# tar -xvf examples.tar
# rm *.tar
# /etc/init.d/volmgt start
#

The installation directory now contains all files necessary for the software, except the license files, device descriptions, and the configuration files. Templates for these files were extracted from examples.tar (as outlined within the iXOS Jukeman documentation).

The next step installs the SCSI driver and creates standard device names in /dev/iXOS_SCSI*:

# jmsetup

The cdnfsd and cdglow programs should be owned by root and have the set-UID bit set:

# chown root cdnfsd cdglow
# chmod u+s cdnfsd cdglow

After receiving the correct license keys from the software manufacturer (if required), edit and create the license files server.lic and, optionally, writer.lic. Using an editor, I created the server.lic file in the installation directory using the following format. (Note: replace ***** with the specific license key.)

version=2
volumes=500
license*****

If you have a CD writer, create the writer.lic file in the installation directory:

version=2
writer
license*****

Server Setup

Setting up the server requires that you define the visible file system views and define the devices to be used. Once this is accomplished, all CDs contained in the devices are visible in the file system view chosen.

Define the File System Views

The iXOS file system views are maintained in a single file system that contains all CDs as subdirectories. Depending on the CD management software you're using, the file system views may be slightly different. In either case, because some of the clients on the network may need different views of the file system (e.g., most PCs require an 8.3 character filename.extension), the software allows users to define the various views of the server's exported file system. The standard root file system is normally named /views, with the different views named accordingly: /views/pc for PC formats and /views/rr for the optional Rock Ridge UNIX format.

The server requires a configuration file, server.cfg, which contains descriptions of the exported views, the devices to be attached on server startup, the CD directory cache file, and other server parameters. Create the server.cfg file in the installation directory /opt/jukeman/ and add the following:

views { list { /views/pc /views/rr }
/views/pc { format { pc } discs { * } }
/views/rr { format { rr } discs { * } }
}

devices {
list { mercury }
mercury { startup { automatic } }
}

parameters {
loglev=5
}

dircache {
file { dircache }
size { 25 }
}

The next step creates and exports the different views:

# mkdir /views
# mkdir /views/pc
# mkdir /views/rr

To export the CD file systems from the server, edit /etc/dfs/dfstab and add:

share -F nfs /views

Export the file system from the server with the command:

# shareall

Start the server software by:

# /opt/jukeman/cdnfsd

Define the Devices

Each server-controlled jukebox needs a description file that tells the server the device type, the SCSI and RS232 addresses, and the robot ID. Edit the device description file, named mercury, in the /opt/jukeman directory and insert the following (assuming the CD changer is on serial line ttya and SCSI bus 1, with the Mercury using SCSI IDs 3, 4, 5, and 6, and the jukebox using a robot ID of 0):

device=mercury
drive=/dev/iXOS_SCSI1/3
drive=/dev/iXOS_SCSI1/4
drive=/dev/iXOS_SCSI1/5
drive=/dev/iXOS_SCSI1/6
robot=/dev/ttya
robid=0
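
If the jukebox's SCSI IDs have to be changed (for example, after probe-scsi shows a conflict), the description file can be regenerated rather than edited by hand. The Python sketch below assumes the file consists of nothing more than the key=value lines shown above; the helper function is hypothetical and is not part of the JUKEMAN distribution.

```python
def device_description(name, scsi_bus, drive_ids, serial_port, robot_id):
    """Build the text of a device description file in the layout shown above.

    Assumption: the file holds one device= line, one drive= line per CD
    drive, then the robot= and robid= lines, and nothing else.
    """
    lines = [f"device={name}"]
    lines += [f"drive=/dev/iXOS_SCSI{scsi_bus}/{scsi_id}" for scsi_id in drive_ids]
    lines += [f"robot={serial_port}", f"robid={robot_id}"]
    return "\n".join(lines) + "\n"


# Reproduce the example: SCSI bus 1, IDs 3-6, changer on /dev/ttya, robot ID 0.
text = device_description("mercury", 1, [3, 4, 5, 6], "/dev/ttya", 0)
print(text, end="")
```

The printed text can be redirected into /opt/jukeman/mercury before the device is attached.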

Attaching the Devices

With the newly created device file, mercury, attach the devices to the server with the following command:

# cdadm attach mercury

Note that the NSM/iXOS server setup defined above is an example of a minimum configuration and demonstrates how easy an installation can be. Both NSM and iXOS offer installation options so that the solution can be adapted to other types of installation and configuration.

Completing the Installation (Client Setup)

After installing the drivers, defining the views, starting the server, defining the devices and attaching them to the server, clients can mount the server's file system as they would mount any NFS drive, but with some additional mount parameters. The following command mounts the server's (servername) file system on the client's mount point /cd:

# mkdir /cd
# mount -o port=4027,timeo=33,retrans=14,soft servername:/views/rr /cd

I've briefly defined the various mount options:

port=4027: Uses port 4027 instead of port 2049 which is the standard NFS daemon port number. This allows the server to coexist with the standard nfsd.

timeo=33: Instructs the NFS client to retransmit a request if there is no reply after 33 tenths of a second (3.3 seconds).

retrans=14: Instructs the NFS client to automatically retransmit a request 14 times before it gives up and the file system access that caused the NFS requests fails.

soft: Instructs the NFS client to finally give up after all retransmits.
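
A quick calculation shows why these values suit a jukebox. The Python sketch below estimates how long a soft-mounted client keeps trying before it reports an error. It deliberately simplifies: it assumes every retransmission waits the same timeo interval, whereas real NFS clients may lengthen the interval between retries, so treat the result as a rough lower bound. The function name is hypothetical.

```python
def soft_mount_give_up_seconds(timeo_tenths, retrans):
    """Rough lower bound on the wait before a soft mount reports failure.

    timeo_tenths: the timeo mount option, in tenths of a second.
    retrans:      the retrans mount option (number of retransmissions).
    Simplification: a constant timeout per retransmission, no backoff.
    """
    return retrans * timeo_tenths / 10.0


wait = soft_mount_give_up_seconds(timeo_tenths=33, retrans=14)
print(f"client gives up after roughly {wait:.1f} seconds")
```

With timeo=33 and retrans=14 the client waits on the order of 46 seconds, comfortably longer than the 6-8 seconds the jukebox needs to deliver a disc, which leaves room for requests that have to queue behind other disc exchanges.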

Understanding the CD Management Software

CD management software ranges from simple indexing software, which relies on third-party databases to track information, to hardware-specific solutions that work at the kernel level and present the CD-ROM jukebox as a native file system (such as the iXOS JUKEMAN software I used). CD management software working at the file system level presents the entire archive as a standard file system, allowing the administrator to implement CD-ROM archives on Microsoft Windows NT and/or UNIX platforms to serve as an extended archive solution. Most CD management software vendors work closely with the leading jukebox manufacturers to ensure that their software solution exploits the full capabilities of the jukebox. Although generic drivers and shareware drivers may exist, it is unlikely that they will be able to support the unique capabilities of individual jukebox products.

Performance

Once you've set up a storage management solution for the network, and the users are simultaneously accessing information across the network, what kind of performance should you expect? This depends on several factors, not the least of which is the size, number, and type of files being accessed. With drive speeds currently at 8x, 12x, and beyond, a well-engineered system can deliver data across the network at rates comparable to a networked hard drive. This does not mean that an individual transaction is quicker via the jukebox, but rather that, at the end of the day, the network bandwidth determines overall performance. To calculate the performance of networked storage devices, you can look at the Time To Deliver Data, or TDD.

Using this example, it is important to consider that although it is possible to configure a jukebox with multiple drives, these multiple drives generally must be serviced (loaded) by a single robotics element. TDD, therefore, can be calculated as a function of three status conditions:

Media Exchange Time: The time for the robot to perform a full exchange of media.

Queue Time: The time for disc load and access after freeing the "robotics."

Data Transfer Time: The time for file transfer (File size/ Data transfer Rate).

Media Exchange Time as defined here specifically does not include access and spin up of the disc after load, or any other processes that can be done while the robotic mechanism is free to begin loading another disc.

As defined, the TDD of a jukebox is ultimately affected not by the speed at which the transfer occurs between the drive and the network, but by the speed at which the robotics mechanism can complete a full exchange of the media, freeing it to service another drive. In other words, whether the jukebox has 2 drives, 4 drives, or 8 drives, the single robotics mechanism becomes the linchpin to the system's ultimate performance.
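
As a worked illustration, the three components can be combined in a short Python function. This sketch assumes the components simply add up for a single request on an otherwise idle jukebox, which the text implies but does not state as a formula. The exchange and queue times used below are hypothetical, not measured Mercury 40 figures; the transfer rate takes an 8x drive as roughly 1.2 MB per second (8 x 150 kilobytes/second).

```python
def time_to_deliver(media_exchange_s, queue_s, file_mb, rate_mb_per_s):
    """TDD in seconds, assuming the three components simply add up.

    media_exchange_s: time for the robot to perform a full media exchange.
    queue_s:          disc load and access time after the robot is free.
    file_mb / rate_mb_per_s: the data transfer time.
    """
    return media_exchange_s + queue_s + file_mb / rate_mb_per_s


# Hypothetical figures: 5 s exchange, 2 s queue time, a 6 MB file at 1.2 MB/s.
tdd = time_to_deliver(5.0, 2.0, 6.0, 1.2)
print(f"TDD is about {tdd:.1f} seconds")
```

With these numbers the transfer itself accounts for less than half of the total, which is why the robot's exchange time dominates for small files.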

Calculating Optimal File Size

The Optimum File Size (OFS) of a jukebox can be defined as the size of the file that allows the jukebox to achieve maximum operating efficiency. Because jukeboxes rely upon a single robotics assembly to deliver the discs to the drives, it is most important to maximize the efficiency of this action. This efficiency is highest when the first drive in the system has completed its data transfer at the precise moment that the robotics has completed the mounting of a disc into the last drive in the system. The OFS of a jukebox can be calculated using the following formula:

OFS = [ ( (N - 1) x M ) - Q ] x D

where:

N = the total number of drives

M = media exchange time in seconds

Q = queue time in seconds

D = data transfer rate in MB per second

Multiplying (N -1), which represents the remaining drives, and M, which represents the Robotics exchange time, results in the time it takes the robotics to load the remaining drives (in seconds). Subtract from this figure the Queue time (Q) in which no data is transferred. The result is the time remaining to transfer data, which is then multiplied by the data transfer rate in MB per second (D) to determine the OFS.

In a given multiple drive jukebox when file sizes are smaller than the OFS, the drives must wait for the robotics to become free to service them. Conversely, when the file sizes are larger than the OFS, the robotics must wait for the drives to complete their data transfer.
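
The formula is easy to evaluate for a particular configuration. The Python sketch below applies it to a 4-drive jukebox; the media exchange and queue times are hypothetical values chosen for illustration, and the transfer rate assumes 8x drives at roughly 1.2 MB per second (8 x 150 kilobytes/second).

```python
def optimum_file_size(drives, media_exchange_s, queue_s, rate_mb_per_s):
    """Optimum File Size in MB: [((N - 1) x M) - Q] x D."""
    return ((drives - 1) * media_exchange_s - queue_s) * rate_mb_per_s


# Hypothetical 4-drive jukebox: M = 5 s, Q = 2 s, D = 1.2 MB/s.
ofs = optimum_file_size(4, 5.0, 2.0, 1.2)
print(f"OFS is about {ofs:.1f} MB")
```

Under these assumptions the robot needs 15 seconds to load the other three drives, 2 of which the first drive spends in queue time, leaving 13 seconds of transfer, or about 15.6 MB. Files much smaller than that leave drives waiting for the robot; much larger files leave the robot idle.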

Conclusion

Networked storage solutions enable organizations to optimize resources and data delivery, to minimize cost outlays, and to maximize data availability, system scalability, and manageability. Because information access in any organization is split between dynamic and static data, many IS managers are now combining on-line, near-line, and off-line storage solutions to capitalize on the inherent benefits of streamlined capacity, cost efficiencies, and distributed information access. Combining a CD tower with a CD jukebox, for example, offers a range of expandability options while providing a highly cost-effective data storage and archiving system. Alone, a tower with 56 8x drives will provide on-line access to all 56 discs at a cost of around $40,000, and a jukebox system with four 8x drives and a 150-disc capacity will provide near-line (5-10 seconds) access to all 150 discs at a cost of around $15,000. Yet, if you combine the jukebox with a mini-tower of seven 8x drives, you have simultaneous on-line access to 11 discs and/or near-line access to 157 discs at a cost of around $20,000. Thus, you gain on-line and near-line access to nearly three times as many discs as with a standalone tower, at half the cost.
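
The comparison can be reduced to a cost per accessible disc. The Python sketch below uses only the approximate prices and disc counts quoted in the paragraph above; the function and configuration labels are hypothetical.

```python
def cost_per_disc(cost_dollars, discs):
    """Dollars per disc that is reachable on-line or near-line."""
    return cost_dollars / discs


# (approximate cost in dollars, number of accessible discs), from the text
configurations = {
    "56-drive tower": (40000, 56),
    "4-drive, 150-disc jukebox": (15000, 150),
    "jukebox plus 7-drive mini-tower": (20000, 157),
}

for name, (cost, discs) in configurations.items():
    print(f"{name}: about ${cost_per_disc(cost, discs):.0f} per disc")

# Ratio of accessible discs, combined system versus standalone tower
print(f"disc ratio: {157 / 56:.1f}")
```

On these figures the tower costs roughly $714 per disc, the jukebox $100, and the combined system about $127, while the combined system reaches about 2.8 times as many discs as the tower.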

About the Author

David Caruso is a Systems Administrator with Titan Linkabit. He has a strong background in UNIX and has experience with hardware and software from manufacturers including DEC, SUN, SGI, IBM, Cisco, Cabletron, and others. David has also authored and co-authored technical documents on security, network connectivity, and systems administration and has consulted for a number of Fortune 500 companies.