Cover V06, I08

aug97.tar


System Administration: SA = f(x)

Thom Garrett

Introduction

System administrators (SAs) usually work behind the scenes in a very important support role. SAs are tasked with keeping the system/network up and running with as little downtime as possible, the best performance possible, and the latest and greatest technology possible. The SA job has also evolved over the years, from administering a single server or LAN to much larger responsibilities, such as administering Internet/Intranet services in addition to LANs and WANs.

This article touches on key aspects of system administration. I like to visualize SA as a function of many variables, which tend to be intertwined and to support each other in various ways. Developing these characteristics can lead to a well-rounded environment that will help both you and your employer. To help you visualize SA as a function of many variables, I offer the following mathematical expression. The variables take no real values, good or bad.

SA Variables

u = Know your configuration and user base. I subscribe to the theory that the more you know about something, the better equipped you are to manage it. You need to be very familiar with all aspects of your system and network configurations. If you inherit a legacy system, find out as much as you can about it. For example, identify all third-party products and custom in-house applications and utilities, and learn the specs of all components. If it is a networked system, determine its function on the net, identify its OS level and patches, become familiar with its average system load, etc. It is also a good idea to know what your company uses the system in question for. Our company specializes in signal processing, so I became sensitive to the engineers' needs, especially as they apply to performing real-time analysis and system simulation. Striving to determine the best way to serve your user base with existing resources always pays off.
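Part of learning an inherited system can be automated. The sketch below assumes a modern Python interpreter on the host (an anachronism for 1997) and gathers only what the standard `platform`, `socket`, and `os` modules can report; the function name `system_inventory` is hypothetical. Third-party products, in-house tools, and patch levels still have to be collected with OS-specific tools or by hand.

```python
import os
import platform
import socket


def system_inventory() -> dict:
    """Collect a few basic facts about the local host.

    Only a starting point: third-party products, in-house tools,
    and patch levels still have to be recorded separately.
    """
    uname = platform.uname()
    return {
        "hostname": socket.gethostname(),
        "os": uname.system,          # e.g. "Linux" or "SunOS"
        "os_release": uname.release,
        "os_version": uname.version,
        "machine": uname.machine,    # hardware architecture
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    # Print one "key  value" line per fact, suitable for pasting into notes.
    for key, value in system_inventory().items():
        print(f"{key:12} {value}")
```

Running it on each machine you inherit and filing the output gives you a baseline to compare against after upgrades.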

b = Have a proven backup/recovery plan. Backing up your system is a fundamental necessity. The trick is determining what you need to back up and how often. The last thing a user wants to hear from you is that the file needed for a very important demo in an hour is on the backup you did not perform last night. On the other hand, producing the file can have beneficial results. I have found it useful to be able to create an exact duplicate of a disk or directory from the backup tape(s); for important systems, I structure my backups to do just that. Often I have found that an SA's backup covers only a subset of files. For generic systems this may suffice: you can reload the OS from CD, restore the backed-up files, and be on your way. However, for configurations that require high levels of customization, building a new disk this way may not produce the desired results. Another practice I employ is to keep mirror or duplicate boot disks for important systems. For example, I have a mirror disk or mirror system for our Internet gateway, NIS server, firewall, etc.
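A plan is only "proven" once a restore has been checked. One way to confirm that a restore really is an exact duplicate is to compare checksums of the original and restored trees. This is a minimal sketch, assuming both trees are mounted somewhere readable; `build_manifest` and `compare_manifests` are hypothetical names, and the check covers file contents only (not ownership, permissions, symlinks, or device files).

```python
import hashlib
import os


def build_manifest(root: str) -> dict:
    """Map each regular file under root (by relative path) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks, devices, and FIFOs
            digest = hashlib.sha256()
            with open(full, "rb") as f:
                # Read in 64 KiB chunks so large files do not fill memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def compare_manifests(original: dict, restored: dict) -> dict:
    """List files missing from, extra in, or different in the restored copy."""
    common = original.keys() & restored.keys()
    return {
        "missing": sorted(original.keys() - restored.keys()),
        "extra": sorted(restored.keys() - original.keys()),
        "changed": sorted(p for p in common if original[p] != restored[p]),
    }
```

Three empty lists mean the restored tree matches the original file for file. Running this after a test restore to scratch disk, rather than during an emergency, is what turns a backup scheme into a proven one.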

p = Performance. Performance is key. When working on any aspect of the computer or network, try to define the maximum performance you can reasonably achieve. You may not reach it, but you can strive for it. Achieving good performance is like saving money through personal budgeting. To save money at home, you need to monitor all aspects of your expenses (e.g., food, entertainment, mortgage/rent, etc.). Once you have determined and analyzed your expenses, you start to see results and save. So it is with performance: you need to monitor all aspects (e.g., system parameters, disk, memory, swap, network, etc.) before your performance improves. Usually, no single area will improve performance on its own (unless something was misconfigured to begin with). Improving a little in a lot of areas is what improves performance on the system.
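As with a household budget, tuning starts with a record of where things stand. The sketch below is a minimal baseline collector, assuming Python's standard library only; `perf_snapshot` and `collect` are hypothetical names. It samples just load average (Unix only, via `os.getloadavg`) and filesystem usage; memory, swap, and network figures would need OS-specific tools.

```python
import os
import shutil
import time


def perf_snapshot(paths=("/",)) -> dict:
    """Take one sample of a few cheap-to-read performance indicators."""
    sample = {"time": time.time()}
    # 1-, 5- and 15-minute load averages; os.getloadavg exists only on Unix.
    if hasattr(os, "getloadavg"):
        sample["load_1m"], sample["load_5m"], sample["load_15m"] = os.getloadavg()
    # Percentage of each filesystem that is in use.
    for path in paths:
        usage = shutil.disk_usage(path)
        sample[f"disk_used_pct:{path}"] = round(100 * usage.used / usage.total, 1)
    return sample


def collect(samples: int = 3, interval: float = 1.0, paths=("/",)) -> list:
    """Take several samples, interval seconds apart, to build a baseline."""
    history = []
    for i in range(samples):
        history.append(perf_snapshot(paths))
        if i < samples - 1:
            time.sleep(interval)
    return history
```

Collected regularly (for example, from cron) and kept over weeks, samples like these show which small adjustments actually moved the numbers.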

i = Develop sound testing strategies and isolate problems. Many an hour is spent troubleshooting the fix or upgrade you just installed that was supposed to fix everything. Whenever possible, test an intended change to your current configuration before you actually apply it. In some situations testing may not be possible, so at least work it out on paper or in your head. Where it is possible, have a clear test plan that takes as many variables and test cases into account as you can. If things go awry, have a recourse plan: be able to back the change out and start anew. Sometimes the problems that arise after an upgrade would have been caught by test case x+1 (where x is the number of tests you actually performed). Before a recent gateway upgrade, I created a mock network with all the important components and tested various situations. When testing was complete, installing the newly tested components in our live network was literally "plug and play." It was quite satisfying.
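The "test, then back it out if things go awry" discipline can be applied even to a single configuration file. This is a minimal sketch under assumptions: the change is a text file replaced wholesale, and each check is a caller-supplied function returning True or False. `apply_with_rollback` is a hypothetical name, not an existing tool.

```python
import shutil
from typing import Callable, Iterable


def apply_with_rollback(path: str, new_text: str,
                        checks: Iterable[Callable[[], bool]]) -> bool:
    """Install new_text at path, run the checks, and restore the old file if any fail.

    Returns True if the change was kept, False if it was backed out.
    A copy of the previous version is left at path + ".bak" either way.
    """
    backup = path + ".bak"
    shutil.copy2(path, backup)  # save the known-good version first
    with open(path, "w") as f:
        f.write(new_text)
    try:
        # all() stops at the first check that returns False.
        ok = all(check() for check in checks)
    except Exception:
        ok = False  # a check that crashes counts as a failure
    if not ok:
        shutil.copy2(backup, path)  # back the change out
    return ok
```

Each check corresponds to one test case in your plan; adding the x+1 case later is just appending another function to the list.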

d = Disaster Recovery Plan. Have one! At the bare minimum, send your backup tapes off-site. A more elaborate plan would be to develop a mirror site at another off-site location. Your plan should probably be somewhere in the middle. I recall that when I first started working at PBS headquarters, we were recovering from a fire at the L'Enfant Plaza building in Washington, DC. The fire did some damage, but it seems the water did more. If it were not for the Disaster Recovery Plan PBS developed prior to the fire, they would not have been able to recover their processing capability. (BTW, they were also able to move into a brand new building in VA with new computing facilities.)

a = Connect with other SAs and keep current with technology. For the most part, SAs need to work on their own and become self-reliant. However, the problem you are attempting to solve has often been solved before, and there are many resources out there that can be very helpful. For example, the SunManagers, HP-UX, and Linux mailing lists, as well as numerous newsgroups, have proven very useful and time-efficient. If you are administering Suns, SunSolve on the Web is a resource full of information that is also easily queried. Journals and magazines on the subject are another good resource: they offer overviews and "how-to" articles, and they help keep you current with technology. Keeping up with technology is important, because you are the one who needs to introduce the latest technology where applicable, and to determine why certain technology is unacceptable for your facility. For example, will you need to move to IPv6, and if so, when? Technology is your friend; however, beware: friends sometimes hurt you the most!

r = Organize notes/reports and obtain useful tools. SAs are bombarded with a battery of questions and problems at all levels on any given day. For example: "Do you remember the default output format for numbers in sed, or was it awk, that you told me about last year?" Chances are it was an on-the-fly solution you produced for this "one-time" problem. I have found it useful to write such things down in a log book. This may seem obvious, but it is often overlooked, and it is a discipline that takes time to develop. Even when a solution is simple, I am surprised at how often it is forgotten through lack of use. I use a distinctive hard-bound record book to record names, phone numbers, pin-outs, DIP switch and jumper settings, general and to-do notes, diagrams, etc. However, be careful not to document too much; documentation has a point of diminishing returns. Find a happy medium and stick with it. The extra time and trouble of writing things down will pay off. Along with a good notebook, various software and hardware tools have proven handy. Software tools such as tcpdump, sudo, etc. are stored in a util directory on my hard drive (both primary and backup) and on tape. Hardware tools such as an RS-232 breakout box, specialty adapters, unusual cable types, etc. are part of a tool bag I have built up over the years. I never thought I would need the "N"-to-"BNC" adapter stowed away in my tool bag. However, during a system install aboard a UK ship, just before departure, we needed this adapter to run and connect an alternate GPS antenna to our system.

s = Don't forget about security. Hackers couldn't care less about your good intentions and the hard work you put into protecting your computing environment. Hackers of every skill level are ready and willing to penetrate your site, so make sure you cover all levels of security. Remember to cover the basics. For example, enforce proper password policies, apply mandatory OS patches and other security patches (follow Bugtraq and CERT announcements), and enforce proper firewall rules. Run programs such as Crack and Passwd+ against your own system. Other good practices are to log and monitor your network, know your environment, turn off unnecessary services on your gateway, and install a bastion host as a firewall. Be vigilant in your efforts against hackers; it will pay off in the long run.
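Enforcing a password policy means deciding on concrete rules. The sketch below shows the kind of proactive check a tool like Passwd+ performs when a password is chosen, but the rules and thresholds here (8 characters, 3 of 4 character classes, no username, no dictionary word) are illustrative assumptions, not Passwd+'s actual rule set; `password_problems` is a hypothetical name.

```python
import string


def password_problems(password: str, username: str, wordlist: set,
                      min_length: int = 8) -> list:
    """Return the policy rules that password breaks; an empty list means it passes.

    The rules here are illustrative, not Passwd+'s actual rule set.
    wordlist is expected to hold lowercase dictionary words.
    """
    problems = []
    lowered = password.lower()
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if username and username.lower() in lowered:
        problems.append("contains the username")
    if lowered in wordlist:
        problems.append("is a dictionary word")
    # Count how many character classes appear at least once.
    classes = sum([
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in string.punctuation for c in password),
    ])
    if classes < 3:
        problems.append("uses fewer than 3 character classes")
    return problems
```

For example, `password_problems("secret", "thom", {"secret"})` reports three problems (too short, a dictionary word, too few character classes), while `"Tr1ck-y#Pass"` passes. A checker like this rejects weak passwords up front; running Crack afterward catches the ones that slipped through.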

k = Specific conditions that apply to your environment. Your environment probably has specific conditions not covered in this article. Learn and study them fully, and incorporate them into your other responsibilities.

t = Time. The time you have to accomplish all of this is t (in seconds). Develop good time management skills. Concentrate on the important and necessary items (all of the above) and shy away from the unimportant and unnecessary ones. The value t is raised to the power n because I have found that, for the most part, when another person is added, total productivity is greater than 2*t (depending on the people involved). If you are interested, with n = 1, max t = 604,800 sec/week. Try not to work at max t.
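The arithmetic behind max t, and one way to read the claim that a second person yields more than 2*t, can be written out. Treating t^n as a literal power is an assumption made for illustration; the expression is meant as a way of thinking, not a formula to evaluate.

```latex
\begin{aligned}
t_{\max} &= 7\,\tfrac{\text{days}}{\text{week}} \times 24\,\tfrac{\text{h}}{\text{day}} \times 3600\,\tfrac{\text{s}}{\text{h}} = 604{,}800\ \text{s/week} \\
t^{2} - 2t &= t\,(t - 2) > 0 \quad \text{for all } t > 2
\end{aligned}
```

Read literally, with n = 2 the term t^n exceeds 2*t for any t above 2 seconds, which is the "more than the sum of its parts" effect described above.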

n = Number of people to perform these functions. The variable n is usually equal to 1: you. Happy is the sys admin when n > 1.

SAs have the opportunity to facilitate the progress and success of a project, company, or academic environment. By the same token, SAs without proper education, training, and desire can just as easily hinder progress and harm their environment. With proper attention to the variables discussed above, your employer can rest assured that its investment in you and in its equipment is secure. You and your employer will both benefit from the measures you take in these areas of SA. You do the math.

About the Author

Thom Garrett has been involved in various aspects of system/network installation and administration for 11 years. He received a B.S. in Mathematics/Computer Science from Virginia Commonwealth University. He is currently employed at Digital System Resources, Inc. located in Fairfax, Virginia, where he is Manager of Computer Services. He can be reached at tgarrett@dsrnet.co.