Questions and Answers
Frustrations are running quite high here at /sys/admin this month. We are getting a new Internet connection, and the phone company and our Internet service provider have managed, between their various responsibilities, to get absolutely nothing working yet, although the line was supposed to be up several weeks ago. It is not so much the delays in getting the lines installed, debugged, and configured correctly that are causing the frustration as the complete lack of responsiveness from both the phone company and the ISP. The phone company, in particular, has a knack for not getting things done at the promised time, not getting them done right the first time, and never returning phone calls, but our ISP isn't much better at returning calls. It seems to be a given for both organizations that if we do not continuously call and bug their people, no work will be done on correcting the problem of the day.
You might wonder why I bother to bring this issue up here, but aside from the fact that bad service from carriers and ISPs seems to be a universal problem, there is a lesson for most system administrators here. Max Vasiliatos, who together with Rob Kolstad created the first LISA conference back in '87, was the first person I know of to report that users who receive technically perfect service are generally less satisfied than those who receive more mediocre service but whose system administrators keep in good contact with them. Although it is a given that we inform our users of changes in service, such as system reconfiguration or planned down time (this is a given, isn't it?), the findings by Max Vasiliatos and others show that taking the time to meet with users in an unscheduled manner (e.g., by going around and chatting for a few minutes) can have a great influence on the perception of the services offered. This is an important lesson, not only for phone companies and Internet service providers, but for every system administrator.
The '97 SANS conference is happening April 20-26, 1997 in Baltimore's Inner Harbor.
Most people come to conferences hoping to find solutions to specific problems, so the SANS people have designed this conference to fulfill this expectation as much as possible. If you go to this conference (or any other system administration related conference, for that matter) and come away with just one or two answers to your problems, it will be well worth the trip.
Besides the main conference, a large number of tutorials are offered. Apart from the LISA conference, this is the only place where you can find courses taught by the people who deal with the problems every day. Most other tutorials are taught by people who teach for a living, and who therefore may not have the same deep understanding of the issues they teach. For further information, send email to email@example.com.
And now let's go to this month's questions:
I am the admin for a small group of UNIX machines with a single server and 3 workstations. NFS/NIS is used to automount the server's /home/users/* to each of the workstations. The server is also an NIS master.
Under the unified login scheme, the workstations have to go down whenever the sun is down, and this is rather painful for users of one particular application that we developed in house, because certain calculations can take weeks to complete. Unfortunately, this application has no efficient way of saving intermediate steps as it works, and it keeps all of its data in memory.
This application is currently ported to virtually every system that has an ANSI C compiler, but it does very little in the way of signal handling at present, and no networking at all, which has minimized our difficulty in porting it.
I was wondering if there was any generic way for an admin to "park" the state of a UNIX process, and later "resume" it without the process's cooperation. Possibly some scheme involving a core dump?
You may be able to have the program dump core, and then resume execution later using a program such as the GNU debugger (gdb). However, this would at best be a temporary hack, because gdb is written and traditionally used as a debugging tool. To my knowledge, no UNIX has support for stopping a process and restarting it later (shell job control does not count, as the process stays in memory). Other long-running programs that I have encountered (such as simulations of a new CPU) all write checkpoints containing their state, and allow for a restart from the last checkpoint. I think that your only long-term solution is to add such a checkpoint facility, even if it has to be done in an operating-system-dependent manner.
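The checkpoint-and-restart pattern can be sketched in a few lines. This is only an illustration written in shell for brevity: the job (summing integers), the function name run_job, and the one-line "i total" state format are all hypothetical. A C program would follow the same shape, writing its state to a temporary file and then calling rename().

```shell
# Hypothetical long-running job: sums the integers 1..N, checkpointing as it goes.
# The state file name and its "i total" record layout are assumptions for this sketch.
run_job() {
    n=$1
    state=$2
    i=0
    total=0
    # Resume from the last checkpoint if one exists.
    if [ -f "$state" ]; then
        read -r i total < "$state"
    fi
    while [ "$i" -lt "$n" ]; do
        i=$((i + 1))
        total=$((total + i))
        # Write the new state to a temporary file, then rename it into place,
        # so an interruption never leaves a half-written checkpoint behind.
        printf '%s %s\n' "$i" "$total" > "$state.tmp"
        mv "$state.tmp" "$state"
    done
    echo "$total"
}
```

A call such as run_job 1000000 /var/tmp/job.state (the path is only an example) can be interrupted and started again with the same arguments, and it picks up where the last checkpoint left off. A real application would checkpoint far less often than once per step, trading lost work against I/O cost.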
I work for the Department of the Navy, and we are moving out of specialized military computers into commercial UNIX workstations. In fact there are surface ships out there today with UNIX machines in the Combat Information Center. One of the problems we face is whether we should give full root access or limited root access to the ship's personnel. One suggestion is to give them limited root privileges along with a GUI that is specifically made for sys admin tasks. So, my question has two parts:
a. Do any sys admin GUIs exist, and where can I get information, ftp sites, etc. about them?
b. What are the various tools for giving limited root privileges besides su, super, and op, and which do you recommend?
I have no personal experience with military applications of UNIX, but I can imagine how difficult your dilemma must be. As in any situation where dependable operation of the system is a requirement, the users cannot be allowed to have root, but on the other hand, if the system crashes four days out of port, getting help may take a while.
During my time with UNIX, I have seen many sys admin GUIs, but have yet to see one that really works. Most of the GUIs I have seen have simply been front ends to the standard UNIX system administration tools, which added very little in improved user friendliness, and removed much of the flexibility a system administrator would normally have. Another problem is that most system administration GUIs come with a lot of hard-coded policies, making them inappropriate for a wide audience. A third problem is that most system administration GUIs are at a loss when the system is not operating correctly. Some people have claimed that an expert-based application can be built that can eliminate these problems while providing a GUI that is both user friendly and effective. Frankly, after having seen many such products, both commercial and freeware, I no longer believe that this is possible. To illustrate this, consider a program such as netstat. Although its output is cryptic, if you understand how the network operates, you can make sense of the netstat output without too much trouble. On the other hand, how do you translate its output in an understandable manner to somebody who has no knowledge of TCP/IP networks, without first teaching them a basic understanding of the network?
To get back to your question, you should be able to write a GUI that simplifies certain repeatable tasks, such as adding users or performing a backup, as long as you have a very clear idea of how these tasks must be performed in your environment. That means you will have to write your own GUI. The graphical interface part is relatively easy these days, as you can use Tcl/Tk, or if security is of no concern, HTML and a web browser. Contrary to common belief, most of the work lies in getting the procedural part right.
In answer to the second part of your question, my personal choice is op. Another program, which you did not mention, is Evi Nemeth's sudo, which continues to have functionality added, and by now should be comparable to op in what it can do. Both programs are available from the system administration ftp archive (although op and a few other programs otherwise unavailable from the Net were gone for a while due to a mirror misconfiguration).
I am relatively new to the Unix system administrator's role, but I have many years' experience in the large IBM mainframe world. Recently, I went surfing for some tools to produce performance information of the kind we take for granted in MVS. I found some interesting packages, but they have funny file formats such as: cbw.tar.gzip. What do I have? I understand many formats, but have never seen this one.
This is a common naming scheme on UNIX and across the Internet. It tells you that the file is a tar archive of a program called cbw (probably the Crypt Breakers Workbench) that has been compressed with the GNU compression program gzip. You can unpack this file by decompressing it and then un-tarring it, or do both in a single command:
gzip -d < cbw.tar.gzip | tar -xvf -
Similarly, a file ending in .Z has been compressed by the older compress program, and can be unpacked with the command:
compress -d < cbw.tar.Z | tar -xvf -
The last "-" of the tar command makes tar read from standard input.
It is unlikely that gzip will be on your system as shipped from the vendor. You will need to ftp it from the archives and compile it before you can unpack your file.
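Before unpacking an archive fetched from the Net, it can be worth listing its contents to see where the files will land. A minimal sketch, assuming gzip and tar are both installed; the helper name list_tgz is made up for this example:

```shell
# Hypothetical helper: list a gzip-compressed tar archive without extracting it.
# gzip -dc decompresses to standard output; tar -tf - lists the archive read from standard input.
list_tgz() {
    gzip -dc "$1" | tar -tf -
}
```

For example, list_tgz cbw.tar.gzip prints one path per line. If the paths are absolute or do not share a common top-level directory, you may want to unpack in a scratch directory first.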
I'm playing around with the syslog daemon at the moment, but I can't decide which is the best configuration to use. Do you have any pointers as to what is a good solution?
The syslog daemon is today probably the most common UNIX method for logging information (although an annoying number of programs still log directly to a file). Syslog has two big advantages for the system administrator who wants to control and direct how the UNIX system does its logging.
One advantage is that logging can be done across the network, so it is possible to have all machines on a network log to a single host, giving you one single repository for all logs. The caveat is that syslogd uses UDP for its connections, making it impossible for the client to know whether the messages are actually received on the server host. Another, somewhat annoying, problem is that all remote hosts must connect directly to the log server. If a remote host A is logging to host B, which in turn is logging everything to the log server C, then all messages from A will be forwarded to C, but will appear to originate from B.
Another nice thing is that syslog uses a number of different facilities, which can be described as queues of logs, and within each facility there are a number of priorities. In the syslog configuration file (typically /etc/syslog.conf), the sys admin can then decide which messages should be logged and to what file. One thing to be aware of is that when you choose a certain logging level, syslog will log events at that level as well as any event at any higher level. In other words, if you choose to log a facility at the debug level (which is the lowest level), you will get not only debug information, but any other log information at any level for that facility. Depending on the facility and the activity on your hosts, this can be more information than you bargained for.
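To make the "this level and everything above it" rule concrete, here is a small sketch that prints which priorities a selector at a given level would match. The priority names are the standard syslog ones, highest first; the function name matched_levels is hypothetical, and the code only models the rule rather than talking to syslogd.

```shell
# Standard syslog priorities, from highest (emerg) to lowest (debug).
levels="emerg alert crit err warning notice info debug"

# Print every priority matched by a selector at the given level:
# the level itself plus all higher priorities.
matched_levels() {
    for l in $levels; do
        echo "$l"
        if [ "$l" = "$1" ]; then
            break
        fi
    done
}
```

Running matched_levels err prints emerg, alert, crit, and err, while matched_levels debug prints all eight names, which is why a debug-level entry for a busy facility can fill a disk quickly.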
I have recently been through an audit in which I was hit about not having any formal configuration control procedures. I am now tasked with putting some procedures together and was wondering if there were some I could get my hands on, so I wouldn't have to reinvent the wheel. If you have any ideas or comments, I would appreciate them.
I know of no formal scheme for this. If anybody has a solution to this problem, commercial or otherwise, which works and is used at their site, I would like to hear about it. I am using a more ad hoc approach, where all interesting files are routinely and automatically checked in under RCS. While this scheme does not record who did what, it will record any change made to a system within the one-hour granularity that we use.
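A minimal sketch of such an automatic check-in, assuming the RCS tools are installed and the script is run from cron once an hour. The function name snapshot_files and the log messages are invented for this example, and it is not the script used at my site.

```shell
# Hypothetical helper: check each named file into RCS, keeping the working copy in place.
#   ci -l   checks in a revision and leaves the working file locked and writable
#   -q      suppresses diagnostics
#   -m      supplies the log message, -t- the initial description, so ci never prompts
# Without -f, ci does not deposit a new revision when the file is unchanged.
snapshot_files() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        ci -l -q -m"automatic snapshot" -t-"tracked by snapshot_files" "$f" < /dev/null
    done
}
```

An hourly cron job could then call something like snapshot_files /etc/hosts /etc/syslog.conf (the list of files is site specific), and rlog or rcsdiff on any of those files shows when it changed and how.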
I am a UNIX admin managing a large number of SCO and Solaris operating systems. I am wondering if you know of any software or freeware out there that does a clean and safe job managing user accounts from a single mechanism (other than NIS or DCE). We currently use a shell script and group file to handle this, but find this method somewhat unreliable at times.
I think that user account administration is probably one of the most policy-intensive applications anywhere in UNIX system administration. There are a number of freeware packages, but they all have a very strong flavor of their authors' local site policies included (typically that of a university), and they are all dated by now (for example, no shadow file support). In spite of the many things that are wrong with NIS, it is probably still the best possible generalized solution. If you are in a situation where you can force users to log into a single machine to change their password, you can use an rdist-based solution. Depending on your environment, this can be an improvement over NIS or NIS+, but if you are using the old, unpatched version of rdist, you are much worse off from a security perspective. It also takes careful planning and implementation to make an rdist-based solution scale well in a very large environment.
In my opinion, there is a great need for a policy-free, large-scale user account administration software package. If anyone knows of such a package, I would like to hear about it.
I appreciated your comments on dump and tar in Sys Admin. I was wondering if you can shed light on something else related to tar for me. You are correct that most versions of tar will puke on a job with files that have holes. I know that most versions of tar (at least in the past) would also choke on any file whose absolute path was longer than 100 characters. Is this still a problem with most tar programs? I tend to avoid tar as much as possible - but seeing as how sendmail has continued to be buggy for years and years, I really doubt if anyone has made the effort to fix tar.
I think that most versions of tar probably have been fixed by now (i.e., the path name length is longer). This change had to come with the move away from the old 14-character filename limitation. The best way to test it is to run the torture package that I have mentioned here a few times:
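Separately from the torture package, and much less thorough, a quick sanity check for the 100-character limit is to archive a deliberately long path and see whether it survives a round trip. This sketch assumes mktemp -d is available; the function name and directory names are hypothetical.

```shell
# Hypothetical check: does this system's tar round-trip a path longer than 100 characters?
# Returns 0 if the file comes back intact, non-zero otherwise.
tar_long_path_ok() {
    work=$(mktemp -d) || return 2
    # Build a relative path of well over 100 characters from short components.
    p=src
    for k in 1 2 3 4 5 6 7 8 9 10 11 12; do
        p=$p/dir_number_$k
    done
    mkdir -p "$work/$p" "$work/out"
    echo payload > "$work/$p/file.txt"
    # Archive the tree, extract it elsewhere, and compare the deepest file.
    (cd "$work" && tar -cf long.tar src) &&
    (cd "$work/out" && tar -xf ../long.tar) &&
    cmp -s "$work/$p/file.txt" "$work/out/$p/file.txt"
    status=$?
    rm -rf "$work"
    return $status
}
```

Running tar_long_path_ok && echo "long paths survive" on each platform you back up gives a first indication, but it says nothing about holes, special files, or the other cases a full torture test covers.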
As for the comment about sendmail, I understand your frustration, but must take issue with it. Most importantly, sendmail runs as a setuid root program. This makes it vulnerable and a prime target for black hats. Additionally, it is solving a tough problem, because delivering mail requires many completely different tasks to be performed. The right thing, based on what we have learned from designing firewalls, would be to break it into a number of smaller and more specialized programs. However, this is not likely to happen, because sendmail is working just well enough to prevent other mail delivery agents from being implemented.
About the Author
Bjorn Satdeva is the president of /sys/admin, inc., a consulting firm which specializes in large installation system administration. Bjorn is also co-founder and former president of Bay-LISA, a San Francisco Bay Area user's group for system administrators of large sites. Bjorn can be contacted at /sys/admin, inc., 2787 Moorpark Ave., San Jose, CA 95128; electronically at firstname.lastname@example.org; or by phone at (408) 241-3111.