Cover V08, I06


Questions and Answers

Bjorn Satdeva

I received the following observations in regard to the discussion of MX records. Bruce Wang writes:

I read the Q & A in the April '99 issue of Sys Admin. One reader challenged your answer "By definition, any host that receives mail should have at least one MX record." The reader was "not inclined to add an extra 2200-some DNS records to 400 DNS zones we host. My experience indicates they are simply not necessary".

Well, I manage over 2,000 DNS zones, 70% of which are forward DNS domains. A lot of the troubleshooting I do every day is in response to clients who complain they are unable to receive/send mail. The problem always turns out to be that no MX records are set on their domains, or that the MX records are not set up properly (for example, no A record exists for the host the MX record points to).

It is very interesting to know how the mail will be sent through a domain WITHOUT an MX record in that domain. If it is *internal DNS*, no MX record may be fine. /etc/hosts will take care of name and address. But if it is across networks, an MX record is absolutely required for a domain/zone.

However, it is not necessary to add an MX record to each host. Instead, just add it to the zone itself. The hosts within that zone will be able to receive/send the mail.

That reader does not have to add 2200-some DNS records to the 400 zones he hosts. He just needs to add 400 MX records (perhaps far fewer) to the zones. Relying on Sendmail to deliver mail to the A record may work, but it is not a good practice, at least in my experience. It will cause a big headache when someone yells at you for not being able to receive mail and you have no clue what the problem is.

In short, I fully agree with your answer, except that I would replace the word "host" with "domain" or "DNS zone".
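For anyone hosting a few hundred zones, a quick audit can show which ones have no MX record at all. The following is only a sketch: it assumes one BIND-style master file per zone, the function names has_mx and zones_without_mx are made up for this example, and it only reads the zone files rather than querying DNS.

```shell
#!/bin/sh
# Sketch only: assumes one BIND-style master file per zone.
# has_mx and zones_without_mx are hypothetical helper names.

# has_mx FILE: succeed if FILE contains at least one MX record.
has_mx() {
    awk '
        {
            sub(/;.*/, "")                        # drop comments
            # Skip the owner name unless the line starts with white space.
            first = ($0 ~ /^[ \t]/) ? 1 : 2
            for (i = first; i < NF; i++)
                # An MX record type is followed by a numeric preference.
                if (toupper($i) == "MX" && $(i + 1) ~ /^[0-9]+$/)
                    found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# zones_without_mx FILE...: print each zone file that has no MX record.
zones_without_mx() {
    for zone in "$@"; do
        has_mx "$zone" || printf '%s\n' "$zone"
    done
}
```

Run as zones_without_mx followed by the zone file names, it prints only the files that still need an MX record added.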

Another reader, Fredrik Jansson, writes about NGROUPS_MAX.

I read the Sys Admin January '99 issue and noticed a question about NGROUPS_MAX. We have tried to increase that, but some NFS-clients for Windows 3.x and NT don't support more than 16 groups. The solution was to use SMB instead of NFS in the matter of file sharing.

I have also received several letters stating that it is possible to copy tapes using the tcopy or dd command. While this is theoretically correct, I do not recommend it. When you copy directly from tape to tape, you are strictly copying bits, and if the source tape has bad spots, you might not know about it until it is too late. Yes, it is a little bothersome to have to first do a restore and then another backup, but in my mind it's better safe than sorry.

 Q Suppose I log in as root on the console, which is the only terminal, with no network connections. At any time, the system may hang. How can I release the system from the hung condition? No terminal, modem, or remote terminal is able to log in.

 A If your system truly has no network capability, then there is nothing you can do when this happens. However, if your system supports networking, and what you really meant was that you have no other system on your network, then you can use a little trick: telnet to localhost, go through the login sequence, and then proceed as normal. If your shell locks up on you, you can get out of the telnet session by typing the escape character (on most systems, Ctrl-]). This will take you back to the telnet prompt, which you can exit by typing Ctrl-D. You will then be back in your original login shell, from which you can repeat the sequence. Of course, if the system itself locks up, then you are out of luck again and will have to reboot it.

 Q Windows NT does not seem to recognize fully the UNIX permissions on files (i.e., rwxrwxrwx). I look after a large tree of spatial data on a Solaris UNIX platform, and NT users on the network can rename a UNIX directory - even including a space character! This can be done despite the permissions on the directory name. The renamed directory can then not be repaired using UNIX commands (as far as I know). I have never seen any reference to this problem - is there a solution or do we have to live with this worry?

 A If your pcnfs server allows this to happen, you apparently have a bug in your software. However, having spaces in a UNIX file name is legal, albeit somewhat inconvenient. To access such a file from UNIX, you can escape the space character with a backslash. In other words, if an NT or PC user has created a file named "this is a file", you can access this file from UNIX by typing "this\ is\ a\ file". The same is the case for any other metacharacter. As mentioned elsewhere in this column, the only special character you cannot escape is the slash ('/'), because it is hardcoded into the kernel as the separator between directory names in a pathname.
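If many names have been damaged this way, the repair can be scripted. Here is a minimal sketch, assuming the entries only need their spaces replaced with underscores; the function name despace is made up for this example. Quoting the name does the same job as escaping each space with a backslash.

```shell
#!/bin/sh
# Sketch only: despace is a hypothetical helper name.
# despace DIR: rename every entry in DIR whose name contains a space,
# replacing each space with an underscore. Existing targets are kept.
despace() {
    for path in "$1"/*" "*; do
        [ -e "$path" ] || continue          # the pattern matched nothing
        name=${path##*/}                    # strip the directory part
        new=$(printf '%s' "$name" | tr ' ' '_')
        if [ -e "$1/$new" ]; then
            echo "skipping $name: $new already exists" >&2
            continue
        fi
        mv -- "$path" "$1/$new"
    done
}

# By hand, these two commands name the same file:
#   ls -l this\ is\ a\ file
#   ls -l "this is a file"
```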

 Q I have a large tech library that is all hardcopy. I would like to be able to scan the documents in and store them in a CD library. I have access to a multitude of hardware/software and can purchase what I don't have, but am clueless as to how to set the whole thing up. I would like my users to be able to access the CD library themselves (through NFS/NIS) and pull out what they need without my help. Suggestions as to a good setup and recommendations for quality software/hardware would be greatly appreciated.

 A What you are looking for is a good document management application. There are a number of commercial products available that can do this for you. As I have never used any of them to any significant extent, I do not want to recommend one over another.

 Q I am unable to unzip the *.tar.gz files. From which Web site can I download the gunzip and tar software?

 A In UNIX, files with the .gz extension are created by the GNU zip (gzip) command. You can download the sources for that command by ftp from

 Q I have a proxy logs directory for which every Mon, Tue, Wed, Thur, and Fri the access log file rolls over from a file called "access" to a file called "access.ddmm-am". I would like to automate the process of removing these files after one week. Can you suggest a ksh script that would accomplish this?

 A You can accomplish this very simply with the find command. If you execute the following command in the directory in which your log files are located (or replace the '.' with the directory name), files are removed once they are more than seven days old:

find . -ctime +7 -print | xargs rm -f
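The one-liner above removes every old file under the directory, not only the rotated logs. A slightly more defensive variant, written as a function that also runs under ksh, might look like the sketch below. It is not the command above: it matches only names of the form access.* and tests the modification time (-mtime) instead of the status-change time (-ctime). The function name purge_old_logs is made up for this example.

```shell
#!/bin/sh
# Sketch only: purge_old_logs is a hypothetical helper name.
# purge_old_logs [DIR] [DAYS]: delete rotated proxy logs (access.*) under
# DIR that were last modified more than DAYS full days ago (default 7).
purge_old_logs() {
    dir=${1:-.}
    days=${2:-7}
    # -name 'access.*' leaves the live "access" file and unrelated files alone.
    # -exec ... {} + hands the names straight to rm, so odd characters in a
    # file name cannot be misread the way they can in a pipe to xargs.
    find "$dir" -type f -name 'access.*' -mtime +"$days" -exec rm -f {} +
}
```

Called from cron once a day with the log directory as its argument, this keeps roughly one week of rotated logs.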

 Q How do I kill other processes without logging into "SU" or "ROOT"?

 A You cannot do that. There is a very good reason for this. If users were able to kill processes owned by other users without the need to become root first, you would have a big problem. Even if everybody who has access to the machine is a well-behaved citizen, who would never dream of killing a process that does not belong to them, you would still see problems resulting from processes being killed by accident. Next time you need to become root to kill a process, give thanks that this is the case.

 Q I want to route NetWare IPX protocol using a SPARCStation 20 with Solaris 2.6. Is this possible?

 A I am afraid that you are out of luck. I have never heard anything to make me think a Sun machine can be used to route NetWare protocols. You will need a real router that can handle this protocol.

 Q How can I kill a hung process without shutdown? I have tried kill -9 pid, but the process is not killed. Please provide help.

 A It depends on why the process is hung. Theoretically, kill -9 is a "sure kill". However, if the process is hung in the kernel (e.g., waiting somewhere in a device driver for an event that never occurs), then it will not be able to discover it has been killed, and will therefore not go away. Likewise, defunct processes are only dead entries in the kernel's process table and cannot be killed either.

 Q I couldn't access an Internet site and got this error "DNS entry on the server could not be found."

 A This means that there is no entry for that server name in the domain name service. DNS is the host name lookup service for all hosts across the Internet. When you get this (or a similar) message, either you have misspelled the server name or the host is no longer there. Of course, if your name service is not configured correctly, you may also get this kind of message because your name server does not know where to get the information it needs.

 Q How can I write a secure shell script that allows normal users to execute commands that require superuser privilege, such as to kill the jammed print queue/cancelled print job?

I have tried to do it like:

cancel lp1
disable lp1
enable lp1

Any ideas?

 A You cannot do it that way, because when you call su, you are creating a new process, which becomes your superuser shell; the commands that follow it in your script are not run by that shell. One way to get around this is to use the expect command, which was created for situations where you want to script interactive commands. However, in the case of su, you do not want to use expect either, because you would need to commit your root password to disk in clear text in order to get past the password prompt in su. Luckily, you have another option: you can write a setuid program in Perl. Using Perl, it is relatively secure to write setuid root scripts. However, on most systems you do not need to be root to execute lp-related commands; being the user "lp" should be sufficient.

 Q Do you know of a POP or IMAP email handler for Solaris? My users live and die by Eudora running on their desktop PCs and Macs and I need to migrate all of them from our MHA running on a DG to a Sun running Solaris 7. Any help or direction would be really appreciated.

 A UNIX servers for both POP and IMAP are easily available for Solaris and most other UNIX platforms. You can get popper from

 Q We recently had a problem with some bad memory, and as a result had some file system corruption. We were able to restore the root file system to a state prior to the problems; however, there is a badly formatted file residing in /lost+found. This file contains "/" slashes and " " spaces in the file names. Normally I can get rid of files if all they have is spaces or other special characters. However, I cannot seem to get rid of this file because it has the "/" character in its name. The rm command thinks that it's a directory. The rmdir command says it's not a valid directory. An ls -l with no options shows that it is a regular file with an extremely odd naming problem.

 A The slash ('/') is the only character in UNIX that cannot be escaped in some manner. The reason is that it is hardcoded into the kernel as the character that is used as the directory separator in path names. Occasionally, often as the result of a hardware malfunction of some kind, you will find directories or files with weird characters in their names. In most cases, you can work around them. One way is to bypass the shell (and the shell metacharacters) by writing a small Perl or C program that calls the unlink system call directly. However, if the bad file name contains a slash, as in your case, that will not help. The file system checker, fsck, is supposed to remove such files, because they are illegal in the UNIX environment. If fsck does not remove the file, try to clear the inode using the clri command, and immediately unmount the file system and run fsck again.
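For names that contain spaces or other metacharacters but no slash, another common workaround is to select the file by inode number so that its name never has to be typed. This swaps in find for the small Perl or C program mentioned above, and it is only a sketch: the function name remove_by_inode is made up, and it assumes your find supports -inum. It will not help with an embedded slash, because rm still has to look the file up by pathname.

```shell
#!/bin/sh
# Sketch only: remove_by_inode is a hypothetical helper name.
# remove_by_inode DIR INODE: delete the file under DIR whose inode number
# is INODE. -xdev keeps find on one file system, because inode numbers
# are unique only within a single file system.
remove_by_inode() {
    find "$1" -xdev -inum "$2" -exec rm -f {} \;
}

# Typical use: list the inode numbers first, then remove by number.
#   ls -i /lost+found
#   remove_by_inode /lost+found INODE_NUMBER_FROM_LS
```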

 Q My system is a Sun SPARCstation 20 running Solaris 2.5.1. The following SCSI error has occurred:

WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/ \
esp@f,800000 (esp0): Connected command timeout for Target 2.0
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/ \
esp@f,800000 (esp0): Target 2.0 reverting to async. mode
WARNING: /iommu@f,e0000000/sbus@f,e0001000/espdma@f,400000/ \
esp@f,800000/sd@2,0 (sd2): SCSI transport failed: reason 'timeout': retrying command

 A You either have a faulty disk or you have a problem with your SCSI bus.

 Q I have a UNIX server with 15 PCs running Hummingbird Telnet software, connecting to the UNIX server through Ethernet (ttyp0...ttyp32). Normally, we run some important database jobs, and while the jobs are running, no user should log in. Please advise how to prevent users from logging in when the server is processing those database jobs.

 A On some versions of UNIX, you can create the file /etc/nologin. Unfortunately, that will affect all logins, not just those coming over the network (I cannot determine from your question if this is desired). If logins cannot be prevented in this manner, or if you only want to prevent network logins, you can edit (or replace) the /etc/inetd.conf file so that inetd will no longer allow telnet, rlogin, rsh, etc. to the machine. Remember to send a hangup signal (SIGHUP) to inetd each time you alter the /etc/inetd.conf file to make it reread that file.
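Both approaches can be wrapped around the database job. The sketch below makes several assumptions: the function names and the message text are made up, your system honors /etc/nologin, and your inetd.conf lists rlogin and rsh under the service names "login" and "shell". The inetd function only prints a modified copy; installing it and signalling inetd are left to you.

```shell
#!/bin/sh
# Sketch only: all three function names are hypothetical.

# block_logins [FILE]: create the nologin file with a message for users.
block_logins() {
    printf '%s\n' 'Database jobs are running; please try again later.' \
        > "${1:-/etc/nologin}"
}

# allow_logins [FILE]: remove the nologin file again.
allow_logins() {
    rm -f "${1:-/etc/nologin}"
}

# disable_inetd_services CONF SERVICE...: print CONF with the named
# services commented out. CONF itself is not modified.
disable_inetd_services() {
    conf=$1
    shift
    awk -v list="$*" '
        BEGIN { n = split(list, names, " ")
                for (i = 1; i <= n; i++) off[names[i]] = 1 }
        ($1 in off) { print "#" $0; next }   # first field is the service name
        { print }
    ' "$conf"
}

# Example, as root:
#   disable_inetd_services /etc/inetd.conf telnet login shell > /tmp/inetd.new
# After installing the new file, send SIGHUP to inetd (kill -HUP and its PID).
```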

 Q How do I increase the number of inodes on a running system?

 A For the UFS file system, the number of inodes is determined at the time you create the file system. This cannot be changed after the fact. If you are running out of inodes, your only choice is to back up the file system, remake the file system with a larger number of inodes, and then do a restore.
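On Solaris, the inode density is set through the bytes-per-inode argument of newfs (-i), so a smaller value gives more inodes. The sketch below only does the arithmetic. The function name nbpi_for is made up, and the device and dump file names in the comments are hypothetical placeholders, not a tested procedure.

```shell
#!/bin/sh
# Sketch only: nbpi_for is a hypothetical helper name.
# nbpi_for SIZE_BYTES WANTED_INODES: print a bytes-per-inode value that
# would give roughly WANTED_INODES inodes on a file system of SIZE_BYTES.
nbpi_for() {
    echo $(( $1 / $2 ))
}

# Hypothetical Solaris sequence (all names below are placeholders):
#   ufsdump 0f /backup/export.dump /dev/rdsk/c0t3d0s7
#   umount /export
#   newfs -i "$(nbpi_for 1073741824 524288)" /dev/rdsk/c0t3d0s7
#   mount /dev/dsk/c0t3d0s7 /export
#   cd /export && ufsrestore rf /backup/export.dump
```

For a 1-GB file system and about half a million inodes, the function prints 2048.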

About the Author

Bjorn Satdeva is the president of /sys/admin, inc., a consulting firm which specializes in large installation system administration. Bjorn is also co-founder and former president of Bay-LISA, a San Francisco Bay Area user's group for system administrators of large sites. Bjorn can be reached at