Questions and Answers

Amy Rich

Q I found the following line in /etc/system on my Sun E3500:

set snooping=1

I'm not really sure what this is for, but it looks pretty suspicious. Does this mean that I've been hacked and that someone is snooping everything that goes out of and comes into the machine?

A The snooping parameter in /etc/system is not for snooping traffic flowing through the machine. Setting snooping to true (1) turns on the deadman timer and is used to try to force a drop to the ok prompt when the system hangs hard. At that point, you generally do a sync to get a core dump for someone to dissect. This parameter is probably set in /etc/system because of some issue with the system hanging without crashing sometime in the past. This is a common request made by Sun support to help in debugging such situations. I hope you're doing some sort of checksumming on your system files, so you can verify when this change was made if you do not remember making it yourself.
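
If you aren't checksumming system files yet, a minimal sketch of the idea follows. It assumes the POSIX cksum utility; the function names and any baseline path you pass in are hypothetical, and a real setup would use a stronger hash and keep the baseline somewhere an intruder can't rewrite it.

```shell
#!/bin/sh
# Hypothetical helpers for a checksum baseline (illustrative names only).

# record_baseline BASELINE FILE...
# Save the cksum output (CRC, size, name) for each file.
record_baseline() {
    baseline=$1; shift
    cksum "$@" > "$baseline"
}

# check_baseline BASELINE
# Print "CHANGED: name" for every file whose CRC or size no longer matches
# (a file that has gone missing also counts as changed).
# Returns 1 if anything changed, 0 otherwise.
check_baseline() {
    baseline=$1
    status=0
    while read -r crc size name; do
        current=$(cksum "$name" 2>/dev/null | awk '{print $1, $2}')
        if [ "$current" != "$crc $size" ]; then
            echo "CHANGED: $name"
            status=1
        fi
    done < "$baseline"
    return $status
}
```

Run record_baseline once against files such as /etc/system while the machine is in a known good state, then run check_baseline from cron or by hand whenever a change looks suspicious.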

Q I'm setting up several servers in various parts of the country for one client. Each server will be hosting its own domain, including Web, email, ftp, etc. I was wondering how best to configure DNS for these machines. Should one machine be master and all the rest slaves, or should each machine be master for its own domain and slaves for the others?

A Depending on the number of machines, there are tradeoffs with whichever method you choose. How many machines are there (4, 10, 50+)? For a large number of machines, making each the master for its own domain can be an administrative nightmare -- especially if all of your IT people are centralized in one place and changes must be made to each machine individually. It also makes it harder to keep track of the changes made if you're using any sort of revision control. On the other hand, if there's one master machine and the rest are slaves, then you run the risk of the master being down when the DNS information for one of the slaves needs to be updated immediately.

There's also a point of diminishing returns when you keep adding slave servers. If any one of your authoritative hosts is corrupted, there's a good chance that that server will be queried and you'll lose traffic. It's better to have a few really reliable servers than many semi-reliable servers.

If you're setting up a medium number of machines, you could keep a centralized master zone file directory on one machine and then use some automated job to push the zone files out to all other machines. Each machine would only load its own master zone file when starting named and just ignore the rest. This would allow you to centralize the administration and still have each machine be its own master. It would also allow you to have the master zone files on all of your machines in case you were to lose one machine out of the bunch. It would be fairly trivial to modify one of the remaining machines to be the new master for the domain of the downed machine. Depending on your environment, though, this may be more of a headache than it's worth.
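
As a rough sketch of the "load only your own zone" step, each machine could build its named.conf master stanza from a shared list that is pushed out with the zone files. The map file format (one "host domain" pair per line), the function name, and the zone directory are all assumptions made for illustration.

```shell
#!/bin/sh
# own_zone_stanza MAPFILE HOST ZONEDIR
# MAPFILE is a hypothetical "host domain" list, one pair per line.
# Prints a named.conf master stanza for each domain assigned to HOST;
# domains that belong to other hosts are ignored.
# Needs a POSIX awk for -v (nawk on older Solaris releases).
own_zone_stanza() {
    awk -v host="$2" -v dir="$3" '
        $1 == host {
            printf "zone \"%s\" {\n", $2
            printf "    type master;\n"
            printf "    file \"%s/%s\";\n", dir, $2
            printf "};\n"
        }' "$1"
}
```

If a machine is lost, running the same function on a surviving machine with the downed host's name produces the stanza needed to take over its domain.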

Q I'm trying to compile an application from a third-party vendor, but I'm getting errors with gcc. The compiler keeps complaining about unrecognized options such as -Xc and -Kpic. Am I missing a library or do I have a corrupt gcc installation?

A These are options to Sun's C compiler, not gcc.

-Xc -- Maximally conforming ANSI/ISO C, without K&R C compatibility extensions. The compiler issues errors and warnings for programs that use non-ANSI/ISO C constructs.

-Kpic -- Produces position-independent code for use in shared libraries. Each reference to a global datum is generated as a de-reference of a pointer in the global offset table. Each function call is generated in pc-relative addressing mode through a procedure linkage table. In later versions of the Sun compiler, this is equivalent to -xcode=pic13 (-KPIC is equivalent to -xcode=pic32).

I'm guessing that you're using a makefile that was generated by imake for use with Sun's C compiler. If so, you can fix your OpenWindows configuration to use gcc instead of cc. Edit /usr/openwin/lib/config/site.def and uncomment these sections by removing the surrounding /* */ characters:

#ifndef HasGcc2
#define HasGcc2 YES
#endif

#ifndef HasCplusplus
#define HasCplusplus YES
#endif

Also uncomment and change #define PreIncDir to be the location of your gcc include files. For a Solaris 8 machine running gcc 2.95.3, it should look something like:

#define PreIncDir /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/include/

After making these changes to /usr/openwin/lib/config/site.def, also edit /usr/openwin/lib/config/sun.cf and comment out (using /* and */) the HasSunC and CCompilerMajorVersion directives:

/*
#define HasSunC YES
#define CCompilerMajorVersion 4
*/

And define HasSunC as NO:

#define HasSunC NO

Finally, add the following lines to the top of /usr/openwin/lib/config/Imake.tmpl:

#define LdPreLib -L$(BUILDLIBDIR) -R$(BUILDLIBDIR)
#define LdPreLib -L$(USRLIBDIR) -R$(USRLIBDIR)
#define LdPostLib -L$(USRLIBDIR) -R$(USRLIBDIR)

You can now use xmkmf to generate the proper makefile for use with gcc.
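
If the makefile is hand-written rather than generated by imake, another option is to swap the Sun flags for their closest gcc counterparts. The sketch below is only an approximation: -fpic and -fPIC are gcc's position-independent code options, and -ansi -pedantic is the nearest match to -Xc rather than an exact equivalent. The function name is hypothetical.

```shell
#!/bin/sh
# translate_flags FLAG...
# Map Sun C options to their closest gcc counterparts (approximate, not
# exact); anything unrecognized is passed through unchanged.
translate_flags() {
    out=""
    for flag in "$@"; do
        case $flag in
            -Xc)   new="-ansi -pedantic" ;;
            -Kpic) new="-fpic" ;;
            -KPIC) new="-fPIC" ;;
            *)     new=$flag ;;
        esac
        out="${out:+$out }$new"
    done
    printf '%s\n' "$out"
}
```

You would paste the translated list into the makefile's CFLAGS line and then check the build output for any remaining cc-only options.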

Q I've just upgraded to bind 9.2.1 on my FreeBSD 4.6-STABLE machine, and I'm trying to use rndc. I edit my zone file, change the serial number, and issue an rndc reload command. Even though named is running, I receive a connection refused error. It works fine if I just HUP the daemon instead of using rndc, however. Is there some new syntax that I should be using that I can't find in the docs?

A Check for the following line in /etc/rndc.conf:

default-server localhost

If you have the name localhost there instead of the IP 127.0.0.1, you may be having issues because the /etc/hosts entry or the DNS entry for localhost maps to both the IPv4 (127.0.0.1) and IPv6 (::1) loopback addresses. If you don't have IPv6 fully operational, this can cause "network unreachable" or "connection refused" error messages. To make sure that rndc does not try to connect to the IPv6 loopback address, replace the above configuration line with:

default-server 127.0.0.1
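
To see whether the hosts file is the source of the ambiguity, you can list every address it assigns to localhost. This is a small sketch with a hypothetical function name; it only inspects a hosts file and says nothing about what DNS returns for the name.

```shell
#!/bin/sh
# localhost_addrs [HOSTSFILE]
# Print every address that the hosts file maps to the name "localhost".
# More than one line of output means the name is ambiguous.
localhost_addrs() {
    awk '{
        sub(/#.*/, "")      # drop comments before looking at the fields
        for (i = 2; i <= NF; i++)
            if ($i == "localhost") { print $1; break }
    }' "${1:-/etc/hosts}"
}
```

If the output includes ::1 alongside 127.0.0.1 and IPv6 isn't working on the machine, that matches the symptom described above.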

Q I'm running Solaris 8, and we have a series of shell scripts that take a very long time to run. Basically, a large file of file names is read in and an action is performed on each file listed. I don't want to run the shell scripts with -x because I don't want to slow them down any more, and I don't need all of the output, but I do occasionally want to check and see where the program is in its execution. Is there any way to occasionally take a peek at the progress of the shell script (how far it's gotten through the list of file names) after it's already running?

A Since each file name in your list is presumably distinct, it should be fairly easy to tell where you are in the data file by attaching truss to the PID of the running shell script and watching for the file names in the system calls it reports:

truss -p <PID>

From the truss man page:

-p Interprets the command arguments to truss as a list of process-ids for existing processes (see ps(1)) rather than as a command to be executed. truss takes control of each process and begins tracing it provided that the userid and groupid of the process match those of the user or that the user is a privileged user. Processes may also be specified by their names in the /proc directory, for example, /proc/12345.
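
Once you've spotted a file name in the truss output, turning it into a position in the list is a one-liner. The sketch below uses a hypothetical function name and assumes the list holds exactly one file name per line, with no duplicates.

```shell
#!/bin/sh
# list_progress LISTFILE NAME
# Print "N of TOTAL", where N is the line on which NAME first appears in
# LISTFILE. Returns 1 (and prints nothing) if NAME is not in the list.
# Needs a POSIX awk for -v (nawk on Solaris 8).
list_progress() {
    awk -v name="$2" '
        $0 == name && !found { found = NR }
        END { if (found) print found " of " NR; else exit 1 }' "$1"
}
```

Restricting the trace with truss -t open -p <PID> should cut the output down to the open calls, which makes the current file name easier to pick out.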

Q One of our FreeBSD 4.6-STABLE machines recently crashed hard. The root filesystem and a concatenated vinum RAID volume were damaged. I was able to fsck the root partition and comment out the vinum partition and get the machine back up. I can't seem to get the vinum partition back online, though. When I try to fsck the partition, it tells me that the superblock is unreadable. Is there any hope of getting my data back?

Here's the output of vinum dumpconfig:

Drive d4:       Device /dev/ad1s1e
Created on lets.impeachbush.org at Sun Oct 14 11:13:53 2001
Config last updated Tue Aug  6 05:57:12 2002
Size:      81956657664 bytes (78159 MB)
volume incompetence state up
plex name incompetence.p0 state corrupt org striped 512s vol incompetence
sd name incompetence.p0.s0 drive d4 plex incompetence.p0 len 160069632s \
  driveoffset 265s state crashed plexoffset 0s
sd name incompetence.p0.s1 drive d5 plex incompetence.p0 len 160069632s \
  driveoffset 265s state up plexoffset 512s
sd name incompetence.p0.s2 drive d6 plex incompetence.p0 len 160069632s \
  driveoffset 265s state up plexoffset 1024s

Drive /dev/ad1s1e: 76 GB (81956657664 bytes)
Can't get label from /dev/ad1s2c: Invalid argument (22)
Can't get label from /dev/ad1s3c: Invalid argument (22)
Drive d5:       Device /dev/ad2s1e
Created on lets.impeachbush.org at Sun Oct 14 11:13:53 2001
Config last updated Tue Aug  6 05:57:12 2002
Size:      81956657664 bytes (78159 MB)
volume incompetence state up
plex name incompetence.p0 state corrupt org striped 512s vol incompetence
sd name incompetence.p0.s0 drive d4 plex incompetence.p0 len 160069632s \
  driveoffset 265s state crashed plexoffset 0s
sd name incompetence.p0.s1 drive d5 plex incompetence.p0 len \
  160069632s driveoffset 265s state up plexoffset 512s
sd name incompetence.p0.s2 drive d6 plex incompetence.p0 len \
  160069632s driveoffset 265s state up plexoffset 1024s

Drive /dev/ad2s1e: 76 GB (81956657664 bytes)
Drive d6:       Device /dev/ad3s1e
Created on lets.impeachbush.org at Sun Oct 14 11:13:53 2001
Config last updated Tue Aug  6 05:57:12 2002
Size:      81956657664 bytes (78159 MB)
volume incompetence state up
plex name incompetence.p0 state corrupt org striped 512s vol incompetence
sd name incompetence.p0.s0 drive d4 plex incompetence.p0 len \
  160069632s driveoffset 265s state crashed plexoffset 0s
sd name incompetence.p0.s1 drive d5 plex incompetence.p0 len \
  160069632s driveoffset 265s state up plexoffset 512s
sd name incompetence.p0.s2 drive d6 plex incompetence.p0 len \
  160069632s driveoffset 265s state up plexoffset 1024s

Drive /dev/ad3s1e: 76 GB (81956657664 bytes)

And the output of vinum l:

3 drives:
D d4             State: up    Device /dev/ad1s1e   Avail: 0/78159 MB (0%)
D d5             State: up    Device /dev/ad2s1e   Avail: 0/78159 MB (0%)
D d6             State: up    Device /dev/ad3s1e   Avail: 0/78159 MB (0%)

1 volumes:
V incompetence          State: up       Plexes:       1 Size:     228 GB

1 plexes:
P incompetence.p0     S State: corrupt  Subdisks:     3 Size:     228 GB

3 subdisks:
S incompetence.p0.s0    State: crashed  PO:        0  B Size:      76 GB
S incompetence.p0.s1    State: up       PO:      256 kB Size:      76 GB
S incompetence.p0.s2    State: up       PO:      512 kB Size:      76 GB

A Since one of your subdisks is listed as crashed and your plex has no redundancy (the configuration above shows it as striped, not RAID-5), you should consider your data damaged and restore from backup. If you want to try to salvage what's on the disk, you can try the following:

Boot up the machine and have vinum start, but don't try to mount the damaged partition (I think you've already gotten this far). Then set the vinum daemon option that stops it from updating the saved configuration while you experiment:

vinum setdaemon 4

Set the states of the damaged subdisk and plex to "up":

/sbin/vinum setstate up incompetence.p0.s0
/sbin/vinum setstate up incompetence.p0

You can now try to fsck the raw device and see whether there are any errors:

/sbin/fsck /dev/vinum/incompetence

If the disk appears clean, turn logging back on, and do an explicit save of the configuration:

/sbin/vinum setdaemon 1
/sbin/vinum saveconfig

At this point, you can try to reboot and see whether the subdisk and plex are still listed as up when you run:

vinum l

If so, uncomment your partition and try another reboot. If you do manage to get the machine back up, I'd obviously suggest doing a backup right away.

Q I've installed a number of Solaris x86 servers for our external Web cluster. Now we need another package off the Software 1 of 2 CD-ROM, but I can't seem to get it to mount on any of the servers. I can get any other HSFS CD-ROM to mount just fine, and I can also boot off the problem CD-ROM ok, so I don't think it's a hardware issue. Is there a trick to getting the packages off the CD-ROM after you've already done an install? I've never had this issue with the SPARC versions of the operating system.

A Ideally, you'll let vold handle mounting the disk for you. If you haven't installed vold for security reasons (you did say these were publicly accessible servers), then you can still mount the CD-ROM by hand. There's a legacy layout design on the CD-ROM that makes it a bit different from normal HSFS CD-ROMs that you try to mount. Instead of being able to mount one of the slices directly, you have to mount it with the following command:

mount -r -F hsfs /dev/dsk/c0t6d0p0 /mnt

The c0t6d0p0 device name used in the above command assumes your CD-ROM drive is set to SCSI target 6. If you get the response "no such device," your CD-ROM probably uses a different controller or SCSI target number.

Amy Rich, president of the Boston-based Oceanwave Consulting, Inc. (http://www.oceanwave.com), has been a UNIX systems administrator for more than five years. She received a BSCS at Worcester Polytechnic Institute, and can be reached at: qna@oceanwave.com.