Cover V10, I10


Questions and Answers

Amy Rich

Q I have an array in Perl with a large number of duplicate entries. How do I remove the duplicates?

A This is actually answered in the Perl FAQ (perlfaq4). Run the following on a system that has Perl installed:

perldoc -q duplicate
And you get the following output:

    How can I remove duplicate elements from a list or array?
    There are several possible ways, depending on whether the array is ordered and whether
    you wish to preserve the ordering. (this assumes all true values in the array)

    a) If @in is sorted, and you want @out to be sorted:
    $prev = 'nonesuch';
    @out = grep($_ ne $prev && ($prev = $_), @in);

    This is nice in that it doesn't use much extra memory, simulating
    uniq(1)'s behavior of removing only adjacent duplicates. It's less
    nice in that it won't work with false values like undef, 0, or "";
    "0 but true" is OK, though.

    b) If you don't know whether @in is sorted:
    undef %saw;
    @out = grep(!$saw{$_}++, @in);

    c) Like (b), but @in contains only small integers:
    @out = grep(!$saw[$_]++, @in);

    d) A way to do (b) without any loops or greps:
    undef %saw;
    @saw{@in} = ();
    @out = sort keys %saw; # remove sort if undesired

    e) Like (d), but @in contains only small positive integers:
    undef @ary;
    @ary[@in] = @in;
    @out = grep {defined} @ary;

    But perhaps you should have been using a hash all along, eh?

Q I'm from a Sun/Solaris background, but my new position is at an HP shop. Is there HP hardware comparable to the Starfire?

A HP's top-of-the-line machine is the Superdome. A single node can hold up to 64 550-MHz PA-8600 CPUs, each with 1.5 MB of on-chip cache. It has hardware partitioning capabilities, so you can run up to 16 different instances of the OS. HP currently supports only HP-UX, but also plans to support NT and Linux once Itanium chips are available for this machine. Each node holds up to 128 GB of memory. For storage options, it can use JBOD, Fibre Channel, or HP SureStore disk arrays.

For more info, visit HP's Superdome page:

http://www.hp.com/products1/unixservers/highend/superdome

Q I'm running an E250 with Solaris 8. I'm currently using top to look at the load on the machine and see how loaded the CPUs are, but I'd like something that shows the two CPUs more distinctly. Is there something else available that does a better job?

A Although it's not in a nice graphical format like top, try mpstat (its output style is similar to vmstat et al.). It lists each CPU individually and reports the following:

CPU    processor ID
minf   minor faults
mjf    major faults
xcal   inter-processor cross-calls
intr   interrupts
ithr   interrupts as threads (not counting clock interrupt)
csw    context switches
icsw   involuntary context switches
migr   thread migrations (to another processor)
smtx   spins on mutexes (lock not acquired on first try)
srw    spins on reader/writer locks (lock not acquired on first try)
syscl  system calls
usr    percent user time
sys    percent system time
wt     percent wait time
idl    percent idle time

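You can also post-process mpstat output with awk. The sketch below flags any CPU spending more than 20% of its time in sys; the two data lines are fabricated samples standing in for real output (on a live box you'd pipe "mpstat 5" into the awk command instead):

```shell
# Flag CPUs with high system time; sys is the 14th column in the
# layout listed above. Sample numbers here are made up for illustration.
printf '%s\n' \
  'CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl' \
  '  0   10   0    5  200  100 300   20   15    8   0  1200  40  25  5  30' \
  '  1   12   0    4  180   90 280   18   12    6   0  1100  55  10  5  25' |
awk 'NR > 1 && $14 > 20 { print "CPU " $1 " is at " $14 "% sys" }'
```

With the sample data, only CPU 0 (25% sys) is flagged.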

Q I've looked at the man page for the RCS command co (check out), but I do not see an option to specify the location where the checked out file is placed. Do you need to write a wrapper to specify the check out location or is there a way to pass a directory name to co?

A co lets you specify stdout as the destination of the checked out file, so you could do the following:

co -l -p testfile > /tmp/testfile
Be sure to copy the file back to its proper directory before checking the file back in.
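
If you'd rather not retype the redirection every time, a tiny shell function will do; co_into here is a made-up name, not part of RCS, and it assumes co is on your PATH:

```shell
# Hypothetical wrapper: check an RCS file out, locked, into a target directory
co_into() {
  dir=$1; file=$2
  co -l -p "$file" > "$dir/$(basename "$file")"
}
# Usage: co_into /tmp testfile
```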

Q I accidentally did an rm -rf foo * when I meant to do rm -rf foo* but realized I did the wrong thing almost immediately and did a CTRL-C. I want to restore files from a backup, so what I need to know is does GNU rm delete in alphabetical order, or by ctime/mtime, or inode, or something else?

A Using the "*" character for expansion is called globbing. Glob expansion is done by your shell, not by rm; the rm command never actually sees the "*" because the shell has already expanded it. Most shells expand globs in sorted (collation) order, but your best bet is to run ls (make sure you don't have ls aliased) on the same machine as the same user and see how your globbing is done.
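
A quick way to see the expansion order for yourself, using a throwaway directory created just for the demonstration:

```shell
# The shell, not rm, expands the glob -- echo shows exactly the
# argument list rm would have received
d=$(mktemp -d)          # scratch directory for the demonstration
cd "$d"
touch banana.txt apple.txt cherry.txt
echo *                  # most shells expand globs in collation order
cd / && rm -rf "$d"
```

Here echo prints "apple.txt banana.txt cherry.txt" -- sorted order, regardless of the order in which the files were created.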

Q When I try to install an RPM, I get the following message:

only packages with major numbers <= 3 are supported by this ...
What does this error message mean, and how can I fix it?

A This message is telling you that your version of RPM is too old. You have to upgrade RPM before you'll be able to install this particular package.

Q I'm trying to delve further into system security and I was wondering if you could explain the denial of service attack called a SYN flood?

A A SYN flood is when a large number of bogus TCP connections are initiated, but not actually established, filling up the TCP connection buffer.

To understand how a SYN flood works, you need to know something about TCP. TCP is a connection-oriented transport layer service in which packets are guaranteed to arrive in order. When two hosts want to establish a TCP connection, they use three-way handshaking, determining the sequence numbers that the connection will use.

Host A will first send a packet with the SYN flag set to host B to indicate that it wants to establish a connection. This packet will contain the ISN (Initial Sequence Number) that host A intends to use. Host B will send back a SYN/ACK and its ISN, indicating that it received the request (ACK) and is ready to receive data (the SYN and ISN). Host A then sends back an ACK indicating that it is ready to send data. The two hosts now send packets back and forth in sequence based on the established chain of sequence numbers.

A SYN flood attack occurs when the attacker at host A sends only SYN packets and does not accept the return SYN/ACK packet from host B. Because of limits set in the kernel, host B can only have so many half-open connections at any one time. When this limit is reached, no new TCP connections may be opened. Host A merely sends enough packets with the SYN bit set to reach this limit. The connection attempts will eventually time out on host B, but a SYN flood from host A sends too many SYN packets too rapidly for host B to keep up.

If you're getting TCP connection requests from just one host, you can block that host at the router. When under a distributed denial of service (DDoS) attack where multiple machines are sending SYN packets or one machine is sending SYN packets with forged IP headers, the general workaround to protect host B is to increase the kernel limit for initiating TCP connections and decrease the timeout for clearing aborted connection attempts. In conjunction, host B may also run a program looking for these half-open connections that then sends an RST packet to clear them.

With the popularity of DDoS SYN floods, programmers are redesigning their OSs to be more robust. If you think you may be under a SYN flood attack, though, you can use netstat to check the number of open connections that have received a SYN but not an ACK. There are also tools such as SYNWatch (http://www.rootshell.com) that sniff the network for SYN packets.
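
To get a rough count of half-open connections, grep the netstat output for the half-open state (SYN_RCVD on Solaris and the BSDs, SYN_RECV on Linux). The captured lines below are fabricated samples; on a live box you'd run netstat -an directly:

```shell
# On a live system: netstat -an | grep -c SYN_RCVD
# Fabricated sample lines standing in for real netstat -an output:
printf '%s\n' \
  'tcp 0 0 10.0.0.1.80  192.0.2.10.1024  SYN_RCVD' \
  'tcp 0 0 10.0.0.1.80  192.0.2.11.1025  SYN_RCVD' \
  'tcp 0 0 10.0.0.1.80  203.0.113.9.2000 ESTABLISHED' |
grep -c SYN_RCVD
```

A large and steadily climbing count here is a red flag that a flood may be in progress.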

Q What are the benefits and drawbacks of using a Web proxy/cache like squid?

A A squid server caches static objects, so when multiple people look at the same static content, it can be pulled from the local cache instead of going out to the Internet to retrieve the information again. This results in reduced bandwidth usage and quicker apparent load times for the end user. Squid will also cache DNS lookups to help speed up retrievals from remote machines. Additionally, squid can be configured to read only from its cache in the case where the Internet connection is lost (offline mode).

You can also tunnel SSL/TLS connections through squid with the CONNECT request method. Squid doesn't actually understand or interpret the contents; it just passes bits back and forth. Squid also supports filtering so that clients are blocked from accessing "restricted" sites. This is useful in a corporate environment where policy states that certain sites are not to be viewed from corporate machines. Another security feature is that you have only one machine as a point of contact for outgoing requests instead of allowing everyone's desktop direct access to Web servers.

In general, the same things that make squid a win can also make it a problem. For instance, data cached by squid may be out of date with respect to the real data on the remote server. Squid caches only static data, so there's no gain on pages that use dynamic content. As well as being a single attack point, your squid caches are also a single (or a few) points of failure: if your squid machine goes down but the rest of your network is up, you still won't be able to access the Web. Some sites blocked by the squid configuration may actually be ones that some people require for work. Or, the people being blocked may try to find ways around the proxy that defeat the benefits it offers. And, finally, because all SSL connections funnel through the squid machine, there is always a chance that someone could compromise it and insert a program to send sensitive data somewhere other than its intended destination.

Q I'm looking for a script that will search the logs for certain regexps through a given date range. I could write my own but would rather find something already done. Any thoughts?

A This is really a fairly trivial thing to write on your own from the command line. You would first use grep or egrep for the desired date range and then pass that to egrep for whatever regular expressions you want to search for. For example, if you wanted to search for all occurrences of Bob and Fred in the mail log from October 1st through 9th, you'd do the following:

grep "Oct  [1-9]" /var/log/maillog | egrep "(Bob|Fred)"
You could also use egrep to look for those same dates in November at the same time:

egrep "(Oct|Nov)  [1-9]" /var/log/maillog | egrep "(Bob|Fred)"
If you're going to be doing this sort of monitoring frequently, you may want to look at swatch (ftp://ftp.stanford.edu/general/security-tools/swatch/) to check your log files in real time, instead.

Swatch allows you to specify things to look for in various log files. When swatch matches a pattern in the configuration file, you can have it do various things like email you, page you, or run a program. This can be extremely useful for things like watching mail logs for denied relaying attempts or for watching syslog for disk errors.

Q I'm using ufsdump on Solaris 8 to back up my data partition. I've successfully run a level 0 dump and noted that it took up about 20 G of space on the tape. Now when I run a level 1 dump, it dumps almost the whole disk again, and most of this data hasn't changed in months! I thought that level 1 should dump only the things that changed since the last level 0 dump? I've run a find on the data partition to print out all of the files whose mtime has changed in the past couple of weeks (since the last level 0 dump), and it only shows about a gig worth of files, certainly not 19-20 G worth. Why is ufsdump backing up things that haven't changed?

A ufsdump uses the ctime of the file, not the mtime. The inode could have been updated without the actual data in the file having been changed. If you run find looking for ctimes that have changed, I'm guessing you'll find that most of your data has been updated. Have you run anything like chown, chmod, chgrp, touch, mv, or cp on many of your files since you last did a level 0 dump?
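
You can demonstrate the mtime/ctime distinction directly. The sketch below assumes GNU stat's -c format flags (on Solaris itself you'd instead compare the output of find -mtime against find -ctime):

```shell
# chmod touches the inode (ctime) without modifying the data (mtime)
f=$(mktemp)
m1=$(stat -c %Y "$f")   # mtime as epoch seconds (GNU stat assumed)
c1=$(stat -c %Z "$f")   # ctime as epoch seconds
sleep 1
chmod 600 "$f"          # metadata-only change
m2=$(stat -c %Y "$f")
c2=$(stat -c %Z "$f")
echo "mtime delta: $((m2 - m1)), ctime delta: $((c2 - c1))"
rm -f "$f"
```

The mtime delta stays at 0 while the ctime delta is positive -- exactly why ufsdump considers the file changed even though its contents are not.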

Q I've recently moved my majordomo setup from one Solaris 8 box to another, and everything seems to be working fine sending mail to the lists, but when people try to unsubscribe, I get the following error in the log file:

{user@myaddress.com} ABORT chown(110, 104, \
  "/usr/local/majordomo/lists/testlist.new"): Not owner
110 is the UID of majordomo, and 104 is the GID. Both /etc/passwd and /etc/group have the right information.

I checked the majordomo FAQ, and it said that chown errors are normally caused by the wrong permissions on the wrapper binary, but it looks okay to me:

-rws--x--x   1 root     majordom    27112 Sep 19 20:17 wrapper
After the unsubscribe command fails, I'm left with an extra file in the lists directory:

-rw-r--r--   1 majordom web         19 Sep 19 20:17 testlist.new
Why would the .new file be group owned by web (GID 103)? I suspect this has something to do with the problem, but I can't figure out where it's getting the web group from. Any clues?

A On the machine that you copied majordomo over from, was the GID of majordom 103? When you compile the wrapper, it hard codes the UID and the GID that majordomo should run as. My guess is that the UID stayed the same, but the GID on the new machine differs. You can check this by running the wrapper config test as a normal (non-root, non-majordom) user:

./wrapper config-test
Your effective user should be majordom (uid 110), and your effective group should be majordom (gid 104). If the GID is listed as something different, you'll need to recompile your wrapper with the correct values. If you still have the source sitting around (assuming that everything stayed in the same relative directories), you can just remove the wrapper binary and do:

make wrapper
Then copy the new wrapper binary into place and make sure it has the correct ownerships and permissions (as you listed above). If you don't have the source anymore, or the location of things has changed during the move, you'll need to get the source and change the Makefile to reflect your current setup.

Q I have a user running Linux at home and picking up mail at his ISP via POP3. His home account is prone to getting quite a bit of spam (he uses it to post on various newsgroups), and he wants to know if there's a way to filter the spam out before he downloads it. Any suggestions?

A If his ISP supports IMAP, I'd suggest switching to that, since IMAP lets you download only the headers and delete messages without ever downloading the body. If POP is your only option, though, you can use fetchmail in conjunction with something like popsneaker:

http://www.ixtools.de/popsneaker/
or mailfilter:

http://mailfilter.sourceforge.net/
If you were going to use popsneaker, your ~/.fetchmailrc would look similar to the following:

poll pop.yourisp.com with proto POP3 user joe there with password \
  joespass is localjoe here options forcecr warnings 3600 \
  preconnect "/usr/local/bin/popsneaker"
  
Q I need to upgrade the OBP on my Ultra 2 and I heard there was some magic to doing this. Do you have any pointers for me that would describe the process?

A The trick with the Ultra 1 and 2 boxes is that there's a jumper on the motherboard that you have to physically move to enable writing to the flash PROM before you can upgrade the OBP. Look at:

http://docs.sun.com/ab2/coll.28.20/SPARCHW/@Ab2PageView/idmatch(Z400066523F1)
for detailed instructions on how to upgrade the Ultra 1, Ultra 2, and Enterprise 450 machines.

Q I am migrating my IBM desktop to Linux. This will be the third attempt in a year. One problem I remember encountering is that my modem is not supported by Red Hat. It is a Lucent Technologies Winmodem. Is there any place I can find drivers for it?

A Winmodems don't have their own code in firmware and instead use the computer to which they are attached to do all of the low-level interaction for them. It used to be that only Windows machines had drivers for Winmodems, but Linux now has drivers for select chipsets. More information is available at:

http://www.linmodems.org/

Amy Rich, president of the Boston-based Oceanwave Consulting, Inc. (http://www.oceanwave.com), has been a UNIX systems administrator for more than five years. She received a BSCS at Worcester Polytechnic Institute, and can be reached at: arr@oceanwave.com.