Questions and Answers
Amy Rich
Q I have an array in Perl with a
large number of duplicate entries. How do I remove the duplicates?
A This is actually answered in
the perldoc information. Run the following on a system that has
Perl installed:
perldoc -q duplicate
And you get the following output:
How can I remove duplicate elements from a list or array?
There are several possible ways, depending on whether the array
is ordered and whether
you wish to preserve the ordering. (this assumes all true values
in the array)
a) If @in is sorted, and you want @out to be sorted:
$prev = 'nonesuch';
@out = grep($_ ne $prev && ($prev = $_), @in);
This is nice in that it doesn't use much extra memory, simulating
uniq(1)'s behavior of removing only adjacent duplicates. It's less
nice in that it won't work with false values like undef, 0, or "";
"0 but true" is OK, though.
b) If you don't know whether @in is sorted:
undef %saw;
@out = grep(!$saw{$_}++, @in);
c) Like (b), but @in contains only small integers:
@out = grep(!$saw[$_]++, @in);
d) A way to do (b) without any loops or greps:
undef %saw;
@saw{@in} = ();
@out = sort keys %saw; # remove sort if undesired
e) Like (d), but @in contains only small positive integers:
undef @ary;
@ary[@in] = @in;
@out = grep {defined} @ary;
But perhaps you should have been using a hash all along, eh?
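The idioms in (a) and (b) are easy to try from the command line before putting them into a script. The following is a minimal sketch and not part of the perldoc output; the function names dedup_keep_order and dedup_adjacent are made up for illustration, and it assumes one array element per input line.

```shell
# Print each input line the first time it is seen, keeping the original
# order. Same hash idiom as (b): !$saw{$_}++ is true only on first sight.
dedup_keep_order() {
    perl -ne 'print unless $saw{$_}++'
}

# Print a line only when it differs from the line before it, like uniq(1).
# Tracking "defined" separately avoids the false-value caveat noted in (a).
dedup_adjacent() {
    perl -ne 'print if !defined($prev) || $_ ne $prev; $prev = $_'
}
```

For example, printf 'b\na\nb\n' | dedup_keep_order keeps the first b and the a, and drops the second b.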
Q I'm from a Sun/Solaris background,
but my new position is at an HP shop. Is there HP hardware comparable
to the Starfire?
A HP's top of the line machine
is the Superdome. In a single node it can have up to 64-way 550
MHz, 4-way PA-8600 CPUs with 1.5 MB on-chip cache per CPU. It has
hardware partitioning capabilities, so you can run up to 16 different
instances of the OS. They're currently supporting only HP-UX,
but I think they also plan to support NT and Linux when they come
out with Itanium chips for this machine. Each node holds up to 128
G of memory. For storage options it can do JBOD, fibre, or HP SureStore
disk arrays.
For more info, visit HP's Superdome page:
http://www.hp.com/products1/unixservers/highend/superdome
Q I'm running an E250 with Solaris
8. I'm currently using top to look at the load on the
machine and see how loaded the CPUs are, but I'd like something
that shows the two CPUs more distinctly. Is there something else available
that does a better job?
A Although it's not in a nice
graphical format like top, try mpstat (similar style
output to vmstat et al.). This will individually list each
CPU and report the following:
CPU     processor ID
minf    minor faults
mjf     major faults
xcal    inter-processor cross-calls
intr    interrupts
ithr    interrupts as threads (not counting clock interrupt)
csw     context switches
icsw    involuntary context switches
migr    thread migrations (to another processor)
smtx    spins on mutexes (lock not acquired on first try)
srw     spins on readers/writer locks (lock not acquired on first try)
syscl   system calls
usr     percent user time
sys     percent system time
wt      percent wait time
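To compare the two CPUs at a glance, the usr and sys columns can be added up for each CPU. This is a rough sketch, assuming Solaris-style mpstat output whose header line starts with CPU; cpu_busy is a hypothetical helper name, not a system command.

```shell
# Summarize per-CPU load from mpstat output read on stdin.
# Column positions are looked up by header name rather than hard-coded,
# so the header line must appear before the data lines.
cpu_busy() {
    awk '
        $1 == "CPU" {                       # header: remember each column index
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("usr" in col) && ("sys" in col) {  # data line: add user and system time
            printf "cpu%s busy=%d%%\n", $1, $(col["usr"]) + $(col["sys"])
        }
    '
}

# Typical use on the live system, one report every 5 seconds:
#   mpstat 5 | cpu_busy
```

Because mpstat repeats the header with every interval, the helper rereads the column positions each time it sees one.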
Q I've looked at the man page
for the RCS command co (check out), but I do not see an option
to specify the location where the checked out file is placed. Do
you need to write a wrapper to specify the check out location or
is there a way to pass a directory name to co?
A co lets you specify stdout
as the destination of the checked out file, so you could do the
following:
co -l -p testfile > /tmp/testfile
Be sure to copy the file back to its proper directory before checking
the file back in.
Q I accidentally did an rm -rf
foo * when I meant to do rm -rf foo* but realized I did
the wrong thing almost immediately and did a CTRL-C. I want to restore
files from a backup, so what I need to know is does GNU rm
delete in alphabetical order, or by ctime/mtime, or inode, or something
else?
A Using the "*" character
for expansion is called globbing. How files are globbed is interpreted
by your shell and not by rm. The rm command never
actually sees the "*" since it's already been expanded
by the shell. Your best bet is to do an ls (make sure you
don't have ls aliased) on that same machine as the same
user and see how your globbing is done.
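One way to see exactly what the shell would have handed to rm is to let the shell print its own expansion. This is a minimal sketch; show_glob_order is a made-up helper name, and the order it prints depends on the shell and locale settings in effect, which is why it should be run as the same user on the same machine.

```shell
# Print the expansion of * for a directory, one name per line, in the
# order the shell generates it. printf is used instead of ls so that no
# external sorting or aliasing gets in the way.
show_glob_order() {
    # Subshell keeps the caller's working directory unchanged.
    ( cd "$1" && printf '%s\n' * )
}
```

Names that start with a dot are not matched by *, so they would not have been passed to rm by that glob. Everything listed before the point where you pressed CTRL-C is a candidate for restoring, keeping in mind that rm descends into each directory as it reaches it.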
Q When I try to install an RPM,
I get the following message:
only packages with major numbers <= 3 are supported by this ...
What does this error message mean, and how can I fix it?
A This message is telling you that
your version of RPM is too old. You have to upgrade RPM before you'll
be able to install this particular package.
Q I'm trying to delve further
into system security and I was wondering if you could explain the
denial of service attack called a SYN flood?
A A SYN flood is when a large number
of bogus TCP connections are initiated, but not actually established,
filling up the TCP connection buffer.
To understand how a SYN flood works, you need to know something
about TCP. TCP is a connection-oriented transport layer service
in which packets are guaranteed to arrive in order. When two hosts
want to establish a TCP connection, they use three-way handshaking,
determining the sequence numbers that the connection will use.
Host A will first send a packet with the SYN flag set to host
B to indicate that it wants to establish a connection. This packet
will contain the ISN (Initial Sequence Number) that host A intends
to use. Host B will send back a SYN/ACK and its ISN, indicating
that it received the request (ACK) and is ready to receive data
(the SYN and ISN). Host A then sends back an ACK indicating that
it is ready to send data. The two hosts now send packets back and
forth in sequence based on the established chain of sequence numbers.
A SYN flood attack occurs when the attacker at host A sends only
SYN packets and does not accept the return SYN/ACK packet from host
B. Because of limits set in the kernel, host B can only have so
many half-open connections at any one time. When this limit is reached,
no new TCP connections may be opened. Host A merely sends enough
packets with the SYN bit set to reach this limit. The connection
attempts will eventually time out on host B, but a SYN flood from
host A sends too many SYN packets too rapidly for host B to keep
up.
If you're getting TCP connection requests from just one host,
you can block that host at the router. When under a distributed
denial of service (DDoS) attack where multiple machines are sending
SYN packets or one machine is sending SYN packets with forged IP
headers, the general workaround to protect host B is to increase
the kernel limit on half-open TCP connections and decrease the
timeout for clearing aborted connection attempts. In conjunction,
host B may also run a program looking for these half-open connections
that then sends an RST packet to clear them.
With the popularity of DDoS SYN floods, programmers are redesigning
their OSs to be more robust. If you think you may be under a SYN
flood attack, though, you can use netstat to check the number
of open connections that have received a SYN but not an ACK. There
are also tools such as SYNWatch (http://www.rootshell.com)
that sniff the network for SYN packets.
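The netstat check can be reduced to a single number that is easy to watch over time. This is a small sketch; count_half_open is a hypothetical helper name, and it assumes the state appears in the netstat output as SYN_RCVD (Solaris and the BSDs) or SYN_RECV (Linux).

```shell
# Count connections that have received a SYN but not the final ACK.
# Reads netstat output on stdin and prints the number of matching lines.
count_half_open() {
    grep -Ec 'SYN_(RCVD|RECV)'
}

# Typical use:
#   netstat -an | count_half_open
```

A count that stays near the kernel's half-open limit, rather than spiking briefly under normal load, is the pattern to look for.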
Q What are the benefits and drawbacks
of using a Web proxy/cache like squid?
A A squid server caches
static objects, so when multiple people look at the same static
content, it can be pulled from the local cache instead of going
out to the Internet to retrieve the information again. This results
in reduced bandwidth usage and quicker apparent load times for the
end user. Squid will also cache DNS lookups to help speed
up retrievals from remote machines. Additionally, squid can
be configured to read only from its cache in the case where the
Internet connection is lost (offline mode).
Additionally, you can tunnel SSL/TLS connections with the CONNECT
request method with squid. Squid doesn't actually understand
or interpret the contents; it just passes bits back and forth. Squid
also supports filtering so that clients are blocked from accessing
"restricted" sites. This is useful in a corporate environment
where policy states that certain sites are not to be viewed from
corporate machines. Another security feature is that you only have
one machine as a point of contact for outgoing requests instead
of allowing everyone's desktop direct access to Web servers.
In general, the same things that can make squid a win can
also make it a problem. For instance, data cached with squid
may be out of date with the real data on the remote server. Squid
also only caches static data, so there's no gain on pages that
use dynamic data. Besides being a single point of attack, your
squid caches are also a single point of failure (or one of only
a few). If your squid machine goes down, but the rest
of your network is up, you still won't be able to access the
Web. Some sites that are blocked by the squid configuration
may actually be ones required for work by some people. Or, the people
being blocked may try to find ways around the proxy that will defeat
the pluses it offers. And, finally, because SSL connections are
relayed through squid, there is always a chance that someone could
develop and insert a program to send the sensitive data somewhere
other than its intended destination.
Q I'm looking for a script
that will search the logs for certain regexps through a given date
range. I could write my own but would rather find something already
done. Any thoughts?
A This is really a fairly trivial
thing to write on your own from the command line. You would first
use grep or egrep for the desired date range and then
pass that to egrep for whatever regular expressions you want
to search for. For example, if you wanted to search for all occurrences
of Bob or Fred in the mail log from October 1st through 9th, you'd
do the following (syslog pads single-digit days with a second space, so
the pattern needs two spaces before the day and one after it to keep
from matching the 10th through the 31st):
grep "^Oct  [1-9] " /var/log/maillog | egrep "(Bob|Fred)"
You could also use egrep to look for those same dates in November
at the same time:
egrep "^(Oct|Nov)  [1-9] " /var/log/maillog | egrep "(Bob|Fred)"
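For arbitrary day ranges, comparing the month and day as separate fields sidesteps the spacing of the day column altogether. This is a minimal sketch, assuming the traditional syslog layout in which the first field is the month name and the second is the day; grep_log_days is a made-up name.

```shell
# usage: grep_log_days MONTH FIRST_DAY LAST_DAY REGEX < logfile
# Prints lines from the given month whose day falls within the range
# and that match the extended regular expression.
grep_log_days() {
    awk -v m="$1" -v first="$2" -v last="$3" -v re="$4" '
        $1 == m && $2 + 0 >= first + 0 && $2 + 0 <= last + 0 && $0 ~ re
    '
}

# Example:
#   grep_log_days Oct 1 9 'Bob|Fred' < /var/log/maillog
```

Because awk splits fields on runs of whitespace, a day written with one or two leading spaces is treated the same way.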
If you're going to be doing this sort of monitoring frequently,
you may want to look at swatch
(ftp://ftp.stanford.edu/general/security-tools/swatch/)
to check your log files on a real-time
basis, instead.
Swatch allows you to specify things to look for in various log
files. When swatch matches a pattern in the configuration file,
you can have it do various things like email you, page you, or run
a program. This can be extremely useful for things like watching
mail logs for denied relaying attempts or for watching syslog for
disk errors.
Q I'm using ufsdump
on Solaris 8 to back up my data partition. I've successfully
run a level 0 dump and noted that it took up about 20 G of space
on the tape. Now when I run a level 1 dump, it's dumping almost
the whole disk again, and most of this data hasn't changed
in months! I thought that level 1 should only dump the things that
changed between now and the last level 0 dump? I've done a
find from the data partition to print out all of the files
with mtime that has changed in the past couple weeks (the
last time I did a level 0 dump), and it only shows about a gig worth
of files, certainly not 19-20 G worth. Why is ufsdump backing
up things that haven't changed?
A ufsdump uses the ctime
of the file, not the mtime. The inode could have been updated
without the actual data in the file having been changed. If you
run find looking for ctimes that have changed, I'm
guessing you'll find that most of your data has been updated.
Have you run anything like chown, chmod, chgrp,
touch, mv, or cp on many of your files since
you last did a level 0 dump?
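To see which files fall into this category, find can list the ones whose inode changed recently even though their contents did not. This is a sketch only; ctime_only_changes is a hypothetical helper name, and the directory and day count are whatever fits your dump schedule.

```shell
# usage: ctime_only_changes DIR DAYS
# List regular files under DIR whose ctime is within the last DAYS days
# but whose mtime is older than that, i.e. only the inode was touched.
ctime_only_changes() {
    find "$1" -type f -ctime "-$2" ! -mtime "-$2" -print
}

# Example (path is illustrative):
#   ctime_only_changes /data 14
```

If this prints most of the partition, a recursive chown, chmod, or chgrp since the level 0 dump is the likely cause.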
Q I've recently moved my majordomo
setup from one Solaris 8 box to another, and everything seems to
be working fine sending mail to the lists, but when people try to
unsubscribe, I get the following error in the log file:
{user@myaddress.com} ABORT chown(110, 104, \
"/usr/local/majordomo/lists/testlist.new"): Not owner
110 is the UID of majordomo, and 104 is the GID. Both /etc/passwd
and /etc/group have the right information.
I checked the majordomo FAQ, and it said that chown
errors are normally caused by the wrong permissions on the wrapper
binary, but it looks okay to me:
-rws--x--x 1 root majordom 27112 Sep 19 20:17 wrapper
After the unsubscribe command fails, I'm left with an extra file
in the lists directory:
-rw-r--r-- 1 majordom web 19 Sep 19 20:17 testlist.new
Why would the .new file be group owned by web (GID 103)? I
suspect this has something to do with the problem, but I can't
figure out where it's getting the web group from. Any clues?
A On the machine that you copied
majordomo over from, was the GID of majordom 103? When you
compile the wrapper, it hard codes the UID and the GID that majordomo
should run as. My guess is that the UID stayed the same, but the
GID on the new machine differs. You can check this by running the
wrapper config test as a normal (non-root, non-majordom) user:
./wrapper config-test
Your effective user should be majordom (uid 110), and your effective
group should be majordom (gid 104). If the GID is listed as something
different, then you need to recompile your wrapper with the correct
values. If you still have the source sitting around (assuming that
everything stayed in the same relative directories), you can just
remove the wrapper binary and do:
make wrapper
Then copy the new wrapper binary into place and make sure it has the
correct ownerships and permissions (as you listed above). If you don't
have the source anymore, or the location of things has changed during
the move, you'll need to get the source and change the Makefile
to reflect your current setup.
Q I have a user running Linux at
home and picking up mail at his ISP via POP3. His home account is
prone to getting quite a bit of spam (he uses it to post on various
newsgroups), and he wants to know if there's a way to filter
the spam out before he downloads it. Any suggestions?
A If his ISP supports IMAP, I'd
suggest switching to that since you can download only headers with
IMAP and delete messages without downloading the body. If POP is
your only option, though, then you can use fetchmail in conjunction
with something like popsneaker:
http://www.ixtools.de/popsneaker/
or mailfilter:
http://mailfilter.sourceforge.net/
If you were going to use popsneaker, your fetchmailrc would look similar
to the following:
poll pop.yourisp.com with proto POP3 user joe there with password \
joespass is localjoe here options forcecr warnings 3600 \
preconnect "/usr/local/bin/popsneaker"
Q I need to upgrade the OBP on my Ultra
2 and I heard there was some magic to doing this. Do you have any
pointers for me that would describe the process?
A The main difference with the
Ultra 1 and 2 boxes is that there's a jumper on the motherboard
that you have to physically move to enable writing to the flash
PROM to upgrade the OBP. Look at:
http://docs.sun.com/ab2/coll.28.20/SPARCHW/@Ab2PageView/ \
idmatch(Z400066523F1)
for detailed instructions on how to upgrade the Ultra 1, Ultra 2,
and Enterprise 450 machines.
Q I am migrating my IBM desktop
to Linux. This will be the third attempt in a year. One problem
I remember encountering is that my modem is not supported by Red
Hat. It is a Lucent Technologies Winmodem. Is there any place I
can find drivers for it?
A Winmodems don't have their
own code in firmware and instead use the computer to which they
are attached to do all of the low-level interaction for them. It
used to be that only Windows machines had drivers for Winmodems,
but Linux now has drivers for select chipsets. More information
is available at:
http://www.linmodems.org/
Amy Rich, president of the Boston-based Oceanwave Consulting, Inc.
(http://www.oceanwave.com), has been a UNIX systems administrator
for more than five years. She received a BSCS at Worcester Polytechnic
Institute, and can be reached at: arr@oceanwave.com.