Useful Scripts for Overworked Administrators
Mark Prager
I work for a startup company, which means we face the usual problems
of financing. Because many automated system tools are very expensive,
I have written several scripts to automate some of the daily
tasks involved in monitoring our systems. I write these scripts in ksh
and csh, with a few small C programs (compiled with gcc) where
necessary, because these seem to be the least complicated tools for
the job. The scripts also usually come out self-documenting,
which means I can leave them running, return to them several
months later, and still understand what I was trying to do.
The scripts run under Solaris 2.5.1 through 7 and can easily be adapted
to other UNIX operating systems.
Most of the scripts are run from cron
so that I get periodic checks, although nothing stops them from being
run ad hoc. I also assume that the servers are allowed to log in to
each other without being prompted for passwords (achieved by using
the .rhosts file in the root home directory, or hosts.equiv
in the /etc directory). This security arrangement is sufficient
for our network; however, .rhosts and hosts.equiv
are not considered secure enough for many organizations. You may
need to adapt these scripts to the security structure of your own
environment. These scripts could probably be made more
efficient; however, I wrote them during a severe time crunch, and
because I am of the opinion that you don't fix something that is not
broken, I left them as originally written once they were working.
The first script (Listing 1) is a fairly simple script that I
wrote to monitor the disk space on our servers. In the script, the
variable "limit" represents the percentage limit after
which I want to receive an alert that the disk is getting full.
The variable comp_list is the list of servers that I want
to check. The next two lines initialize the output
file that will be emailed at the end of the operation. The script
then visits each server in the list, gets the percentage used (from
the output of df) for every filesystem, and gets the corresponding
mount point (mnt) for each percentage. The script then checks each
percentage to see whether it is greater than my limit. If
so, it writes to an output file detailing which filesystem is overloaded,
and with what percentage. At the end of the operation, if an output
file has been generated, it is mailed to me.
Example of the output from Listing 1:
From: Super-User [root@sword.fish]
Sent: Sunday, February 11, 2001 5:00 PM
To: mark.prager@seabridgenetworks.com
91% on barracuda : /raid308
92% on seal : /export/raid1
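Listing 1 is not reproduced here, but the check it performs can be sketched roughly as follows. This is an illustration under stated assumptions rather than the listing itself: the function name check_df and the sample df figures are made up, and the sketch assumes df -k style output in which the fifth column is the capacity percentage and the sixth is the mount point. It uses POSIX shell syntax (on Solaris of that era, run it with ksh and use nawk instead of awk).

```shell
#!/bin/sh
# Sketch only: report filesystems whose usage exceeds a limit.
# check_df is a hypothetical helper, not part of Listing 1.
# Usage: df -k | check_df LIMIT HOSTNAME
check_df() {
    limit=$1
    host=$2
    # Skip the header line; column 5 is the capacity (e.g. 91%),
    # column 6 is the mount point.
    awk -v limit="$limit" -v host="$host" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 > limit + 0)
                printf "%d%% on %s : %s\n", pct, host, $6
        }'
}

# Made-up sample standing in for the output of "rsh barracuda df -k".
sample='Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0     492422  230011  213169    52%    /
/dev/dsk/c1t3d0s2    8705501 7834950  783496    91%    /raid308'

echo "$sample" | check_df 90 barracuda
```

With a limit of 90, only the /raid308 line is reported, in the same format as the email above. In a real run, the sample would be replaced by the df output fetched from each server in comp_list.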
I run this script at hourly intervals; however, it could be run more
frequently, and the alerting mechanism (email) could be changed
to an SMS messenger program or an X Window pop-up. Similarly, the script
can be modified slightly to record the usage of all filesystems on
all the servers at periodic intervals, writing the output to a file
that could later be processed to produce a history of disk
space usage on all servers.
At our company, instead of having a UNIX desktop for every user,
we have a number of central servers. Using a PC tool like Exceed,
every user can log into every server. The trouble with this scenario
is that every user likes to think that the server belongs to him
alone! To discourage such thinking, I wrote a script (Listing 2)
that warns users if they have too many processes running on the
server. Although this script does not actually kill any processes,
the warnings can be annoying and are good enough to keep the users
aware of what they are doing.
The main part of the script starts on line 14. I first get a list
of all the processes on the server and filter out those users that
I don't want to be notified about (e.g., root and daemon).
The UID part filters out the banner line of the ps command.
The list is then sorted according to username. Line 16 is used to
initialize various shell variables; the variable last is
the username that will be checked. I initially gave it the value
qwert, because I know there are no users with that name on
our system.
The mailusermap file looks like this:
...
markp+mark.prager@seabridgenetworks.com+35
tvguser+tvguser@seabridgenetworks.com+70
ccadm+mark.prager@seabridgenetworks.com+100
...
It is basically a database of all the users on the system, their email
addresses, and the number of processes each is allowed to have. As
shown by the example above, markp can have up to 35 processes,
while tvguser (which might be a common or group account) is
allowed up to 70 processes.
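Because the fields are separated by a + character, a lookup in this file needs very little code. The following is a minimal sketch, not the code from Listing 2: lookup_user is a hypothetical helper name, and the map is passed on standard input so that the example runs without a real mailusermap file.

```shell
#!/bin/sh
# Sketch only: look up a user's email address and process quota in a
# "+"-separated map read from standard input.
# lookup_user is a hypothetical helper, not taken from Listing 2.
# Usage: lookup_user USERNAME < mailusermap   -> prints "email quota"
lookup_user() {
    wanted=$1
    # IFS=+ splits each line into its name, email, and quota fields.
    while IFS=+ read -r name email quota; do
        if [ "$name" = "$wanted" ]; then
            echo "$email $quota"
            return 0
        fi
    done
    return 1   # user is not listed in the map
}

# The three entries shown above, used here as sample data.
map_data='markp+mark.prager@seabridgenetworks.com+35
tvguser+tvguser@seabridgenetworks.com+70
ccadm+mark.prager@seabridgenetworks.com+100'

echo "$map_data" | lookup_user tvguser
```

The function returns a non-zero status for a user who is not in the map, so a caller can decide whether to skip that user or apply a default limit.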
The first time around, the loop does nothing because there is
no user called qwert. The next time around, we get the process
limit of that user (userquota), and the loop then counts how many
processes that person has. If the variable last is not the
same as variable i, then we have finished counting all the
processes for that user (remember the list was sorted on line 15).
Lines 23-29 check whether the user has overstepped her limit.
If so, the function mail_to_user is called (lines 2-13).
Lines 34-41 repeat the body of the loop for the
last user in the sorted ps list.
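The counting technique described above (a sorted list, a sentinel for the previous user, and a repeated check after the loop) can be sketched as follows. This is an illustration, not Listing 2: count_by_user and report are hypothetical names, a single fixed limit stands in for the per-user quota, and the user names are fed in as made-up sample data instead of coming from ps.

```shell
#!/bin/sh
# Sketch only: count processes per user in a sorted list of user names,
# using a sentinel value for the "previous user".
# count_by_user and report are hypothetical names, not from Listing 2.
report() {
    # $1 = user, $2 = number of processes, $3 = allowed limit
    if [ "$2" -gt "$3" ]; then
        echo "$1 has $2 processes (limit $3)"
    fi
}

count_by_user() {
    limit=$1          # one fixed limit here; Listing 2 uses a per-user quota
    last=qwert        # sentinel: a user name known not to exist
    count=0
    while read -r user; do
        if [ "$user" != "$last" ]; then
            # The name changed, so the previous user's count is complete.
            [ "$last" != qwert ] && report "$last" "$count" "$limit"
            last=$user
            count=0
        fi
        count=$((count + 1))
    done
    # The loop ends before the final user is reported, so repeat the check.
    [ "$last" != qwert ] && report "$last" "$count" "$limit"
    return 0
}

# Made-up sample standing in for the sorted user column of ps output.
printf '%s\n' ccadm markp markp markp tvguser tvguser | count_by_user 2
```

Only markp, with three processes against a limit of two, is reported. The check after the loop plays the same role as the repeated loop body for the last user.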
In the mail_to_user function, lines 5 and 6 determine
the user to be informed of the quota overload, and line 7 calls a simple
script that prints the beginning of the email to be
sent. The executable on line 8, pstree, is a freebie I downloaded
from the Internet; it prints the process tree for
a given user. Line 9 finishes off the email, and line 11 sends
it to the user.
I run the script hourly from cron by way of the following
wrapper script, which calls it on each server:
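#!/bin/csh
# Collect the over-quota reports from every server into one temporary file.
set comp_list = 'stingray medusa sword seal shark salmon tuna octopus dolphin'
# Make sure the temporary file does not exist before appending to it.
touch /tmp/comp$$
rm /tmp/comp$$
foreach comp ( $comp_list )
    # Backquotes capture the output of the remote script.
    set res = `rsh $comp /usr/local/scripts/count_proc_ksh`
    echo $res >> /tmp/comp$$
end
# Each record ends in "@"; turn those into newlines for the report.
cat /tmp/comp$$ | tr '@' '\n'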
Notice that the last line translates @ into a newline character.
This is because line 4 of the main script prints the user who
has overstepped the limit and the host concerned, terminated by an @.
Hence, at the end of the script, we get a report of all the users
that have overstepped their limits, sent by email (the output of cron)
to the administrator. Figure 1 shows an example of the letter sent
to a user. Figure 2 shows an example of email sent to me.
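The reason for the @ terminator is that the csh script above captures the remote output as a list of words and echoes it back on a single line, so any newlines are lost. A small sketch of the final translation step, with made-up record text (the actual wording of the report is not shown in this article) and a hypothetical helper name, split_records:

```shell
#!/bin/sh
# Sketch only: records terminated by "@" survive being flattened onto one
# line and can be separated again with tr. The sample text is made up.
flattened='markp on stingray has 40 processes@tvguser on seal has 75 processes@'

# Turn every "@" back into a newline, giving one record per line.
split_records() {
    tr '@' '\n'
}

echo "$flattened" | split_records
```

Each record comes out on its own line, which is what makes the combined cron email readable.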
One problem with having central servers is that, if I want to
find a certain process on all the servers, I must look into each
server, do a ps on it, and search for the process. To avoid
this, I wrote the following small script, which I can run centrally:
#!/bin/ksh
comp_list="stingray barracuda medusa seal salmon octopus tuna dolphin sword"
for comp in $comp_list
do
    rsh $comp "ps -ef | sort | sed 's/^/'$comp' /'"
done
This script runs through the list of all the servers and, for each
server, runs the ps command, sorts the output, and (using sed)
adds the name of the server to each line. This script is very useful,
especially when looking for a user who is hogging system resources
through heavy commands like make and link.
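Because every line carries the server name, the combined output can simply be piped through grep to find a process. The sketch below shows the tagging and searching steps on made-up ps lines; tag_lines is a hypothetical helper that applies the same kind of sed substitution as the script above, and no remote hosts are contacted.

```shell
#!/bin/sh
# Sketch only: prefix every line of input with a host name, as the sed
# expression in the script above does, then search the combined result.
# tag_lines is a hypothetical helper name.
tag_lines() {
    sed "s/^/$1 /"
}

# Made-up ps -ef lines standing in for the output from two servers.
{
    printf '%s\n' '   markp  1234     1  0 10:02:11 pts/3    0:45 make all' | tag_lines seal
    printf '%s\n' ' tvguser  2345     1  0 10:05:42 pts/7    0:01 vi notes.txt' | tag_lines tuna
} | grep make
```

Only the line from seal is printed, showing at a glance which server is running the make.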
A slight modification of the above script allows me to check the
availability status of the important servers at our site. The machines
need not be UNIX servers; they can be NT servers or other black boxes
such as routers:
#!/bin/ksh
# ping all servers - when one goes down - let me know.
servers="router1 router2 accelar1 barracuda shark medusa sword dolphin
tuna seal octopus salmon stingray tiger hippo rhino puma fox zebra elephant wolf"
for i in $servers
do
    A=`ping $i 10 | grep "no answer"`
    if [[ $A != "" ]]; then
        ## Program to notify me of the problem by Xmessage
        DISPLAY=172.30.30.122:0.0
        export DISPLAY
        echo "$i is DOWN" | /usr/local/bin/xmessage -fn charr24 \
            -bg yellow -fg blue -file - -center &
        # SMS Page me too
        cd /users/system/mark/sms
        # Cellcom
        /users/system/mark/sms/page_mp "PING" "server $i is DOWN"
    fi
done
Each server is pinged with a 10-second timeout. If "no
answer" is received, an xmessage pop-up window appears on my display
saying that the specific server is not answering, and I am also paged.
Because the page_mp script is adapted especially for the mobile
service in my country, I won't go into those details here. However,
the script can easily be modified to send email, or to send pop-up
windows using Samba, to inform me of a problem. Note that this script only
reports a ping or communication problem; the server could
be unreachable (because of a DoS attack, for example) while still
running other activities, such as databases.
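The test at the heart of the script relies on the text that Solaris ping prints: "<host> is alive" on success or "no answer from <host>" after the timeout. The sketch below isolates that test in a function. The names is_down, PING, and fake_ping are hypothetical, and a stub replaces the real ping so that the example runs without a network; other systems' ping commands take different arguments and print different messages, so the pattern would need adjusting there.

```shell
#!/bin/sh
# Sketch only: decide whether a host is down from the text printed by
# Solaris ping ("<host> is alive" or "no answer from <host>").
# is_down, PING, and fake_ping are hypothetical, not from the article.
PING=${PING:-ping}

is_down() {
    # $1 = host, $2 = timeout in seconds (Solaris "ping host timeout" form)
    case $($PING "$1" "$2" 2>&1) in
        *"no answer"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Stub so the sketch runs without a network: "hippo" never answers.
fake_ping() {
    if [ "$1" = hippo ]; then
        echo "no answer from $1"
    else
        echo "$1 is alive"
    fi
}
PING=fake_ping

for host in tiger hippo; do
    if is_down "$host" 10; then
        echo "$host is DOWN"
    fi
done
```

Keeping the test in one function makes it easy to swap the notification (xmessage, pager, or email) without touching the detection logic.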
Conclusion
It can be easy to write many useful systems administration scripts
that will save you time and money on a day-to-day basis. Many of
the expensive commercial tools cover the same ground and provide
similar results. All of these scripts were written with standard
UNIX commands and are therefore easy to adapt. There are many other
free tools on the Internet that can be downloaded and adapted too,
such as the performance analysis scripts written by Adrian Cockcroft
using his own scripting language (http://www.sun.com/951001/columns/adrian/column2.html).
In some cases, a shell script might not be enough, and you might need to
move to some other scripting language or, in the worst case,
write a simple program in C or another language to handle the problem.
Mark Prager is the Senior UNIX Manager at Seabridge and has
a 15-year history with the software industry. He is skilled in many
aspects of the software industry, including software engineering,
computer security, and network planning. He is also a frequent contributor
to the CCIUG newsgroup and experienced in the management of Rational's
ClearCase and MultiSite.