A Framework for Automated File Transfer

George Callaway

Most systems administrators regularly have to transfer files from machine to machine. Writing scripts to perform this activity is part of the nature of a good admin. As a consultant, I have had to make my scripts portable because every client has a different environment with new twists. One way to deal with this is to keep as much reusable functionality as possible in ksh function libraries.

This article explores some functions that I have written to help automate file transfers. I do not promise that my scripts are bug free, or that they are the best approach in every circumstance. My objective here is to provide some examples of methods that have worked for me, and that might help.

Really, all that these file transfer functions do is to use ftp or rcp to move files. I have wrapped them with ksh code to make them a little more reliable, or a little more flexible, for unattended operation. I also provide other useful functions that appear in most of my production scripts. The complete syntax for each function is provided as comments in the function library admin_lib. (All listings for this article are available from: www.sysadminmag.com.)

A ksh function library is really just a ksh script that contains ksh functions. They could be contained in the script itself, but keeping them in a separate file allows them to be used by multiple scripts as well as allowing you to fix your bugs in one place instead of many. A ksh function is a bit of code that can be called to perform a specific task. For example, one function that I wrote simply checks the exit value from the last command, and exits if it is non-zero.

function OnErrorExit
{
# Capture the exit value of the previous command before anything overwrites it
typeset ErrorCode="$?"
if [[ "$ErrorCode" -ne 0 ]]
   then
   DebugMsg 1 "Exiting with code of $ErrorCode"
   exit "$ErrorCode"
fi
}

To use this in your script, call the function just as you would any other ksh command:

/usr/sbin/ping myhost
OnErrorExit

If the ping returns non-zero, the program will exit with ping's exit value. This approach can help keep your code more modular and easier to read.

Once you have created your function library (or ksh script containing functions), you simply need to "source" it by typing "dot" followed by the ksh script name. You can either provide a full pathname to the script, or it can be in your PATH. Here is how it is done in the example provided:

. admin_lib
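As a small sketch of the sourcing step, the wrapper below checks that the library is readable before the dot command runs. SourceLibrary is a hypothetical name, not part of admin_lib, and the sketch is written in portable sh syntax that ksh also accepts.

```shell
# SourceLibrary is a hypothetical helper, not a function from admin_lib.
# $1 = full pathname of the function library to load
SourceLibrary()
{
    if [ -r "$1" ]; then
        # Functions defined in the file become available to the caller
        . "$1"
    else
        echo "Cannot read function library $1" >&2
        return 1
    fi
}
```

Checking first matters because, in some shells, a dot command that cannot find its file ends a non-interactive script on the spot, with a less helpful message.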

This article will introduce you to some functions that I have written, along with an example script that uses the provided function library.

ReliableTransfer -- The first function of interest is ReliableTransfer. This function was originally written as part of a method for backing up a database to a remote machine. I was using rcp, but found that I would do some silly things like overwrite files and fill up filesystems. ReliableTransfer accepts the basic syntax of rcp, and does some additional checks. For example:

ReliableTransfer MyFileName yourmachine:/home/YourFileName

transfers the file MyFileName to a remote machine called yourmachine, to a file /home/YourFileName using rcp. What I added was a list of things that I found myself doing in various scripts, with the option of printing a lot of debug information.

First, it determines whether the copy is local or remote. If the copy is a local copy, the cp command is used instead of rcp. For a remote copy, various network checks are performed to make sure that the host is available and ready for an rcp. It then checks the size of the file to be transferred against the available space on the remote filesystem, and will exit with an error if there is not enough space. It then ensures that it will not overwrite a file on the target, and commences copying the file to a temporary name. Once the copy completes, it compares a checksum of the copy against that of the original file and, if they match, renames the remote file to its correct name. If any of these checks fails, the function alerts you that the transfer has failed, and why.
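The copy-to-a-temporary-name, verify, then rename sequence can be sketched for the local case only. SafeLocalCopy and its return codes are assumptions made for illustration; the real ReliableTransfer in admin_lib also handles remote hosts and free-space checks, which this sketch leaves out.

```shell
# SafeLocalCopy is a hypothetical name, not a function from admin_lib.
# It covers only the local branch: no network or free-space checks.
# Usage: SafeLocalCopy SourceFile TargetFile
SafeLocalCopy()
{
    slc_src=$1
    slc_dst=$2
    slc_tmp=$2.tmp.$$            # temporary name next to the target

    if [ ! -f "$slc_src" ]; then
        echo "SafeLocalCopy: source $slc_src not found" >&2
        return 1
    fi
    # Refuse to overwrite an existing target file
    if [ -e "$slc_dst" ]; then
        echo "SafeLocalCopy: $slc_dst already exists" >&2
        return 2
    fi

    cp "$slc_src" "$slc_tmp" || return 3

    # Reading from stdin keeps the file name out of the cksum output,
    # so the two results can be compared as plain strings
    slc_sum1=$(cksum < "$slc_src")
    slc_sum2=$(cksum < "$slc_tmp")
    if [ "$slc_sum1" != "$slc_sum2" ]; then
        echo "SafeLocalCopy: checksum mismatch" >&2
        rm -f "$slc_tmp"
        return 4
    fi

    # Only a verified copy receives the final name
    mv "$slc_tmp" "$slc_dst"
}
```

Because the final name appears only after the checksum matches, anything watching the target directory never sees a partial file under that name.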

The ftp Functions

The next set of functions deals with ftp. They are not as thorough as ReliableTransfer, but they could be extended to use the same approach. The ftp command is preferred over rcp when security is a concern (no .rhosts files or service), or when you are transferring to a system where rcp is not supported.

These functions move the existing .netrc file (if there is one) to a backup name, then write a replacement that has commands specific to what the function is designed to accomplish. For this reason, these functions should be used with care. Another important note here is that ftp behaves differently when run in the foreground. Although these scripts work in the foreground, they were designed to work in the background, such as being run from cron. The worst case is that when you run them from the shell, an ftp failure may leave you at an ftp prompt. Just type "quit" and the script will continue.
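One way to sketch the backup-and-replace handling of .netrc is shown below. WriteNetrc, RestoreNetrc, and the .saved suffix are hypothetical, and the use of a "macdef init" macro (which ftp runs automatically after auto-login) is an assumption; the article does not say which .netrc features admin_lib relies on.

```shell
# WriteNetrc and RestoreNetrc are hypothetical helpers, not admin_lib functions.
# Usage: WriteNetrc Machine User Password [ftp-command ...]
WriteNetrc()
{
    [ $# -ge 3 ] || return 2
    wn_file=$HOME/.netrc
    # Keep any existing .netrc under a backup name
    if [ -f "$wn_file" ]; then
        mv "$wn_file" "$wn_file.saved" || return 1
    fi
    wn_machine=$1
    wn_user=$2
    wn_pass=$3
    shift 3
    (
        umask 077    # the file holds a password, so keep it private
        {
            echo "machine $wn_machine login $wn_user password $wn_pass"
            echo "macdef init"          # macro executed after auto-login
            for wn_cmd in "$@"; do
                echo "$wn_cmd"
            done
            echo "quit"
            echo ""                     # a blank line ends the macro
        } > "$wn_file"
    )
}

# Put the original .netrc back, if there was one
RestoreNetrc()
{
    rm -f "$HOME/.netrc"
    if [ -f "$HOME/.netrc.saved" ]; then
        mv "$HOME/.netrc.saved" "$HOME/.netrc"
    fi
}
```

Calling RestoreNetrc from the script's exit handler would limit the time the replacement file, and the password in it, stays on disk.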

CheckRemoteLogin -- This function tries an ftp login, and exits with an error value if the login does not work. I used this to check for a machine that was only occasionally there. This helped convince everyone that the other machine was the problem, and not my code!
GetRemoteList -- This function returns a list of files with the extension passed to it. I use it in the script to assign the returned list into a ksh variable. This list can be used to control the remaining ftp sessions.
GetRemoteFiles -- This function gets the files in the file list, and returns them to the local location given. It then renames the remote files with a .done extension to prevent them from being transferred again.
PutRemoteFiles -- This function transfers a list of files from a local directory to a remote directory via ftp.
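The get-then-rename behavior described for GetRemoteFiles can be illustrated by generating the ftp commands for a file list. BuildGetCommands is a hypothetical helper, not part of admin_lib; it only prints commands (lcd, cd, get, rename, quit are standard ftp commands) and does not contact any server.

```shell
# BuildGetCommands is a hypothetical helper, not part of admin_lib.
# Usage: BuildGetCommands LocalDir RemoteDir file...
BuildGetCommands()
{
    [ $# -ge 2 ] || return 1
    echo "lcd $1"                 # where fetched files land locally
    echo "cd $2"                  # remote directory to fetch from
    shift 2
    for bgc_file in "$@"; do
        echo "get $bgc_file"
        # Mark the remote copy so a later run does not fetch it again
        echo "rename $bgc_file $bgc_file.done"
    done
    echo "quit"
}
```

Note that this sketch renames each file whether or not the get succeeded; a more careful version would confirm that the local copy arrived before marking the remote file as done.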

Other Useful Functions

The remaining functions are for supporting general script creation:

DebugMsg -- This function prints a message to the console. Messages print based on the setting of an environment variable called "DebugLevel". Level 0 is supposed to be almost silent, 1 is errors only, 5 is general info, and 10 is full debug. In the example script, DebugLevel defaults to 5 if it is not already set in the environment.
CreateApplicationLock, RemoveApplicationLock -- These functions are used to ensure that only a single copy of a script is being run.
OnErrorExit -- This function simply checks for the exit value of the previous command ($?) and exits the program with that value. The value of this function is that it reads much better than putting an entire "if" statement block after every command that you want to check.
GetFileSize -- This function returns the size in bytes of a file either locally or on a remote machine.
GetFileSum -- This function returns the checksum of a file either locally or on a remote machine.
RemoteSpace -- This function returns the space left on a remote filesystem.
LogEvent -- This function writes a log entry to a specified log file either locally or on a remote machine.
Help -- This function prints out descriptions of functions in the library. This was written to promote self-documentation of the library by using comments.
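To make the DebugLevel behavior concrete, here is a minimal sketch of a level-filtered message function. It is not the DebugMsg from admin_lib, which may differ; writing to standard output and comparing with "less than or equal" are assumptions.

```shell
# Sketch only; the real DebugMsg lives in admin_lib and may differ.
DebugLevel=${DebugLevel:-5}     # default of 5, as in the example script

# Usage: DebugMsg Level message words...
DebugMsg()
{
    dm_level=$1
    shift
    # Print only when the message level is within the current DebugLevel
    if [ "$dm_level" -le "$DebugLevel" ]; then
        echo "$*"
    fi
}
```

With DebugLevel set to 1, a call such as "DebugMsg 1 Transfer failed" prints, while "DebugMsg 5 Starting transfer" stays silent.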

Putting It All Together -- The Example Script

The scripts provided with this article are admin_lib and frame: admin_lib is the collection of ksh functions, and frame is an example script that uses many of the functions described here.

Let's walk through frame and outline how admin_lib functions are used. Somewhere near the top of the script, you will need to source the function library; in ksh, this is done with the dot command (. admin_lib). I usually set up a function in my script that will run when the script exits for any reason. I then reference that function in a "trap". This function is where I put notification email, logs, cleanup, etc. That way, no matter how the script exits, it cleans up after itself.

function ExitGracefully  # Exit the program gracefully
{
...
}
trap ExitGracefully EXIT

I like to make sure that there is not another instance of this program running. To do this, I attempt to place a "lock" by creating a special directory in /tmp. The function I use to do this is CreateApplicationLock. If it fails, then the program is already running and should not start up again. I clean up this lock by running RemoveApplicationLock when I am done.
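The directory-as-lock idea works because mkdir either creates the directory or fails in a single step. The sketch below uses hypothetical names (AcquireLock, ReleaseLock, LockDir, and the .lock suffix); the actual CreateApplicationLock and RemoveApplicationLock in admin_lib may be implemented differently.

```shell
# AcquireLock and ReleaseLock are hypothetical stand-ins for the
# admin_lib lock functions; the directory naming is an assumption.
LockDir=${LockDir:-/tmp}

# Usage: AcquireLock ApplicationName
AcquireLock()
{
    # mkdir fails if the directory already exists, so only one
    # caller can succeed at a time
    mkdir "$LockDir/$1.lock" 2> /dev/null
}

# Usage: ReleaseLock ApplicationName
ReleaseLock()
{
    rmdir "$LockDir/$1.lock" 2> /dev/null
}
```

A script would call AcquireLock near the top and stop if it returns non-zero. If a script is killed before it can release the lock, the directory remains and has to be removed by hand.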

The next section of code checks the ftp connection. This is done using:

CheckRemoteLogin $RemoteMachineName $RemoteUserName \
  $RemoteUserPassword 2> /tmp/check$$

The function CheckRemoteLogin attempts a login based on the passed parameters. The file /tmp/check$$ is used to verify that a "good" login was achieved. If there is any output to this file, it is assumed that the login failed.
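The "any output means failure" test can be sketched with the -s file test, which is true for a file that exists and is not empty. NoErrorOutput is a hypothetical helper, not part of admin_lib or frame, and its return codes are assumptions.

```shell
# NoErrorOutput is a hypothetical helper, not part of admin_lib or frame.
# Usage: NoErrorOutput CapturedStderrFile
NoErrorOutput()
{
    # A missing capture file means the check never ran
    [ -f "$1" ] || return 2
    # -s is true when the file exists and is not empty
    if [ -s "$1" ]; then
        return 1    # something was written to stderr: treat as failure
    fi
    return 0
}
```

In a script like frame, the captured file would be passed in right after the login attempt, and removed once it has been checked.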

Next, a list of files from the remote machine is created:

FileList=`GetRemoteList $RemoteMachineName $RemoteUserName \
  $RemoteUserPassword $LocalLocation $RemoteLocation html`

(Note that there is no line wrap in the script.) This returns a list of files with an html extension into the variable FileList. This will be used to get the files:

GetRemoteFiles $RemoteMachineName $RemoteUserName \
  $RemoteUserPassword $LocalLocation $RemoteLocation $FileList

The remote files are transferred to the local machine, then have .done appended to their names. After this, a list of local files is built for transfer to the remote machine:

FileList=`ls -1 *.html 2> /dev/null`

This list is then used to push files over to the remote system:

PutRemoteFiles $RemoteMachineName $RemoteUserName \
  $RemoteUserPassword . $RemoteLocation $FileList

After PutRemoteFiles finishes copying the files over to the remote system, it places them into a local directory called "archive". At this point, the script has done all that it was designed to do, and should now exit with a successful status:

# If program makes it here, then it was successful
ErrorFlag=0
echo  >> $MailStatusFile
# Then go to ExitGracefully via trap

Setting the ErrorFlag at the end ensures that we actually made it to the bottom of the script. After this normal exit, as with any exit, control is passed to the ExitGracefully function defined earlier.
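The ErrorFlag pattern can be tried out in isolation. In this sketch the job runs in a subshell so that the EXIT trap fires when the subshell ends; RunJob and the messages are invented for illustration, and the body of ExitGracefully here is an assumption, since the article elides the real one.

```shell
# Sketch of the ErrorFlag idea; RunJob is hypothetical and this
# ExitGracefully body is an assumption (the article elides the real one).
# Usage: RunJob command [args...]
RunJob()
{
    (
        ErrorFlag=1                  # assume failure until proven otherwise

        ExitGracefully()
        {
            if [ "$ErrorFlag" -eq 0 ]; then
                echo "job succeeded"
            else
                echo "job failed"
            fi
        }
        trap ExitGracefully EXIT

        "$@" || exit 1               # a failing step leaves ErrorFlag at 1
        ErrorFlag=0                  # reached only if every step worked
        exit 0
    )
}
```

Because the flag starts as a failure, any early exit is reported as one without each exit point having to set it.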

Conclusion

I hope you find this set of functions and the example script helpful. Please feel free to email questions, suggestions, and improvements to me at the address provided.

George Callaway is a Senior Technical Architect with Emerald Solutions, a professional services company based in Portland, OR. His primary interests are systems and software architecture, UNIX administration, Oracle database administration, and Java. He can be reached at: george@georgecallaway.com.