Cover V10, I07
jul2001.tar


Proxy FTP without the Browser

Robert Chuba and Anthony Caruso

Consider a small, unregarded little network in private RFC 1918 address space, sitting behind a firewall. Direct access to the Internet is not allowed; instead, Web access is provided by an HTTP proxy server. While this meets most of the office's needs, files on an FTP site can be downloaded only through the browser, so an ordinary FTP script that periodically downloads files, such as virus .dat files, cannot work. Rather than open a port on the firewall for FTP and break the rules just for us, we decided to try something else.

Knowing we could access the ftp site via a browser, we considered how the browser manages requests for ftp://ftp.somesite.com when typed into the location text box. Further, we wondered if we could use the same mechanism via a script to make HTTP GET requests through the proxy. Listing 1, pftp.pl, or proxy ftp, is the result of this exercise. (Listings are available for download from the Sys Admin Web site: http://www.sysadminmag.com.)

Our script relies on the UserAgent module of LWP (the libwww-perl library). LWP is a collection of Perl modules that provide an API to Web functions; it is distributed as libwww-perl and can be downloaded from: http://www.linpro.no/lwp/. The UserAgent class implements simple WWW requests. Our script uses its methods to request ftp URLs instead of http URLs. Other than that, and some user interface options, the script is essentially the same as the example in the UserAgent documentation.

Usage

The script has two modes of operation. It can be used from the command line to ftp a file:

pftp.pl -u ftp.somesite.com/pub/docs/README

or to get a couple of files:

pftp.pl -u ftp.somesite.com/pub/docs -r README,freecode.tar.gz,morecode.tar.Z

pftp.pl -help lists the command-line options.

The script can also be used from a cron job. Though batching a script is nothing new, we decided to use the data section to hold defaults so that the settings stay with the script. This lets us copy the script, rename the copy to match its function, and give each copy its own defaults. For example, we created a copy of the script named pftp-av.pl and changed the data section to read:

__DATA__
proxyxport = 8080
ftpuser = anonymous
pftppwd=us@oursite.com
proxy = 10.1.1.1
url=ftp.antivirus.com/pub/newdatfiles
remotefiles = new.dat,update.txt,ver.txt
localpath=\\avserver\distribution\

This downloads the three files listed in the remotefiles parameter from our fictitious antivirus distributor and places them in the distribution share on our antivirus server.

The Script

The first few lines of the script tell the interpreter which modules we will be using. Besides LWP, the only notable one is Cwd, which reports the current working directory in a portable way and helps keep the script platform independent. The next block of code processes the command line and issues the usage message if the arguments to the script are incorrect.
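As an illustration of this kind of command-line block, here is a minimal sketch using Getopt::Long. The option names are taken from the examples in this article; the helper name parse_args and the exact option list are assumptions, not a copy of Listing 1.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# parse_args is a hypothetical helper. It returns a hash reference of
# options, or nothing when the usage message should be shown.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt,
        'url=s', 'remotefiles=s', 'proxy=s', 'localpath=s', 'help')
        or return;                      # unknown option or missing value
    return if $opt{help} || !defined $opt{url};
    return \%opt;
}

# Getopt::Long accepts unique abbreviations, so -u means -url here.
my $opt = parse_args('-u', 'ftp.somesite.com/pub/docs',
                     '-r', 'README,freecode.tar.gz');
print "url: $opt->{url}\n";
print "remotefiles: $opt->{remotefiles}\n";
```

A real script would pass @ARGV instead of the sample list and print its usage message when parse_args returns nothing.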

The while() block that follows processes the __DATA__ section to determine default settings. The data section can contain any of the named parameters listed in the GetOptions() call in the command-line processing block. For example, -proxy is an argument to the script, so the data section can hold a default proxy with the entry:

proxy = 10.1.1.1    # default proxy

Lines beginning with a #, blank lines, and surrounding white space are ignored. Comments can also follow an entry and are removed by the line:

my($right,$comment) = split /#/,$r,2;

where $r holds everything to the right of the = character for that entry. All variable names (the left-hand sides) are converted to lowercase so they can be processed later.
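The parsing rules just described can be sketched as follows. This is an approximation under stated assumptions, not the code from Listing 1: the helper name parse_defaults is hypothetical, and it takes a list of lines rather than reading the DATA filehandle so that it can be tried in isolation.

```perl
use strict;
use warnings;

# parse_defaults is a hypothetical helper that turns
# "name = value  # comment" lines into a hash of defaults.
sub parse_defaults {
    my (@lines) = @_;
    my %defaults;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;               # comment or blank line
        my ($l, $r) = split /=/, $line, 2;          # left and right sides
        next unless defined $r;                     # no "=" on this line
        my ($right, $comment) = split /#/, $r, 2;   # drop trailing comment
        $right = '' unless defined $right;
        for ($l, $right) { s/^\s+//; s/\s+$//; }    # trim white space
        $defaults{ lc $l } = $right;                # names become lowercase
    }
    return %defaults;
}

my %d = parse_defaults(
    "Proxy = 10.1.1.1    # default proxy\n",
    "# a full-line comment\n",
    "\n",
    "url=ftp.antivirus.com/pub/newdatfiles\n",
);
print "$_ = $d{$_}\n" for sort keys %d;
```

In the script itself, the same loop body would run over the lines read from the data section. Note that this approach treats every # as the start of a comment, so a value cannot contain one.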

The subsequent block sets the script's options. Anything set on the command line overrides the defaults. Finally, a bit of processing standardizes the URL entry. Remote files to be downloaded are put into the list @rfiles. If -remotefiles is not specified, we assume the file name is appended to the end of the URL; in that case the file name is stripped off the URL and loaded into @rfiles.
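A rough sketch of that URL handling is shown below. The helper name split_target is hypothetical, and adding an ftp:// prefix when none is given is our assumption about what "standardizing" means, based on the usage examples above.

```perl
use strict;
use warnings;

# split_target is a hypothetical helper: it returns the base URL
# followed by the list of files to fetch.
sub split_target {
    my ($url, $remotefiles) = @_;
    $url = "ftp://$url" unless $url =~ m{^\w+://};   # assumed default scheme
    $url =~ s{/+$}{};                                # drop trailing slashes
    my @rfiles;
    if (defined $remotefiles && length $remotefiles) {
        @rfiles = split /\s*,\s*/, $remotefiles;     # explicit file list
    }
    elsif ($url =~ m{^(\w+://[^/]+(?:/.*)?)/([^/]+)$}) {
        # No list given: the last path component is the file name.
        ($url, @rfiles) = ($1, $2);
    }
    return ($url, @rfiles);
}

my ($base, @files) = split_target('ftp.somesite.com/pub/docs/README');
print "$base -> @files\n";
```

With the sample argument, $base holds ftp://ftp.somesite.com/pub/docs and @files holds README.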

Now that everything is prepared, we instantiate the UserAgent object and configure its proxy settings for http and ftp through env_proxy and proxy. Looping through @rfiles, the script appends each file name to the URL and creates a new HTTP::Request for it. If all goes well, the files are downloaded into the specified directory.
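The download step can be sketched like this. The helper names make_agent and fetch_files are hypothetical and the structure is a simplification rather than the code in Listing 1. The block only defines the helpers, so running it makes no network requests.

```perl
use strict;
use warnings;
use File::Spec;
use LWP::UserAgent;
use HTTP::Request;

# Build a user agent that sends both http and ftp requests to the proxy.
sub make_agent {
    my ($proxy) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->env_proxy;                        # pick up *_proxy environment settings
    $ua->proxy(['http', 'ftp'], $proxy);   # then apply our explicit proxy
    return $ua;
}

# Fetch each file below $url into $localpath; return a list of failures.
sub fetch_files {
    my ($ua, $url, $localpath, @rfiles) = @_;
    my @failed;
    for my $file (@rfiles) {
        my $req  = HTTP::Request->new(GET => "$url/$file");
        my $dest = File::Spec->catfile($localpath, $file);
        # A file name as the second argument saves the response body to disk.
        my $res  = $ua->request($req, $dest);
        push @failed, "$file: " . $res->status_line unless $res->is_success;
    }
    return @failed;
}
```

A call such as fetch_files(make_agent('http://10.1.1.1:8080/'), 'ftp://ftp.somesite.com/pub/docs', '.', 'README') would then ask the proxy for the file and report any failures. The proxy address and port here are the illustrative values from the data section above.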

Future Work

Although this script meets our current production needs, we still want to improve it. Features to be added when our copious free time comes to fruition include improved error checking, the ability to download entire directories, and a mechanism for authenticating to a proxy. Actually, the last item isn't difficult; we chose not to implement it because we couldn't test it (our proxy doesn't require logins). If you'd care to improve our script, please drop us a copy. We'd like to see what you do.
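For readers who want to experiment with proxy authentication, one possible starting point is the proxy_authorization_basic method that HTTP::Request inherits from HTTP::Headers. The sketch below is untested against a real proxy, covers only Basic authentication, and uses made-up credentials; pftp.pl as described has no such options.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical credentials, for illustration only.
my ($proxyuser, $proxypwd) = ('someuser', 'secret');

my $req = HTTP::Request->new(GET => 'ftp://ftp.somesite.com/pub/docs/README');

# Adds a "Proxy-Authorization: Basic ..." header to this request.
$req->proxy_authorization_basic($proxyuser, $proxypwd);

print $req->header('Proxy-Authorization'), "\n";
```

The request would then be handed to the UserAgent in the usual way. A proxy that expects a different authentication scheme would need a different approach.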

Resources

Perl: win32 -- http://www.activestate.com

Tony Caruso has been a Network/Systems consultant with MRE Consulting, Inc. in Houston, TX for the last three years and has been hacking code since he was 12 (on the Apple II). Tony has been working in both NT and UNIX for 8 years. Tony holds an MS in Computer Science from the University of Oklahoma and a BS in Electrical Engineering from Louisiana Tech University.

Robert Chuba is a consultant for MRE. He is a systems analyst with 2 years of experience in Novell and 3 years in NT. Robert is working on his BS at the University of Houston.