Linux Transparent Proxy
Rafeeq Ur Rehman
Communication links have been costly ever since wide area networks (WANs) evolved, so people have always tried to make efficient use of available bandwidth on WAN links. The evolution of the Internet has increased the importance of bandwidth management, because all Internet users access resources over wide area networks. Several methods have been adopted to achieve this goal. The most popular, the proxy server, was developed at CERN. The proxy server concept has since been extended, notably by the Internet Cache Protocol (ICP), which is very useful in an environment employing multiple proxy servers for distributed caching (see RFC 2186 and RFC 2187 for details). ICP addresses chaining and grouping of multiple proxy servers. Because a conventional proxy server requires some configuration on the client side, a technique that has become popular is the "forced proxy" or "transparent proxy". With this method, the client does not need to know about the existence of any particular proxy server; instead, all Internet traffic is "forced" to pass through a proxy server.
Linux includes the "transparent proxy" feature at the kernel level, starting with kernel version 2.0. Using this feature together with Linux firewalling, it is possible to redirect all connections that originate from the local network and are destined for a remote host on the Internet to a local server called a "transparent proxy server". This process is completely transparent to the local computer (thus the name). The local computer thinks it is talking to the remote Internet host, while in fact it is connected to the local proxy server. The redirection can be set up so that any port is re-routed to some other server and port. For example, we can route the Web traffic of all users on the local LAN through a proxy server even if the users have not enabled a proxy server in their browsers. This is especially useful for bandwidth management where Internet connection charges are high. The feature can also be used for security purposes, to block certain ports or services.
The Linux transparent proxy feature can be used in two ways. First, if Linux is used as an Internet firewall, traffic is intercepted at the firewall and then redirected to the desired server. Second, if there is no firewall and a Cisco router is used as the gateway to the Internet, we can use the Cisco router's extended access lists to hand traffic to the Linux kernel feature. This provides a good combination of hardware and software for bandwidth management. The feature works with commercial as well as non-commercial proxy servers (Microsoft, Netscape, Squid, etc.). The stability of this setup has been tested in configurations with and without Cisco routers. The following sections provide more details on configuration and use of this feature.
Proxy and Caching Servers
A proxy server is a computer that sits between a private LAN and the Internet. Usually a proxy server hides a private network from the Internet users and implements a security policy based upon users, source/destination addresses, ports, and services being requested. When a proxy server is used for information hiding, the private LAN is not visible from the Internet because it is "hidden" behind the proxy server.
Another important use of the proxy server is Internet caching. Objects retrieved from the Internet are saved on the proxy server's hard disk for future use. When the same object is requested by another user, it is delivered from the local disk instead of being fetched from the Internet again. Caching can typically save up to 40 percent of Internet bandwidth.
Different Ways of Using a Proxy Server
In a corporate LAN or an ISP environment, when a user wants to access an Internet site, the user's request is forwarded to the destination site through a router that connects the corporate or ISP LAN to the Internet. When the destination site receives the request, it returns the required data to the user, again through the router. Although the complete process is quite complicated and many routers lie along the path, the process is shown conceptually in Figure 1.
Figure 1 shows a typical Internet Service Provider (ISP) LAN. The router attached to the Internet is responsible for handling incoming and outgoing Internet traffic. The same situation exists in corporate networks. If a proxy server is installed on the corporate or ISP LAN, the systems administrators also tell users the address of that proxy server. If the browser can send requests to a proxy server and the user has enabled the proxy in the browser settings, the data follows the path shown in Figure 2. The client's request first goes to the proxy server, which downloads the requested data from the Internet site on behalf of the client, through the router, and then sends it back to the client. If the data already exists in the proxy server's cache, it is not downloaded again, and the request is served locally.
Consider a situation in which the browser is not configured to access the proxy server and many users are accessing similar Internet sites, so the same data is retrieved from the Internet again and again. This is a waste of bandwidth, especially in areas where Internet bandwidth is costly. Transparent proxy can help in such a situation: it is simply a process in which all outgoing requests are intercepted and re-routed to the proxy server. It can also be important from a security point of view when you want to block or redirect particular ports or services.
Figure 3 shows the concept of transparent proxy. The client tries to make a direct connection to the Internet host through the router. The router intercepts this request and sends it to the proxy server. If the proxy server has the requested data in its cache, it delivers it locally; otherwise, the proxy server downloads the requested data from the Internet and delivers it to the client.
Linux Transparent Proxy
The transparent proxy feature in the Linux kernel is enabled when you compile it. It is used in addition to the Linux firewall feature. The kernel compilation procedure includes these basic steps:
During the make config process, enable the following options to use firewalling and the transparent proxy feature:
Prompt for development and/or incomplete code/drivers \
Network firewalls (CONFIG_FIREWALL) [Y/n/?]
Network aliasing (CONFIG_NET_ALIAS) [Y/n/?]
TCP/IP networking (CONFIG_INET) [Y/n/?]
IP: forwarding/gatewaying (CONFIG_IP_FORWARD) [Y/n/?]
IP: multicasting (CONFIG_IP_MULTICAST) [Y/n/?]
IP: firewalling (CONFIG_IP_FIREWALL) [Y/n/?]
IP: transparent proxy support (EXPERIMENTAL) \
The ipfwadm utility is used to filter and redirect IP packets passing through the Linux box.
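Before rebuilding, it can help to see which of these options an existing kernel configuration already sets. The following is a minimal sketch and not part of the original procedure: the function name missing_kernel_opts is hypothetical, and it assumes the configuration file (for example /usr/src/linux/.config) records enabled options as CONFIG_NAME=y lines. It checks only the option symbols that are spelled out in the list above.

```shell
#!/bin/sh
# Option symbols taken from the list above. The symbol for transparent
# proxy support is not shown there, so it is deliberately left out.
REQUIRED_OPTS="CONFIG_FIREWALL CONFIG_NET_ALIAS CONFIG_INET CONFIG_IP_FORWARD CONFIG_IP_MULTICAST CONFIG_IP_FIREWALL"

# Print, space-separated, every required option that is not set to "y"
# in the given kernel configuration file (hypothetical helper).
missing_kernel_opts() {
    config_file=$1
    missing=""
    for opt in $REQUIRED_OPTS; do
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            missing="${missing:+$missing }$opt"
        fi
    done
    printf '%s\n' "$missing"
}
```

Calling missing_kernel_opts /usr/src/linux/.config prints an empty line when every listed option is enabled; anything it prints still needs to be switched on during make config.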
Packet Redirection Using Linux as a Dual-Homed Host
In smaller networks where Linux is used as a dual-homed host (a computer that has two network interfaces, one connected to the private network and the other to the Internet) for Internet firewalling and routing, the configuration for transparent proxy is very simple. Let us suppose that a proxy server (e.g., Squid) is running on the same machine and listening on port 8080, and we want to redirect all Web traffic destined for port 80 to port 8080. The sequence of commands is as follows:
/sbin/ipfwadm -F -p accept
/sbin/ipfwadm -I -i accept -b -P tcp -S 22.214.171.124/24 -D \
0.0.0.0/0 80 -r 8080
In this example, 126.96.36.199/24 is the network address of our private network. Here ipfwadm directs the Linux kernel to redirect all packets that come from network 188.8.131.52/24 and are destined for port 80 (no matter what their destination address is) to local port 8080, where the proxy server is listening for requests. In a similar way, any port can be redirected to any other port using the ipfwadm utility. It is advisable to put all such commands in the /etc/rc.d/rc.local file so that they run automatically at startup.
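Because the same pattern applies to any pair of ports, the rule can be generated rather than retyped. A minimal sketch follows; the function name redirect_rule and the 192.168.1.0/24 network in the usage note are illustrative assumptions, and the function only prints the command so that it can be reviewed before being run as root.

```shell
#!/bin/sh
# Print an ipfwadm input rule, in the same form as the command above,
# that redirects TCP traffic from a private network to a local port.
#   $1 = private network in address/prefix form
#   $2 = destination port to intercept
#   $3 = local port where the proxy listens
redirect_rule() {
    net=$1
    from_port=$2
    to_port=$3
    printf '/sbin/ipfwadm -I -i accept -b -P tcp -S %s -D 0.0.0.0/0 %s -r %s\n' \
        "$net" "$from_port" "$to_port"
}
```

For instance, redirect_rule 192.168.1.0/24 80 8080 prints the Web redirection rule for that network; the printed line can then be pasted into /etc/rc.d/rc.local.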
Squid with a Dual-Homed Host
Squid is a popular freeware proxy server for UNIX machines (see "Software Resources" sidebar for more information). It supports FTP, HTTP, Gopher, proxy arrays, ICP, and other useful facilities. It also caches DNS lookups. Squid supports Secure Socket Layer (SSL) for secure end-to-end communication on the Internet. Various types of access controls can also be implemented in Squid, and it provides an extensive request logging facility for analyzing Internet traffic.
It is recommended to compile and run Squid as a dedicated user and group. You can add a new user and group according to your flavor of UNIX. In Linux, this may be accomplished by using the adduser command. Make the squid user a member of the squid group and assign its home directory as /usr/local/squid.
You need the GNU compiler to compile and run Squid. The sequence of commands is as follows (assuming you have downloaded the Squid source code from its ftp site, and the source file name is squid-2.0-src.tar.gz):
# cp squid-2.0-src.tar.gz /usr/local/squid/
# su squid
$ cd /usr/local/squid
$ gunzip squid-2.0-src.tar.gz
$ tar xvf squid-2.0-src.tar
$ cd /usr/local/squid/squid-2.0/
$ ./configure
$ make all
$ make install
All of the Squid configuration goes in one file, squid.conf. The default configuration file allows everyone access to the cache. It creates a 100-MB cache area in /usr/local/squid/cache. By default, it uses 8 MB of RAM and keeps minimal logs. By default, client requests come in on port 3128, which needs to be changed to 8080 in our case.
Before Squid can serve requests, its cache directories must exist. The first time, run Squid with the -z option to create them:
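The port change can be made by editing the http_port directive in squid.conf by hand. As a hedged alternative, the sketch below does the same thing from the shell. The function name set_http_port is hypothetical, and the path in the usage note assumes a default source installation under /usr/local/squid.

```shell
#!/bin/sh
# Set the listening port in a Squid configuration file (hypothetical helper).
#   $1 = path to squid.conf
#   $2 = port number for the http_port directive
set_http_port() {
    conf=$1
    port=$2
    tmp=$(mktemp) || return 1
    # Keep every line except an active (uncommented) http_port directive.
    grep -v '^[[:space:]]*http_port[[:space:]]' "$conf" > "$tmp"
    # Append the new listening port.
    printf 'http_port %s\n' "$port" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Under those assumptions, set_http_port /usr/local/squid/etc/squid.conf 8080 leaves exactly one active http_port line in the file. Commented-out lines are left untouched.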
$ cd /usr/local/squid/bin/
$ ./squid -z
This will take some time. After creating the cache directories, Squid will exit. Next, you can start Squid with the RunCache command:
$ cd /usr/local/squid/bin/
$ ./RunCache &
Installing the tproxyd Package
If the actual proxy server is different from the Linux box (which is acting as transparent proxy), we need to install the tproxyd package. It redirects the desired requests to another server. This is the situation in which we use some commercial proxy server on a different operating system (like Microsoft or Netscape proxy server on an NT server).
After downloading the source archive, extract it to begin installing the tproxyd package:
tar zxvf transproxy-0.4.tar.gz
Now, add a line like the following to /etc/services.
tproxy 81/tcp # Transparent Proxy
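Entries in /etc/services use the layout name, then port/protocol, then an optional comment. The sketch below adds such an entry only when no existing line already claims that port and protocol, so rerunning it is harmless. The function name add_service is hypothetical, and the file path is passed in so that the function can be tried on a copy first.

```shell
#!/bin/sh
# Append "name port/proto" to a services file unless some entry already
# uses that port and protocol (hypothetical helper).
#   $1 = services file, $2 = service name, $3 = port, $4 = protocol
add_service() {
    file=$1
    name=$2
    port=$3
    proto=$4
    # An entry exists when field 2 of a non-comment line is "port/proto".
    if awk -v want="$port/$proto" \
        '$1 !~ /^#/ && $2 == want { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s\t%s/%s\t# Transparent Proxy\n' "$name" "$port" "$proto" >> "$file"
}
```

Run as root, add_service /etc/services tproxy 81 tcp would register the port used in this article.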
The following lines should be present in the /etc/rc.d/rc.local file to redirect Web traffic to another proxy server:
/sbin/ipfwadm -F -p accept
/sbin/ipfwadm -I -i accept -b -P tcp -S 184.108.40.206/24 \
-D 0.0.0.0/0 80 -r 81
/usr/sbin/in.tproxyd -s 81 -r nobody 220.127.116.11 8080
In this example, Web traffic is redirected to local port 81, where tproxyd is listening for requests. tproxyd in turn forwards this traffic to the actual proxy server, 18.104.22.168, at port 8080.
Using Cisco Router to Redirect Internet Traffic
The Cisco router's operating system provides access lists that can be combined with any of the techniques described above to redirect Internet traffic transparently to a proxy server. In this case, the Cisco router acts as the default gateway for client computers. First, we define an access list, and then we apply it to the router's network interface that is connected to the private network (e.g., the router's Ethernet port). The router configuration will look something like this:
ip address 22.214.171.124 255.255.255.0
ip policy route-map proxy-redir
access-list 102 permit tcp 126.96.36.199 0.0.0.255 any eq www
route-map proxy-redir permit 20
match ip address 102
set ip default next-hop 188.8.131.52
This sets the next hop for Web traffic to 184.108.40.206, which is a Linux box running transparent proxy. Now, whenever an IP packet with destination port 80 enters the router's Ethernet port, it is sent to IP address 220.127.116.11. It is the responsibility of this Linux box to pass it to the proxy server, whether that server runs locally or on another computer.
The Linux transparent proxy feature is useful for Internet bandwidth management and traffic redirection. It can be used with almost any commercial or freely available proxy server. Routers such as Cisco's can also be used together with this feature to get the desired results.
The author is grateful to Prof. Dr. Shahid Bokhari for his continuous help in his research area. The author is supported by a doctoral fellowship from the Ministry of Science and Technology, Government of Pakistan. Additional support is provided by Directorate of Research Extension and Advisory Services of the UET.
About the Author
Rafeeq Ur Rehman completed his B.Sc. and M.Sc. in Electrical (Communication) and Electrical (Computer) Engineering at the University of Engineering and Technology, Lahore. His interests include distributed computing and computer networks. He can be reached at email@example.com.