Designing a Scalable NNTP Server Network
Chris Josephes
Five years ago, it was hard to find an Internet service provider that didn't maintain a news server on their network. Every site was expected to have one to provide Usenet access to its users. Usenet has changed considerably since then. With the increasing number of users on the Internet, Usenet traffic has grown by leaps and bounds. More articles are being posted, and the average article size has increased due to the popularity of the alt.binaries newsgroups. Most Usenet servers receive 100 GB of articles every day.
To handle this level of growth, most news servers were overhauled. Bigger disks were added to maintain a decent news spool; memory was added to handle the higher number of end-user connections; and parallel feeds were necessary to receive articles at an acceptable rate. Some news administrators saw this as a challenge -- to keep expanding servers to accommodate the growth of Usenet. Other administrators saw it as a waste of time and effort. Most ISPs never charged for news access, so they couldn't justify the costs involved in administering news servers.
If you're reading this, then you may be a Usenet server administrator or considering the Usenet requirements for your site. The goal of this article is to provide a few suggestions on how to stage one or more NNTP servers with the intention that your server network will grow. By carefully planning your initial server implementation, it will be easier to add more servers to your network. Everything covered here is platform and software independent. For specifics on how to implement some of the ideas mentioned, check your server documentation.
An Overview of Usenet and NNTP
The Network News Transport Protocol handles the transfer of Usenet news articles between hosts on the Internet. It is a client/server protocol that has two distinct methods of transferring articles: feeding and reading.
NNTP Feeding is a streaming exchange of articles between Usenet servers. A client connects to a server and immediately begins to send articles to the remote host. The remote server responds to each message by either accepting the article, refusing to accept the transfer, or rejecting the article after the transfer. In turn, the Usenet server retransmits the articles to other Usenet servers.
NNTP Reading is a selective article exchange that is controlled by the client. Instead of the client sending a stream of articles, the client requests specific articles one at a time. The client specifies which newsgroup it is interested in, requests the overview data of the articles in that group, and then downloads individual articles that the user selects. It is also capable of posting individual articles composed by the end user.
NNTP Servers
Not long ago, NNTP servers came in a one-size-fits-all package. One program (usually Innd) would be responsible for feeding remote sites, and allowing article viewing by end users. Unfortunately, Innd began to show its age as Usenet traffic increased.
New software brought competition into the server market, and inspired new interest in programming news systems. NNTP servers like Cyclone and Typhoon drastically reduced administration overhead of news servers compared to Innd. Diablo and NNTPRelay were soon established as high-performance news servers, used by some of the most popular sites. NewsCache offered the ability to cache news articles, reducing bandwidth and storage issues typically associated with news servers.
NNTP servers come in three varieties: readers, feeders, and caches. The following is a brief description of how each server operates.
Readers
Reader servers are designed to handle connections from end-user clients for the purpose of serving individual articles (see Figure 1). End users request articles using different commands defined by the NNTP protocol; these commands retrieve specific articles one at a time. The commands that provide this functionality are sometimes referred to as the NNRP Protocol (Network News Reading Protocol).
A reader server typically is connected to a large disk system for retaining articles much longer than a feeder server. It also maintains additional data to keep track of articles and improve reading performance, specifically, the overview database, and the active file. An overview database is a record of article headers for a particular newsgroup. The active file is a list of all the newsgroups that a server handles. It states whether or not the group is moderated, and provides the local number ranges that are assigned to articles within the group.
Reader servers include: Innd, Typhoon/Breeze, Diablo, Turtoise, Earthquake, DNews. Reader Clients include: tin, nn, Forte Agent, Netscape Communicator.
Feeders
Feeder servers were designed only to send and receive streams of articles from remote sites. A feeder server receives a continuous stream of articles from a remote host, and transfers the same articles to other hosts that may not already have the articles (see Figure 2).
Typically, a feeder is connected to a high-speed network and doesn't need to maintain data that a reader server would. No overdata is recorded, and sometimes no active file is maintained.
Feeder servers include: Innd, Cyclone, Diablo, NNTPRelay.
Caches
The NNRP cache is a caching reader proxy. To a real NNTP reader, it looks like a simple reader client, but it is actually handling the distribution of articles to other readers (see Figure 3).
It only requests the articles it needs, and then saves them to a local disk in case they are requested by another user. A cache server is best suited for a remote network that needs news access, but doesn't have the bandwidth to handle a full newsfeed.
Cache servers include: NewsCache, NNTPCache, Leafnode, DNews.
Example Network
For this article, I will create a fictitious news server network. The network is run by a large-scale ISP named Cyber OC3, which has four points of presence in different cities across the United States: Minneapolis, Chicago, Seattle, and New York (see Figure 4).
Each POP has one news server acting as a feeder. Each feeder exchanges articles with remote sites, customer sites, and with each other. The Chicago POP has a set of news servers acting as readers.
Setting Up News Feeds
To serve Usenet articles, you must first receive them, which means finding a remote site to send you a Usenet feed -- a task easier said than done. Once you go through all the work in finding a site, you must then find the Usenet administrator for that site. Sometimes email requests will go unanswered, or support departments may not know who administers their news server. I once dealt with a site that didn't even realize they had a news server running on their network.
It isn't always this bad, but it can become discouraging if you're trying to line up five or more decent feeds. If you need to look for a feed, try one of the following sources:
1. Upstream providers -- Most ISPs offer newsfeeds to their customers for free or for a minimal charge.
2. Commercial providers -- Some service providers have made a business by sending Usenet feeds to remote sites. They are usually high-powered servers exchanging articles with over 100 hosts at once. Some providers also send Usenet feeds via satellite for sites that don't want to saturate their land lines.
3. The Usenet-SE Peering Page (http://www.usenet-se.net/ \
peering/) -- This site lists remote Usenet sites that are looking for other sites with whom to exchange feeds. You can search for a site or add your name to the list.
4. Post a request to news.admin.misc -- Be sure to mention details about your site and state what kind of feed you are looking for.
If you find a site that interests you, do some research. Verify that a site really suits your needs before you start setting up a configuration. Use the following checklist to find out more about the site:
1. Network hops/response time -- Perform a few traceroutes from your news server to theirs at various times during the day. If you actually have NNTP access to the remote server, run a program like newsresp or nntpTime to measure the response time of the server by simulating a high number of article transfers.
2. Server software -- Usually, the software a remote site is running doesn't matter, but it doesn't hurt to know what it is. It can be helpful in debugging if problems arise. Ask the administrator what they are running on their server, or telnet to the nntp port on the host and the greeting banner should give you the information you need.
3. Subscription -- If the remote peer has an active file, ask to see it and compare it to yours. You can then determine whether the feed will fully benefit your server. If they don't maintain an active file, find out whether they do any filtering based on the newsgroup to which the article belongs.
4. Statistics -- See whether the peer has any statistics on the incoming and outgoing article rate. Don't choose a peer just because they may have a high ranking on the Usenix Top 1000; it's a bad metric for determining overall server performance.
5. Contact information -- See whether the site has established contact information so you can immediately reach the administrators if necessary. Email is usually the preferred method of contact. Some sites will also exchange phone numbers, provided they are only used in case of dire emergencies.
If you've determined that the peer is a good match for your server, exchange configuration information with the remote administrator. Tell him or her what hostnames they should send articles to, receive articles from, how many parallel NNTP connections you want on each end, the path stamp to filter out, and the newsgroups from which you want to receive articles.
Here is a what a feed configuration could look like on the Minneapolis Cyber OC3 server if it was exchanging a feed with nntp.binaries-newsfeeds.com:
AcceptFrom: nntp.binaries-newsfeeds.com
SendTo: nntp.binaries-newsfeeds.com
Similarily, the configuration on the host nntp.binaries-newsfeeds.com would look like this:
AcceptFrom: minneapolis.nntp.cyberoc3.net
SendTo: minneapolis.nntp.cyberoc3.net
Building Scalability (and Some Redundancy) into Your Network
A well-designed NNTP server network should be scalable and provide some level of redundancy in case of failure. Overall performance of the network should be improved simply by adding servers. If a server fails, the remaining servers should be able to take on the additional load with minimal difficulty.
NNTP has no built-in mechanisms to accomplish this. The only feature within the server software that can provide scalability is the master/slave article numbering mechanism for NNTP Readers. However, we can still provide scalability from the use of external systems. We can maipulate DNS to provide load-sharing and feed redirection. We can also manipulate path stamps to make sure we aren't offered articles we would probably refuse.
There are other solutions available, including the use hardware systems designed for load balancing, but I won't address those. The market is already saturated with many products that perform some form of traffic redirection and these systems are beyond the scope of this article.
Changes in News Server Networks
One issue that needs to be addessed in a Usenet network is how changes to your network affect the clients connecting to your systems. (Clients in this case can be either remote NNTP feeders or reader clients.) If you make changes on your internal network, you must make sure that those changes don't drastically affect sites that depend on you. Changes can be anything that increases/decreases the number of servers, or changes in your overall network topology that require restructuring of your feeds.
Some examples are:
1. The addition of a NNTP feeder at a point of presence.
2. The addition/removal of a high-speed network connection (DS3 or more) that would affect feeding performance positively or negatively.
3. A critical failure of a NNTP server that cannot be resolved in 24 hours.
Any one of these events could require changes in feeding relationships. Unfortunately, some changes in your local configuration mean that the remote administrators you deal with may have to make changes of their own. If those administrators aren't available when you need them, the flow of news articles could stop.
The following steps can minimize the need for remote administrators to make changes to their configurations. Thus, you can spend more time administering your own servers.
DNS Setup
An NNTP server requires two hostnames for a feeding relationship with another site: a host to accept articles from, and a host to send articles to. They don't necessarily have to be the same host, but they usually are. Any hosts that aren't explicitly listed can't connect to the news server.
If you want to move a feed to a different host, a remote administrator will need to change his server configuration to match yours. If the other administrator doesn't make the changes, the feed could stop. If you want to move a feed to a different host on your network, the remote administrator of the feed must change his configuration. Otherwise, your new server may not have permission to communicate with the remote site. But what if you could give the remote administrator a set of hostnames that would never require changes? It's possible and requires some DNS trickery and a lot of CNAME records.
Create a subdomain of your site's domain name and call it nntp. Give it a low TTL, so remote DNS resolvers don't hold onto the records for too long. If your DNS policies allow it, use your news service administration email address in the SOA record:
nntp.cyberoc3.net IN SOA ns.cyberoc3.net. news.cyberoc3.net (
20000825 ; Serial
43200 ; Refresh
3600 ; Retry
604800 ; Expire
3600 ) ; Minimum ttl
In that domain, create A records for all of your news servers:
chicago.nntp.cyberoc3.net IN A 127.0.34.5
minneapolis.nntp.cyberoc3.net IN A 127.0.54.22
newyork.nntp.cyberoc3.net IN A 127.0.10.130
seattle.nntp.cyberoc3.net IN A 127.0.20.54
For every peering feed, create two CNAME records -- one to represent incoming traffic, and one to represent outgoing traffic.
binnews-in.nntp.cyberoc3.net IN CNAME \
minneapolis.nntp.cyberoc3.net.
binnews-out.nntp.cyberoc3.net IN CNAME \
minneapolis.nntp.cyberoc3.net.
When you set up a news feed with a remote site, instruct it to accept articles from binnews-out.nntp.cyberoc3.net, and to send all articles to binnews-in.nntp.cyberoc3.net. Here is the updated configuration the administrator of nntp.binaries-newsfeeds.com would use:
SendTo: binnews-in.nntp.cyberoc3.net
AcceptFrom: binnews-out.nntp.cyberoc3.net
If you needed to redirect the feed from this site, you don't need to contact the bin-news administrator; you can just change the DNS records. Suppose changes in your external networks made it more suitable to exchange articles between nntp.binaries-newsfeeds.com and the Cyber OC3 server in Seattle. To accommodate the change, we could make the following changes in the DNS zone:
binnews-in.nntp.cyberoc3.net 60 IN CNAME \
seattle.nntp.cyberoc3.net.
binnews-out.nntp.cyberoc3.net 60 IN A 127.0.20.54
binnews-out.nntp.cyberoc3.net 60 IN A 127.0.34.5
This probably isn't what you expected, so I'll explain the reasoning behind this. The record binnews-in represents the hostname that the remote peer sends to, which is a host we control. Therefore, it's no problem for us to configure our servers to accept incoming connections from the outside host. We keep identical incoming configurations in the Minneapolis and Seattle NNTP servers. When the incoming binnews article stream changes to the Seattle server, we can remove the incoming stream configuration from the Minneapolis server.
The outgoing hostname is a bit more complex. When we change the DNS records, we can't guarantee when the remote site will see the changes. The rotating A list lets us specify two hosts from which bin-news could accept articles. This way we can move the feed at any time and the remote host won't reject the stream connection from either host. When we know that the Seattle host can send articles to nntp.binaries-newsfeeds.com, we can convert the host entry back to a single CNAME.
You could also create a catch-all host record from which remote sites can accept articles. If you want to give every peer a single hostname, create a host record with multiple resource records. This is a good idea if you have customers accepting newsfeeds, and you want to let all of them accept articles from any one of your servers:
outgoing.nntp.cyberoc3.net IN CNAME chicago.nntp.cyberoc3.net.
IN CNAME seattle.nntp.cyberoc3.net.
IN CNAME newyork.nntp.cyberoc3.net
IN CNAME atlanta.nntp.cyberoc3.net
For better performance, you may want to use round-robin A records in the above example because CNAME records require two lookups. Some DNS administrators also cringe at the idea of round-robin CNAME records.
If the news server in New York goes down for an extended period of time, we can start sending New York feeds from the Chicago news server, and the peers will automatically accept the connection from the Chicago news server, since it is covered in the CNAME round-robin.
Unfortunately, there is a downside to all of this. These methods can make host management and article routing easier, but they can't be expected to redirect feeds immediately after the records are changed. Here are a couple of reasons why:
1. News servers typically resolve hostnames only during startup, or when the configuration is reloaded. It could take days, or even weeks, for a remote news server to see your updated DNS data. If the remote site keeps seeing the old DNS information, you may have to ask the administrator to reload his configuraion. If possible, do your DNS changes in advance before the feed needs to be redirected.
2. Some news servers may only look up one A record in a round-robin entry, which would break any attempt of allowing two of your news hosts to send articles to the remote host. If the remote site does not appear to allow transfers from all hosts in a round-robin record, you will have to ask the remote administrator to reload his configuration once you have the DNS updated with the new feeding host.
Path Header Setup
To reduce the chance of sending a Usenet article to a peer that already received it, all articles contain a Path: header that lists every server it has passed through:
Path: news.cyberoc3.net!chicago.nntp.cyberoc3.net!nntp.newspeer.net
Every time the article is received by a server, it prepends a unique stamp to the Path header before sending it on. In most cases, that stamp is the hostname of the news server or a keyword to identify that server. Each path stamp is seperated by a exclamation point.
Example article:
From: Guest Account <guest@binaries-newsfeeds.net>
Newsgroups: alt.binaries.test
Message-ID: <25@binary-newsfeeds.com>
Subject: TEST - ignore
Path: seattle.nntp.cyberoc3.net!nntp.binaries-newsfeeds.com
The following sequence shows the possible transmission of an article, and how each server it passes through modifies the Path header:
1. n ntp.binaries-newsfeeds.com sends article to chicago.nntp.cyberoc3.net.
New Path: chicago.nntp.cyberoc3.net!nntp. \
binaries-newsfeeds.com
2. chicago.nntp.cyberoc3.net sends article to hub1.newsfodder.com.
New Path: hub1.newsfodder.com!chicago.nntp.cyberoc3. \ net!nntp.binaries-newsfeeds.com
3. hub1.newsfodder.com sends article to news-in.internews.com
New Path: news-in.internews.com!hub1.newsfodder.com! \ chicago.nntp.cyberoc3.net!nntp.binaries-newsfeeds.com
When you configure a news server to send articles to a host, you can specify one or more path stamps that should not be present in articles you send. This will prevent your news server from sending an article to another server that may have already received the article.
Example configuration on chicago.nntp.cyberoc3.net:
AcceptFrom: hub1.newsfodder.com
Sendto: hub1.newsfodder.com
Exclude: hub1.newsfodder.com
It can also prevent you from sending articles to a server with which you may share a common peer. By not sending these articles, you reduce the possibility of trying to send an article that could likely be rejected.
If the host hub1.newsfodder.com exchanged a feed with nntp.binaries-newsfeeds.com and with your server, your configuration might look like this:
AcceptFrom: hub1.newsfodder.com
Sendto: hub1.newsfodder.com
Exclude: hub1.newsfoddder.com, nntp.binaries-newsfeeds.com
This way, you don't send articles to newsfodder.com that they might refuse to accept. However, if it turns out that binaries-newsfeeds.com isn't exchanging articles directly with newsfodder.com; then we are filtering articles on our outgoing feed that newsfodder.com may actually need.
In a news server network, you want every article to reach all of your hosts, but you don't want articles you retransmit outside of the network to come back in by another host. To prevent outside feeders from sending you an article that any host on your network has already seen, you need a path stamp that represents your entire network. A split-path stamp is a single path stamp that looks like two separate stamps, because it contains an exclamation point. The first half of the stamp represents the news server network, and the second half represents the individual news server:
Host Path Stamp
chicago.nntp.cyberoc3.net cyberoc3.net!chgo.c3
seattle.nntp.cyberoc3.net cyberoc3.net!sea.c3
minneapolis.nntp.cyberoc3.net cyberoc3.net!mpls.c3
newyork.nntp.cyberoc3.net cyberoc3.net!ny.c3
Now you can tell every outside news server not to send articles if they already have the path stamp cyberoc3.net. The other half of the path stamp allows you to perform filtering from servers on your internal news network. The news server in Chicago will only send articles to the server in Minneapolis if the article does not have a mpls.c3 path stamp.
Updated configuration on the server nntp.binaries-newsfeeds.com:
AcceptFrom: binnews-in.nntp.cyberoc3.net
SendTo: binnews-in.nntp.cyberoc3.net
Exclude: cyberoc3.net
There are three things to remember when deciding what path stamp your news host will use:
1. Make sure it isn't too generic. If we used chicago as the internal stamp for our Chicago server, we run the risk of interfering with the feed of another site that may have chosen that stamp.
2. The path stamp should make some attempt to identify your site. By having the global path stamp cyberoc3.net, it can be assumed that the host creating that stamp was somewhere in the cyberoc3.net domain.
3. The entire stamp should be relatively short. It makes the Path header more readable, and it saves space for sites that record Path: header data for statistical processing.
Distributed News Readers
So far, I have covered scalability and redundancy in NNTP servers in a pure feeding environment, but I haven't covered NNTP servers in a reading environment. You can create a scalable reader environment, but there are more administrative issues to deal with.
In a feeder environment, each NNTP server only keeps track of articles to determine whether it has already seen the article. In a reader environment, the NNTP server needs to track which newsgroups each article belongs to, and the local number assigned to every article based on the newsgroup it is in.
For every newsgroup, there is a high counter and a low counter that correlates to the newest article and the oldest article in the newsgroup. As articles expire from a newsgroup and new articles arrive, the high and low numbers change accordingly. The numbering is tracked in the news server's active file.
Partial active file:
alt.test 0001369445 0001369025 y
comp.sys.sun.admin 0000160375 0000160051 y
news.groups 000018244 000017637 y
rec.sport.rodeo 0000014088 0000013099 y
soc.culture.canada 0000230888 0000230523 y
rec.autos.makers.saturn 000011745 000011289 y
When a news reader accesses a newsgroup, it requests the lowest and highest article numbers and the overview information for that range of articles. It then saves that information to disk. Then, when it re-checks that newsgroup, it will re-check the lowest and highest numbers. If they have changed, the reader client will download any new data and delete any old data.
In a scalable NNTP reader environment, each server needs to maintain the same article numbering data. To do this, each server trusts a single host to perform all numbering functions. A master server assigns numbers to the articles and feeds them down to the slave servers. When the master receives an article, it generates a special header in the article called an Xref header:
From: Dan Kirchen <dkirchen@cyberoc3.net>
Newsgroups: rec.autos.makers.saturn, rec.autos.tech
Subject: Transmission problems in a 1993 coupe
Xref: mpls.cy3 rec.autos.makers.saturn:11632 \
rec.autos. \ tech:14713
The Xref header contains the Path stamp of the server that created it and the local number position of the article in each newsgroup it which it appears. When the slave server receives the article, it parses the header and adjusts its own article numbering accordingly. Once an article is numbered by the master server, it can pass through any number of slave servers as long as it doesn't try to re-number the article (Figure 5). This solves the problem of having multiple NNTP servers for end users to access, but it creates a whole new one. You can only have one master server that performs article numbering, and if that host fails, it affects all of the slave servers that depend on it.
The most common configuration for slave servers is a load-sharing environment. Slaves are listed in a round-robin DNS entry with a short TTL on each record. Since the numbering is identical across servers, reader clients do not get confused on which articles it needs to download. When a client needs to post an article, it is usually forwarded to the master server, which assigns it a number, and then feeds it to all of the slave servers.
NNRP Caching Proxy Servers
Although it is possible to have multiple NNTP reading servers in a scalable environment, there is one noticeable downside -- cost. NNTP reading servers are still receiving a full stream of articles across your internal network, and those articles are usually stored on a large disk system so users can read articles at their leisure.
To any network engineer, it seems like overkill to transfer 100 GB of articles a day to a server if it is unlikely that users will even read 5 percent of those articles. Furthermore, maintaining large disks for multiple reader servers only increases the number of disks that could possibly fail.
To overcome this problem, a NNTP cache can be introduced in between the servers and the clients. An NNTP cache is a low to mid-range server designed to reduce bandwidth and eliminate the need for large disk systems. To news clients, it looks like an NNTP reader server, but to the real NNTP server, it looks like a news client. The NNTP cache only requests articles from the news server that a user wants. Then it caches them locally in case another user wants to read the same article later. Latency from the cache is minimal. Overview data is also saved locally, and the cache only downloads new data once it sees that the low and high counts for a newsgroup have changed.
The NNTP proxy can benefit customers who aren't ready for a newsfeed. If you have a customer that can't handle the bandwidth requirements of a full feed, an alternative is to have them set up a caching proxy and give them reading and posting privledges to your news reader. Some proxies can also provide added features for a customer that you may not want to add to your reader server (such as the ability to perform username/password authentication, or the ability to create local newsgroups).
This setup could also benefit the news reader that is accessed by the proxy. By using a proxy, you cut down on the number of parallel connections to the server, which provides more resources for other users. Also, by configuring access for a single proxy, you can greatly reduce the number of entries in an IP-based access list.
Conclusion
News server administration has never been a simple job. Five years ago it was filled with tedious tasks like making sure your active file was up to date, and trying to get your expiration process to finish in a decent amount of time. Now, there are new, even more tedious tasks like reducing outgoing feed backlogs and tuning your network connections so that you can keep up with the growth of Usenet traffic.
It is important that your site have an overall plan in place for Usenet services. Infrastructure and configuration standards should be planned before servers are installed. Make sure your plan is flexible to accommodate changes with minimal impact on remote sites. I covered some techniques that you can use for administering a cluster of Usenet servers, but this is by no means a complete list. There are other issues, such as distributing configurations from a central host, centralized reader client authentication, and automated fail-over, to name a few.
About the Author
Chris Josephes is a systems engineer for Onvoy, a telecommunications provider in Minnesota. He has been administering UNIX systems for six years, specializing in Solaris administration, Perl programming, Internet servers, and systems security. He can be contacted at: chrisj@onvoy.com.
|