
Understanding Cookies

Gilbert Held

Approximately a year ago, a number of authors realized a fact that was then two years old: their World Wide Web browser keeps cookies in a text file located in one of its directories. Since that belated discovery, a great deal of controversy has arisen over what cookies represent and whether they could be used in a manner that would violate the privacy of Web browser users. Thus, in an effort to demystify cookies, I will not only examine their operation and utilization, but also discuss tools and techniques that can be used to reduce the risk of compromising one's privacy. It may be extremely difficult for a server to extract anything from a cookie file that could be considered an invasion of privacy. However, because cookies are stored in a text file, they can easily be read by anyone with access to your computer. This means that a Web server that uses a cookie to store your credit card number during an online transaction may inadvertently open a gap in security that could cause a major problem. I'll first discuss the operation and utilization of cookies and return to security later in the article.

Overview

A cookie is a small text file stored on a user's computer by their Web browser and is "baked" or generated by a CGI script operating on a server accessed by the browser. Cookies were introduced to the Web with Netscape Navigator version 1.1, which reached the market in March 1995. The rationale for providing a place to store cookies is based upon the state of browser/server communications. That is, the relationship between a browser and server exists only for the duration of a single transaction, which is why such a transaction is referred to as "stateless." This means that as users access a server to shop, play a game, or perform another activity that requires knowledge of some actions that previously transpired, the server would have to store such information and be able to associate the stored information with each user. With an ever-expanding body of browser users, this would be impractical. Recognizing this problem, Netscape added support for cookies to its Navigator as a mechanism for HTTP, which is a stateless protocol, to retain state information on the browser or client side of a client/server session.

The design of cookies makes them invisible to the typical browser user, hence a degree of controversy developed as people became aware of the file COOKIE.TXT. This file, by the way, is not a hidden file, and its contents are available for viewing or editing. Additionally, cookies are subject to user control, as will be discussed more thoroughly later.

Although Netscape coined the term "cookies," its documentation does not indicate the derivation of the term. As an educated guess, the name may bear a relationship to UNIX tokens, which are known as magic cookies. In fact, although stored as COOKIE.TXT in the Windows world, the cookie file is called MagicCookie in the Netscape preferences folder on a Mac and has the name "cookies" in the UNIX world.

Examining the Ingredients

A cookie, or more precisely, a cookie entry in the cookie file is created by including a Set-Cookie header as part of a server's HTTP response to a browser query, and usually amounts to less than a teaspoon of text.

The most common method used to generate a cookie is via the use of a CGI script. Cookies are exchanged between a server and browser through the HTTP headers Set-Cookie from the server and Cookie from the browser. The format of the Set-Cookie header used to store a cookie on a client is shown below with optional attributes indicated in brackets:

Set-Cookie: Name=value;[Expires=DateTime;][Path=Path;][Domain=DomainName;][Secure]

The components of the Set-Cookie header are summarized in Table 1. This header informs the browser that it is about to receive a tasty morsel. The next attribute provides the name of the cookie and the value associated with that name. Although there are no restrictions on the name and value assigned to a cookie, there is a limit to the amount of information you can put into a name and its associated value. That limit is 4 K of data, which is probably enough to handle just about anything you would need. The next attribute is the expiration date and time. The format of the DateTime string is as follows:

Day, DD-Mon-YYYY HH:MM:SS GMT

Here, the separators between the elements in the date must be dashes. When this date is reached, the client browser will delete the associated cookie. For example, consider the following header:

Set-Cookie: USERID=GXHELD; expires=Wed, 01-Oct-1997 23:59:59 GMT

This will result in the cookie being eaten at one second before midnight on 1 October 1997 by the browser, resulting in the stored user ID being placed in the great bit cookie jar in the sky.
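As a minimal sketch of how a server-side script might assemble such a header, the following Python fragment builds a Set-Cookie line from the attributes described above. The function names build_set_cookie and format_expires are hypothetical and chosen only for illustration; the day and month names are hard-coded so the output does not depend on the system locale.

```python
from datetime import datetime, timezone

# Fixed English abbreviations so the result is locale-independent.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_expires(when):
    """Render an aware datetime as 'Day, DD-Mon-YYYY HH:MM:SS GMT'."""
    when = when.astimezone(timezone.utc)
    return "%s, %02d-%s-%04d %02d:%02d:%02d GMT" % (
        DAYS[when.weekday()], when.day, MONTHS[when.month - 1],
        when.year, when.hour, when.minute, when.second)


def build_set_cookie(name, value, expires=None, path=None,
                     domain=None, secure=False):
    """Assemble a Set-Cookie line; only name=value is mandatory."""
    parts = ["%s=%s" % (name, value)]
    if expires is not None:
        parts.append("expires=" + format_expires(expires))
    if path is not None:
        parts.append("path=" + path)
    if domain is not None:
        parts.append("domain=" + domain)
    if secure:
        parts.append("secure")
    return "Set-Cookie: " + "; ".join(parts)


# Reproduces the user ID example with its 1 October 1997 expiration.
print(build_set_cookie(
    "USERID", "GXHELD",
    expires=datetime(1997, 10, 1, 23, 59, 59, tzinfo=timezone.utc)))
```

Passing only a name and value yields a cookie with no expiration date, which the browser keeps for the current session only.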

Since the browser does not continuously operate in real time, entries in the cookie file are examined only when the browser is started and closed. Thus, if you wait until December to reuse your browser, the cookies that expired earlier will still be in the file until then, and anyone with access to your computer will still be able to view them. If no expiration date is set, the cookie will survive only for the duration of the browser session, again going to the great bit bucket in the sky when you exit your browser. The use of a cookie to store a user ID or a combined user ID and password represents a common cookie application. This enables users visiting a site that requires registration to register once and have their user ID and password stored by their browser, enabling automatic access back to the site without reentering their ID and password.
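The expiration behavior just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular browser is implemented: cookies are modeled as dictionaries, the names parse_expires and surviving are hypothetical, and the parser accepts only four-digit years.

```python
from datetime import datetime, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_expires(text):
    """Parse 'Day, DD-Mon-YYYY HH:MM:SS GMT' into an aware UTC datetime."""
    _weekday, rest = text.split(", ", 1)       # the weekday is redundant
    date_part, time_part, _zone = rest.split(" ")
    day, mon, year = date_part.split("-")      # dashes are required
    hour, minute, second = time_part.split(":")
    month = MONTHS.index(mon.capitalize()) + 1
    return datetime(int(year), month, int(day),
                    int(hour), int(minute), int(second),
                    tzinfo=timezone.utc)


def surviving(cookies, now, exiting=False):
    """Return the cookies that remain after a check made at time 'now'.

    A cookie without an 'expires' entry is a session cookie: it is kept
    while the browser runs and dropped when the browser exits.
    """
    kept = []
    for cookie in cookies:
        expires = cookie.get("expires")
        if expires is None:
            if not exiting:
                kept.append(cookie)
        elif parse_expires(expires) > now:
            kept.append(cookie)
    return kept


jar = [
    {"name": "USERID", "expires": "Wed, 01-Oct-1997 23:59:59 GMT"},
    {"name": "session-only"},
]
december = datetime(1997, 12, 1, tzinfo=timezone.utc)
print([c["name"] for c in surviving(jar, december)])
```

Because the check runs only when the function is called, an expired entry stays in the list until the next check, which mirrors the point that expired cookies linger in the file until the browser is next used.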

The path attribute tells the browser the directories in a domain for which the cookie is valid and should be returned to the server. The actual path is expressed with respect to the subdirectory on the computer where the Web server program has its default or home directory established. That default subdirectory is expressed by the use of the forward slash (/) in the PATH statement, such that PATH=/ would indicate that the browser should return the cookie to the server's default directory or any subdirectory under that directory. This means that each time the browser user accesses the server, the cookie previously stored on his or her hard drive will be returned to the server. If the path is not specified, it is assumed to be the same path as the document being described by the header that contains the cookie.

The PATH attribute is used in conjunction with the domain attribute, which tells the browser the domain name for which the cookie is valid and provides a degree of security. This degree of security occurs because the browser will return the cookie to the allowed path only if the domain name of the server matches the domain attribute. For example, assume a previously stored cookie has the attribute PATH=/userid and the browser requests http://www.sales4us.com/userid. If the domain attribute specified www.sales4us.com, then the cookie would be returned by the browser to the server.

The last attribute in the Set-Cookie header is "secure." If a cookie is marked secure, it will be transmitted only if a secure link between browser and server was previously established. If the secure attribute is omitted, the cookie will be sent under appropriate DOMAIN and PATH matching conditions regardless of whether the transmission channel is secure or non-secure.
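The three checks described above (domain, path, and secure) can be expressed as a single predicate. The sketch below is a simplification rather than a browser's actual rule set: the function name cookie_matches and the dictionary layout are hypothetical, the domain test is a plain tail comparison, and the path test is a plain prefix comparison.

```python
def cookie_matches(cookie, host, request_path, secure_link):
    """Decide whether a stored cookie should be returned with a request.

    cookie       -- dict with 'domain', optional 'path' and 'secure' keys
    host         -- host name taken from the requested URL
    request_path -- path portion of the requested URL
    secure_link  -- True when the connection to the server is secure
    """
    domain = cookie["domain"].lower().lstrip(".")
    host = host.lower()

    # Domain check: the host must equal the domain or end with it.
    if host != domain and not host.endswith("." + domain):
        return False

    # Path check: PATH=/ matches everything under the server's home directory.
    if not request_path.startswith(cookie.get("path", "/")):
        return False

    # Secure check: a cookie marked secure travels only over a secure link.
    if cookie.get("secure", False) and not secure_link:
        return False

    return True


stored = {"name": "USERID", "domain": "www.sales4us.com", "path": "/userid"}
print(cookie_matches(stored, "www.sales4us.com", "/userid", False))
print(cookie_matches(stored, "www.example.com", "/userid", False))
```

The first call corresponds to the sales4us example and reports a match; the second fails the domain check, so the cookie would stay on the client.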

Browser-Side Operation

The previously described Set-Cookie header provides the mechanism for a server to feed a cookie to a browser. Assuming a cookie was previously stored, when the browser reaches a URL for which one or more cookies match the domain, path, and security attributes, it will return those matching cookies. To do so, the browser will transmit its HTTP headers to include cookies in the following form:

Cookie: variable=value;[variable=value;]. . .

The actual "cookie check" is performed by a browser each time it makes a URL request to a server. When the browser requests a URL, it checks its stored cookies for potential domain, path, and secure attribute matches. Then, those tasty morsels that have appropriate matches are included in the request as above.
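To make the format of the Cookie request header concrete, here is a small Python sketch that assembles the header from a list of matching name/value pairs and splits it apart again, as a server-side script would need to do. Both function names are hypothetical, and the parsing is deliberately naive: it assumes names and values contain no semicolons.

```python
def build_cookie_header(pairs):
    """Join (name, value) pairs into one Cookie request header line."""
    return "Cookie: " + "; ".join("%s=%s" % (name, value)
                                  for name, value in pairs)


def parse_cookie_header(line):
    """Split a Cookie header line back into a name -> value dictionary."""
    prefix = "Cookie:"
    if not line.startswith(prefix):
        raise ValueError("not a Cookie header: %r" % line)
    cookies = {}
    for item in line[len(prefix):].split(";"):
        item = item.strip()
        if not item:
            continue                      # tolerate a trailing semicolon
        name, _sep, value = item.partition("=")
        cookies[name] = value
    return cookies


header = build_cookie_header([("userid", "GXHELD"), ("MUG", "bigmug/red")])
print(header)
print(parse_cookie_header(header))
```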

In the Netscape environment, a browser will accept up to 300 cookies, with up to 20 cookies per domain matching pattern. Each cookie can be up to 4096 characters in length to include the variable name. When the 300-cookie limit or the 20-cookie per server limit is reached, the browser will delete the least recently used cookie to make room for the new cookie. If a server sends a cookie larger than 4 K in size, the browser will truncate the cookie while keeping the name intact, as long as the latter is less than or equal to the maximum 4 K.
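The limits quoted above lend themselves to a short model. The following Python sketch is an assumption-laden illustration, not Netscape's code: the class name CookieJar and its methods are hypothetical, cookie size is counted as the length of the name plus the value, and a name longer than the limit is simply rejected.

```python
from collections import OrderedDict

MAX_TOTAL = 300        # cookies held by the browser overall
MAX_PER_DOMAIN = 20    # cookies held for any one domain
MAX_SIZE = 4096        # characters per cookie, name included


class CookieJar:
    def __init__(self):
        # (domain, name) -> value, ordered from least to most recently used
        self._cookies = OrderedDict()

    def store(self, domain, name, value):
        room = MAX_SIZE - len(name)
        if room < 0:
            return False                   # assumption: oversized name is refused
        value = value[:room]               # truncate the value, keep the name
        key = (domain, name)
        if key in self._cookies:
            del self._cookies[key]         # replacing needs no eviction
        else:
            same_domain = [k for k in self._cookies if k[0] == domain]
            if len(same_domain) >= MAX_PER_DOMAIN:
                del self._cookies[same_domain[0]]   # oldest for this domain
            elif len(self._cookies) >= MAX_TOTAL:
                self._cookies.popitem(last=False)   # oldest overall
        self._cookies[key] = value
        return True

    def get(self, domain, name):
        key = (domain, name)
        value = self._cookies[key]
        self._cookies.move_to_end(key)     # mark as most recently used
        return value

    def __contains__(self, key):
        return key in self._cookies

    def __len__(self):
        return len(self._cookies)


jar = CookieJar()
for i in range(21):
    jar.store("www.coffeecup.com", "item%d" % i, "mug")
print(len(jar), ("www.coffeecup.com", "item0") in jar)
```

Storing a twenty-first cookie for the same domain pushes out the one that was used least recently, so the jar never grows past twenty entries for that domain.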

Server/Client Interaction

To illustrate the potential interaction between browser and server with respect to cookies, suppose you just visited www.coffeecup.com and registered your name as you began to surf their offerings of coffee cups. The server might generate the following cookie header:

Set-Cookie: userid=GXHELD; path=/; domain=www.coffeecup.com

Note that the preceding cookie has no expiration date. This means that the cookie is stored in memory by the client and disappears without ever being placed into your cookie file when you exit your browser. Now suppose that you visit another site and then decide to return to the coffee cup site. As your browser reestablishes a connection to that site, it sends the following header:

Cookie: userid=GXHELD

Now let's assume you find a red cup called "bigmug" and place it in your shopping cart. The server might respond to your initial shopping spree with the following:

Set-Cookie: userid=GXHELD; MUG=bigmug/red; expires=Friday, 31-Dec-99 23:59:59 GMT; path=/; domain=www.coffeecup.com

Based upon the preceding, if you left the site and revisited it anytime prior to the cookie expiration date, your browser would transmit the following information upon your return.

Cookie: userid=GXHELD; MUG=bigmug/red

This example demonstrates how you can visit a commercial site days, weeks, or even years after your first visit and the server will remember your previous selections. This can result in an interesting experience. When you return to the coffee cup site and order the big blue mug, you may find one or more unexpected entries in your electronic shopping cart. These can be purged if you decide you don't want them.
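The server's side of this exchange might be sketched as a CGI-style Python script along the following lines. Everything here is hypothetical, including the function names and the decision logic; the only convention relied upon is that a CGI program receives the browser's Cookie header in the HTTP_COOKIE environment variable. As a further assumption, the sketch sends each name=value pair in its own Set-Cookie line.

```python
import os

DOMAIN = "www.coffeecup.com"    # site name taken from the example above


def parse_pairs(http_cookie):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    pairs = {}
    for item in http_cookie.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep:
            pairs[name] = value
    return pairs


def shop_headers(http_cookie, userid, mug=None):
    """Return the Set-Cookie lines needed to bring the browser up to date."""
    known = parse_pairs(http_cookie)
    headers = []
    if known.get("userid") != userid:
        # No expiration date: the browser keeps this for the session only.
        headers.append("Set-Cookie: userid=%s; path=/; domain=%s"
                       % (userid, DOMAIN))
    if mug is not None and known.get("MUG") != mug:
        # Persistent cookie so the shopping cart survives between visits.
        headers.append("Set-Cookie: MUG=%s; "
                       "expires=Friday, 31-Dec-99 23:59:59 GMT; "
                       "path=/; domain=%s" % (mug, DOMAIN))
    return headers


if __name__ == "__main__":
    incoming = os.environ.get("HTTP_COOKIE", "")
    for line in shop_headers(incoming, "GXHELD", mug="bigmug/red"):
        print(line)
```

On a first visit the script emits both headers; once the browser returns userid and MUG with matching values, it emits none.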

Cookie Control

If you open your cookie file during a browser session via the use of a word processor or by typing its contents via the DOS compatibility box, you will not see any currently received cookies. This is because browsers store cookies internally during a session and commonly update their cookie file when the application is exited. Thus, the only way to determine if you are being cookied while online is to set the Options/Network Preferences/Protocols to "alert" (under Netscape Navigator 3.X) to alert you whenever a server tries to set a cookie. The default option is to accept all cookies without a warning. Under Communicator 4.X you can accept all cookies, accept only cookies that get sent back to the originating server, disable all cookies, or have the browser warn you prior to accepting a cookie. Those choices are located in the Edit menu under Preferences/Advanced. Similar to Navigator 3.X, Communicator 4.X has a default that accepts all cookies. If you are using Microsoft's Internet Explorer 4.X you can use the View/Options Menu, select the security tab, and scroll down to Security, Cookies, selecting disable all cookies.

For users with other browsers, or browsers that do not provide an option to disable cookies, you can easily do so by making the cookie file a read-only file. On a PC, you could use the attribute command to make the file cookies.txt a read-only file. That is, you would enter the command:

attrib +r cookies.txt

In Windows 95, you could right-click on the file, choose properties and check "read only" inside the attributes section. If you tend to be paranoid and watch "Three Days of the Condor" over and over again, you will probably want to delete all entries in your cookie file, saving the empty file before making it a read-only file. However, before becoming too paranoid, it is important to remember that cookies were developed to facilitate browser/server operations.
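For readers who prefer a script to the command line, the same effect can be approximated in Python by clearing the write-permission bits on the file. This is a sketch only; the function names are hypothetical, and the file name cookies.txt in the usage comment is simply the one used in the attrib example.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear every write-permission bit, much like 'attrib +r'."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~WRITE_BITS)


def is_marked_read_only(path):
    """Report whether the owner's write bit is cleared."""
    return not (os.stat(path).st_mode & stat.S_IWUSR)


# Example use, assuming the cookie file is in the current directory:
# make_read_only("cookies.txt")
```

The browser should be closed before the change is made, since it rewrites the file when it exits.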

If you simply surf the Web and neither order items via credit card nor have registered access to a server, the entries in your cookie file, if viewed by your boss or a fellow employee, should not be detrimental to your employment. However, if you do order items via credit card or have registered access to one or more servers, the possibility exists that one or more cookies in your cookie file will contain private information you may prefer others not to view. Perhaps the best way to handle this situation is to create a small batch file that copies your cookie file to disk and creates a new and empty cookie file in its place. Then, if security or paranoia warrants, you can lock your cookies in an appropriate location. However, if your work area is secure or you use your browser for approved business purposes, there is probably no need to go to these extreme measures.
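The batch file suggested above could equally be written as a short Python script. The sketch below assumes the browser is not running; the function name stash_cookie_file and both paths in the usage comment are hypothetical.

```python
import shutil


def stash_cookie_file(cookie_path, backup_path):
    """Copy the cookie file elsewhere, then leave an empty file in its place."""
    shutil.copy2(cookie_path, backup_path)   # keeps the file's timestamps
    with open(cookie_path, "w"):
        pass                                 # opening for write truncates it


# Example use, with the backup going to a diskette you can lock away:
# stash_cookie_file("cookies.txt", "a:/cookies.bak")
```

Copying the backup over the cookie file again restores the earlier entries.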

Because cookies are designed to be read only by the site that generated them, they should not represent a security or privacy threat. In any event, you can always view your cookie file, set its attribute to read only, and deny servers the ability to enter a cookie into your cookie file. Another option is to use one of the shareware or commercial cookie manager utility programs. These programs provide cookie-related options ranging from removing certain cookies and emptying your cookie jar (file) to providing historical reports on who is providing those tasty bits along with information about what they may represent. Most cookie manager programs also provide information on cache and history files used by your browser and provide you with the ability to remove those files. Table 2 provides a list of seven cookie manager utility programs and Web sites where you can obtain additional information concerning each program.

As indicated in this article, cookies represent a database that enables a stateless protocol to maintain a state between a browser user and a particular server. Although it is within the realm of possibility that a server programmed by unscrupulous persons could attempt to mimic a number of domains in an attempt to retrieve unauthorized cookies, this is perhaps the one area of information theft that has not occurred on the Internet. With much more viable options for obtaining information, and the ability of browser users to keep the lid on the cookie jar, cookies are an over-hyped problem that can be easily controlled.

About the Author

Gilbert Held is an award-winning author and lecturer. Gil is the author of more than 25 books and 200 technical articles. Some of Gil's most recent titles include Working With Network Images, Virtual LANs, Ethernet Networks 2ed., LAN Performance: Issues and Answers 2ed., The Complete Modem Reference 3ed., and Data and Image Compression 4th ed., all published by John Wiley & Sons of New York and Chichester, England. Gil can be reached on the Internet at 235-8068@mcimail.com.