This dissertation on cookies was generously supplied by Dave Evans in response to a question originally posted on the ActiveServerPages mailing list.

Here is a gif you can use on your web page to link to here to provide your users with an explanation of cookies. The image was also supplied by a participant of the ActiveServerPages mailing list (although I can't remember his name to provide proper credit).


Cookies and The Web: The Goal is Rich Interactivity

Overview

Responding to the legitimate concerns of Internet consumers, a diverse group of web developers has voluntarily formed an ad-hoc team and prepared the following article, along with an accompanying technical paper and FAQ listing. These materials are not intended to "defend" the use of cookies: rather they are offered as a developer's view of the role of cookies in supporting interactive and increasingly knowledge-oriented web sites. The article and FAQ listing focus on the issue of privacy and offer quick tips on protecting your right to privacy. The accompanying technical paper includes explanations of the underlying technologies, the role of cookies, discussions of the positive and negative ramifications of this technology, and selected references on this and related topics. The ultimate goal of this effort is the presentation of a balanced view of this emerging Internet technology for use by Internet consumers.

Cookies: A Quick Primer

A cookie is a little piece of information that a web site may use to identify your computer. Sites that create cookies typically store information about your use of that site in a database: when you return to the site, the cookie is used to find your information and to tailor your current visit based on what you did in your prior visits. To express this in more familiar terms, when you enter a store for the first time, a professional salesperson will greet you, ask your name, and offer to help you. When you return to the store, that same professional will remember your name, and will recall what you purchased, what you liked and didn't like, and will use that information to make your subsequent visits more enjoyable. Cookies allow web developers to create the same environment in a web application.

The Benefits of Cookies

Cookies allow a web application to respond to you as an individual. By gathering and remembering information about your preferences, the web application can tailor its operation to your needs, likes and dislikes.

For example, a cookie can be used to remember your name and the colors and fonts that you prefer to see -- to be fair, a password could do the same thing, but doesn't it just feel nice to be greeted by name when you walk into a store? Sure it does! Cookies can keep track of what you are doing while using the application. When you visit an electronic store, a cookie makes it easy to shop by allowing you to drop things into a virtual "cart" -- the cookie actually keeps track of your cart versus others in use at the same time.

The benefits of cookies can be summed up simply: cookies allow web developers to create better web applications, applications that are more personal, easier to use and richer in their degree of interactivity. Of course, cookies do not in themselves make a winning application, and many great sites exist that don’t use cookies at all. Cookies are simply one technology out of a range of tools that developers use to improve your experience on the Web.

Your Right to Privacy

You have a right to privacy: you have a right to expect that the information you share will be used for purposes reasonably related to the context in which you offered the information. Think about the salesperson again: you would expect that, having purchased a classical CD, the salesperson may let you know about an upcoming symphony performance. However, you would likely be offended if someone representing the symphony called you and said "based on your purchase last week, we would like to ask…" This example illustrates the difference between the proper and improper use of your personal information.

When visiting web sites, you have no doubt been asked to sign guest books, register for promotions or free downloads, or to provide personal information under similar conditions. What happens to that information? The answer -- and there are as many as there are web sites -- can be surprising.

For many years, long before the Internet and the World Wide Web became popular, a common sight in many businesses was a contest form and entry box: "Fill out this form for a chance to win a car!" Did you ever win a car? Did anyone you know ever win a car? The contest wasn't about the car -- the contest was about getting your name and address, which was then sold to mail order solicitation firms, who in turn sold advertising and contact information. The same holds true on the Internet: your personal information is worth money, and there are businesses set up to gather and re-distribute that information.

Your Right to Know

Let's pull a couple of things together now: cookies, which allow a developer to create an application that learns about you and about your preferences, and the business of reselling information. There is an immediate and obvious problem, but it is not the technology: it is the way in which the technology might be used. Cookies did not create the information re-selling business, nor did a cookie gather the information that someone else may be willing to pay for. Instead, the cookie created the link between your name -- obtained when you signed the guest book at the web site -- and the pages that you browsed while at the site.

The important point is that you provided the information -- most likely based on your perception of how that information would be used. Imagine that you are walking down the street and someone you've never seen asks to see your driver's license. "I just want to look at it, honest!" Sure. Bye. Now imagine that the person is a cashier, that you are in a store and that you are writing a check. "I'll need to see your driver's license." Without a thought, you present it and complete the transaction. The difference is the context -- and your expectation of how and for what purpose the information would be used.

When you visit a web site, before you offer any personal information, think about the site. Is it a site you visit often? What happens to the information collected? Does the site offer a written policy about its use of personal information? You have the right to know, and you have the option of not providing personal information until you do know.

What You Can Do To Protect Your Privacy

We live in an information-based society. While that is no reason to give up hope of privacy, trying to shield yourself from all intrusive behaviors is probably not the most practical solution either. Remember these points, and you'll have blocked the majority of invasive practices:

  1. Don't give information to strangers: sure, it's the Internet, but a quick call to a friend or a letter to the web master will often shed light on the motives of a web site you've not visited before.
  2. Don't provide anything until you are comfortable with how it will be used: look for the site's policy, and if you don’t see it, write to the web master and ask for it. If you don’t get a response, do you really want to do business at that site?
  3. Do try to understand the benefits and exposures of new technology. It is your responsibility to decide how you will use the Internet, and for what purposes.
  4. Don't assume that by rejecting cookies you are safe: cookies are merely the technology -- the danger is the re-use of the information that you provide.
  5. Do take the time to report businesses that misrepresent themselves, either in the services they deliver, or in the way in which they use information that you provide.

On behalf of all developers, we thank you for taking the time to read this article, and for your efforts in educating yourself on the responsible use of emerging web technologies.


Cookie FAQ

What is a cookie?

A coded piece of information, stored on your computer, that identifies your computer during the current and subsequent visits to a web site.

How are cookies used?

Cookies are used to establish a link between your computer and the web server that can be used to differentiate your successive requests from those of other web visitors.

Who uses cookies? Why?

HTTP, the protocol of the web, cannot by itself establish a relationship between successive requests from a specific web browser. Sites that offer commerce, that remember preferences or that allow you to progress through stages of an application may use cookies. Cookies are used to establish the identity of your computer.

How can I tell if a cookie is being used?

Many modern browsers allow you to set a warning that is issued before a cookie can be placed on your system. However, with the increasing consumer-driven demand for higher performance applications, sites that use literally dozens of cookies will quickly exhaust the patience of a warning-aware web surfer. Most people have turned the warning feature off as a result.

How can I protect myself?

Regardless of the setting of your warning notice, the best way to protect yourself is to carefully consider the purpose and policies regarding personal information before you provide any information. On the Internet, no one knows you're a dog …unless you say so!

What information does a web server collect automatically?

Web servers collect information about your network address, your browser type, the date and time and other similar items. Web servers *do not* have access to your hobbies, your name, your address or other highly personal data as a general rule unless you tell them. One notable exception is your e-mail address: if your web browser has an integrated e-mail tool, some web servers can also read your e-mail address.

What is information re-selling? Who does it?

Your name and address are valuable assets: they say a lot about you. Combined with a few web visits, they say even more. This data, especially when gathered over large groups, is very valuable to retailers, mailers and others. Note that many of these have only good intentions: however, you have the right to your privacy, a right you express by deciding where and when to provide personal information.

How big is a cookie? How much space can cookies use?

According to the draft specification issued by Netscape Communications, the limits regarding the size of a cookie and the space occupied by all cookies are: 4 Kbytes for each cookie, and 300 cookies (1.2 Mbytes) in total at any one time.

Do cookies ever get stale? What happens to them?

Cookies may be coded with an expiration date. If so, the cookie is discarded after the expiration date. If the above limits are reached, a subsequent attempt to accept a cookie will result in the discard of the cookie last used in the most distant past.
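
For readers who like to see the mechanics, the discard rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CookieJar class, its method names and the tiny limit of three cookies are invented here for the example, and real browsers differ in their details.

```python
import time
from collections import OrderedDict


class CookieJar:
    """Hypothetical client-side store: expiry plus least-recently-used discard."""

    def __init__(self, max_cookies=3):  # tiny limit, for illustration only
        self.max_cookies = max_cookies
        self._cookies = OrderedDict()  # name -> (value, expires_at or None)

    def set(self, name, value, expires_at=None):
        # Re-setting a cookie counts as a use, so it moves to the "recent" end.
        self._cookies.pop(name, None)
        if len(self._cookies) >= self.max_cookies:
            # Jar is full: discard the cookie used longest ago.
            self._cookies.popitem(last=False)
        self._cookies[name] = (value, expires_at)

    def get(self, name, now=None):
        now = time.time() if now is None else now
        entry = self._cookies.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._cookies[name]  # stale: past its expiration date
            return None
        self._cookies.move_to_end(name)  # mark as most recently used
        return value

    def names(self):
        return list(self._cookies)


jar = CookieJar()
jar.set("a", "1")
jar.set("b", "2")
jar.set("c", "3")
jar.get("a")        # using "a" leaves "b" as the least recently used
jar.set("d", "4")   # the jar is full, so "b" is discarded
print(jar.names())  # ['c', 'a', 'd']
```

An ordered dictionary is used here simply because it keeps the cookies in order of last use, which makes "the cookie last used in the most distant past" the first entry.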

Can I get rid of cookies on my system?

Yes. Internet Explorer and Netscape Navigator both store cookies in folders: you may empty the folder, thereby removing all cookies from your system.

What is a null cookie?

A null cookie - a very recent and responsible innovation partly in response to concern - is a cookie which is otherwise empty, and which signals to the web server that issued it that this user does not wish to have any additional cookies written to his or her system. Note that this is a per-server technique.


The Use of Cookies in Contemporary Web Applications

Responding to the legitimate concerns of Internet consumers, a diverse group of web developers voluntarily formed an ad-hoc team and prepared the following technical paper. This paper is not intended to "defend" the use of cookies: rather it is offered as a developer's view of the role of cookies in supporting interactive and increasingly knowledge-oriented web sites. A related article and FAQ listing focus on the issue of privacy and offer quick tips on protecting your right to privacy. This technical paper includes explanations of the underlying technologies, the role of cookies, discussions of the positive and negative ramifications of this technology, and selected references on this and related topics. The ultimate goal of this effort is the presentation of a balanced view of this emerging Internet technology for use by Internet consumers.

The Concept of State

To understand how cookies are currently applied in web applications on the Internet, one must first understand the concept of state. State is the characteristic that distinguishes successive web transactions initiated by the same person between a browser and a specific web server from all other web transactions occurring on that server at that time. Simply put, capturing state allows the web browser and web server to exchange information during successive requests with the full knowledge of the history of that set of transactions. Compare this with a basic set of web transactions: each transaction is a separate entity with no knowledge of the prior or subsequent transaction.

State information allows the application designer to identify a particular browser, and to associate successive web requests with that browser: put another way, state information allows the application designer to differentiate between different users as they progress through a web application. Through state information, a user may express a preference in an introductory section of a web application, which is then used to select and create content in subsequent sections of that application. Expressing an interest in History while creating a customer profile enables the application to highlight items of interest to History buffs later on.

Hypertext Transfer Protocol

HTTP, the protocol of the web, is an inherently stateless protocol. According to Tim Berners-Lee, who conceptualized and defined the HTTP protocol in 1992:

"HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system. It is a generic stateless object-oriented protocol..."

Source: http://www.w3.org/pub/WWW/Protocols/HTTP/HTTP2.html

Because HTTP is stateless, applications that benefit from state-aware transactions require an additional technology in order to capture and preserve state.
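
As a rough illustration of that "additional technology", the following Python sketch shows how a server might hand out an identifier in one response and read it back from a later, otherwise unrelated request. It uses the standard http.cookies module; the cookie name "visitor" and the two helper functions are assumptions made for this example, not part of any particular server.

```python
from http.cookies import SimpleCookie


def build_set_cookie(visitor_id):
    """Return a Set-Cookie header value that hands the browser an identifier."""
    cookie = SimpleCookie()
    cookie["visitor"] = visitor_id       # "visitor" is a made-up cookie name
    cookie["visitor"]["path"] = "/"      # send it back for every page on the site
    return cookie["visitor"].OutputString()


def read_visitor(cookie_header):
    """Extract the identifier from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor")
    return morsel.value if morsel is not None else None


# First response: the server issues the identifier.
print("Set-Cookie:", build_set_cookie("abc123"))

# A later request carries it back, tying the two transactions together.
print(read_visitor("visitor=abc123; theme=dark"))  # abc123
print(read_visitor("theme=dark"))                  # None
```

Nothing in HTTP itself links the two requests: the link exists only because the browser returns the value it was given.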

In order to support an interactive application in a client/server environment such as the Web, it is necessary for both the browser and the web server to understand where they are with regard to an overall application path or plan. For example, in a typical game, prior to proceeding to a more difficult level it is necessary to complete a less difficult level. This illustrates the basic concept of state: knowing which players have completed the lesser levels and who are thereby qualified to proceed to the higher levels. Without state information it would be necessary to ask a player if he or she has completed the first level before offering the second. While this may be sufficient for a basic gaming application, more advanced play environments, financial transactions and a whole group of online commerce transactions, to name just a few, clearly require a more advanced method for establishing and maintaining state information.

It is for these reasons that state is essential: yet, state is missing in the underlying design of HTTP. Fortunately, as its designer intended, HTTP is a highly flexible and adaptable protocol: state information can be accommodated, thereby enabling rich, interactive experiences.

The next section presents several options for discerning and maintaining state along with the plusses and minuses of each.

Establishing and Maintaining State

In order to establish state -- in other words, to identify two seemingly unrelated transactions as actually belonging to a successive, related set of transactions -- it is necessary to somehow mark the transactions as requests originating from the same client, or web browser. If you were the only person using a particular web server, establishing state would be trivial: the first request received at the server would be your first request, the second would be your second, and so on. However, if a second web viewer were introduced it would be necessary to label your requests -- perhaps with a blue mark -- and to similarly label those from the other browser, in this case with a green mark. The trail of blue marks and green marks would then identify the successive requests of each user, and state could be inferred.

Now, let's introduce a slight complication: let's allow the viewer to hop around within the application. Our simple chronological trail is no longer sufficient -- since the user can hop around, we cannot be sure that the last green marker received represents the most advanced point in the application the viewer associated with the green marker has visited. No problem -- we can just number the markers according to the various sections of the application and keep track of the highest marker number. Now we know who has done what: referring to our gaming example, we can confidently allow the blue player to advance when the blue player has met the requirements of the first level.
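
The coloured, numbered markers can be sketched in Python as follows. This is a minimal sketch, assuming that reaching a section counts as meeting its requirements; the ProgressTracker class and its method names are hypothetical and exist only to mirror the blue and green players above.

```python
class ProgressTracker:
    """Keeps the highest numbered marker seen for each colour (each viewer)."""

    def __init__(self):
        self.highest = {}  # colour -> highest section number seen so far

    def record(self, colour, section):
        # Requests may arrive in any order, so keep the maximum, not the latest.
        self.highest[colour] = max(section, self.highest.get(colour, 0))

    def may_enter(self, colour, level):
        # A level is open once the level just below it has been reached.
        return self.highest.get(colour, 0) >= level - 1


tracker = ProgressTracker()
for colour, section in [("blue", 1), ("green", 1), ("blue", 2), ("blue", 1)]:
    tracker.record(colour, section)

print(tracker.may_enter("blue", 3))   # True: blue has reached section 2
print(tracker.may_enter("green", 3))  # False: green has only reached section 1
```

Note that blue's final hop back to section 1 does not lower blue's standing, which is exactly why the highest marker is tracked rather than the most recent one.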

The numbered markers are in actuality a very accurate example of the tokens that are passed between a client and server to maintain state. These tokens can take a variety of forms: some of the leading forms include hidden variables, additions to the URL, and cookies. Each of these has merits and caveats. For example, it is possible to pass a TYPE=HIDDEN variable (not really hidden at all, since you can look at it through the View Source feature of your browser) to the server using the standard HTTP FORM request. However, this requires that each such request be initiated with a SUBMIT or similar action, a requirement that may impose an unrealistic interactive constraint on an immersive application. Similarly, the token may be attached to the end of the URL and passed on to the server as a part of the address. While this technique has been a part of many standard web applications since the web was introduced in 1992, it is not particularly well suited to applications requiring a secure environment since the token is passed in plain view of anyone who can look at your screen and read the address line! Cookies -- tokens much like our numbered green and blue markers -- overcome the shortcomings of both of the previously described methods. Unlike HIDDEN variables, cookies do not need to be passed through the HTTP FORM process and so do not require an explicit SUBMIT action; unlike URL additions, cookies are not plainly visible unless of course you choose to look at them. However, cookies do require that the web server write a small, coded piece of information onto your hard disk -- and it is partly this behavior that is at the center of the current concern.
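
To make the comparison concrete, the Python sketch below shows where the same token travels in each of the three forms and how a server might pull it out. The token name "sid", the sample values and the three helper functions are assumptions chosen for illustration; only the standard urllib.parse and http.cookies modules are used.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

TOKEN = "sid"  # hypothetical token name


def token_from_form(body):
    """Hidden variable: arrives in the body of a form the viewer submitted."""
    return parse_qs(body).get(TOKEN, [None])[0]


def token_from_url(url):
    """URL addition: arrives in the query string, visible in the address line."""
    return parse_qs(urlsplit(url).query).get(TOKEN, [None])[0]


def token_from_cookie(cookie_header):
    """Cookie: arrives in a request header on every request, no SUBMIT needed."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return jar[TOKEN].value if TOKEN in jar else None


print(token_from_form("name=Ann&sid=42"))                 # 42
print(token_from_url("http://example.com/level2?sid=42"))  # 42
print(token_from_cookie("sid=42"))                         # 42
```

The server ends up with the same token in all three cases; what differs is how it got there and who could see it along the way.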

There are many types of applications that benefit from or even require knowledge of state information. Financial applications, games, interactive learning tools and many other web applications require that the progress of a viewer or information related to preferences or individual choices be maintained in the context of the application. By attaching a marker to each transaction, it is possible to create just such a state-aware environment, and in so doing support these advanced applications. Techniques such as those described, and especially the use of cookies, are increasingly in demand as web applications become more sophisticated. Contemporary applications rival what has been possible for over 10 years in a CD-ROM or desktop environment while adding the element of multi-user and distributed interactivity and information sharing, the core elements of the ongoing knowledge transformation in our contemporary, global society.

Extending State across Web Visits

In the preceding sections the need for state and several techniques for maintaining state have been described. In this section, the concepts are extended to cover not only successive requests at a single sitting, but successive sessions between a particular web visitor and a favorite web site over a period of days, weeks or even months. As an example, consider a travel services web site: you supply your name, your favorite vacation spots, and your budget guidelines. The web application offers a series of vacation options. Sadly, you realize that your day planner is at the office, and you can't make your choices until you look at it. Now what? Without state information, after you disconnect from the Internet, visit the office, return with your day planner and then re-establish your connection, you would have to re-enter your preferences. In a simple example, this may take just a minute, but in a real-world application, you may have spent 10-15 minutes defining your particular preferences, just as you may spend a considerable amount of time building a relationship between you and your travel agent. If you move, you have to re-build that relationship.

By using state information, your preferences can be stored and retrieved automatically; as soon as you return to the travel web site, the site "recognizes" you, just as your travel agent recognizes you when you walk into his or her office. This is accomplished using a cookie: a cookie that was stored on your computer by the web application and then examined when you returned to the site. This could have been accomplished using a user name and password just as easily: but seriously, who really wants to remember another one of those?

Note that the cookie did not store your actual preferences: that would use more than the minimum required space on your disk, something no one seems to have enough of. Instead, the cookie contains an identifier -- a nickname if you will -- that can be used to look up the information that you provided about yourself which was stored on the web server. Included in the cookie standard is a limit to the overall size of each cookie (4 Kbytes) and to the combined number and size of all cookies stored on your hard drive at any one time (300 cookies; 1.2 Mbytes). These limits have been imposed to prevent the take-over of precious disk space and/or extended download times of extraordinarily large cookies. Additionally, cookies are often coded with an "expiration date" -- if you visit a site that adds its cookie to your system and then never return to the site, that cookie will eventually either expire or be pushed off of your system by a newer or more recently used cookie.
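
The identifier-and-lookup arrangement can be sketched in Python as below. This is an illustration under stated assumptions: the in-memory "profiles" dictionary stands in for the site's database, and the function names and the 16-character identifier are invented for the example. The two constants restate the limits quoted above; 300 cookies of 4 Kbytes each come to 1,200 Kbytes, which is consistent with the 1.2 Mbytes figure.

```python
import secrets

MAX_COOKIE_BYTES = 4 * 1024  # per-cookie limit quoted above
MAX_COOKIES = 300            # limit on the number of stored cookies quoted above

profiles = {}  # stands in for the server-side database: identifier -> preferences


def remember(preferences):
    """Store the preferences on the server; return the identifier for the cookie."""
    visitor_id = secrets.token_hex(8)  # 16 hexadecimal characters
    profiles[visitor_id] = preferences
    return visitor_id


def recognize(visitor_id):
    """Look up the stored preferences when the cookie comes back."""
    return profiles.get(visitor_id)


visitor_id = remember({"spots": ["Maui", "Banff"], "budget": 2000})
print(len(visitor_id))                 # 16 bytes on your disk, far below 4 Kbytes
print(recognize(visitor_id))           # the preferences, fetched from the server
print(MAX_COOKIES * MAX_COOKIE_BYTES)  # 1228800 bytes, roughly 1.2 Mbytes
```

The travel preferences never leave the server; the only thing written to your disk is the short identifier.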

Privacy Concerns: You Have a Right to Know

So what else is stored in a cookie? A better question is "what else is stored in a database that may be linked to that cookie?" The answer is "whatever information you willingly provide." Note the word "willingly" -- it is an important word. In a recent news story, the person being interviewed stated "if you visit the Playboy site, and then the CNN site, that information can be tracked." True? Sort of. The key here is to understand what information can be tracked, and perhaps more importantly, what cannot be tracked. As with the state concept, a quick technical digression is in order: it is important to understand how you are identified on the Internet.

Most people connect to the Internet using an Internet Service Provider (ISP) or an online service. These firms have large pools of Internet addresses -- called IP addresses -- which are randomly assigned and re-assigned as people need them. During the time you are connected, you are assigned a specific, unique address; when you disconnect, the next person who calls may be assigned the address that you just gave up. It is this address that is tracked by most web servers, and as you can see, it does not identify you, but rather identifies you as a subscriber to a particular service.

So how can you be tracked? Well, unless you willingly provide some additional information, you really can't be. Your Internet address identifies you as a customer of your ISP, not as an individual. Web servers are typically able to read your IP address, the type of browser you are using, the time of day and similar items related to your specific request for data. This generally does not include personal data, nor does it typically include data that identifies you specifically.
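
As a hedged illustration of how little this is, the Python sketch below gathers the items a server would typically see from a WSGI-style request environment. REMOTE_ADDR, HTTP_USER_AGENT and PATH_INFO are the standard CGI/WSGI keys; the function name, the sample address and the sample browser string are assumptions made for the example.

```python
from datetime import datetime, timezone


def visible_to_server(environ, now=None):
    """Collect the items a server typically sees for a single request."""
    now = now or datetime.now(timezone.utc)
    return {
        "ip_address": environ.get("REMOTE_ADDR"),   # your ISP-assigned address
        "browser": environ.get("HTTP_USER_AGENT"),  # the type of browser
        "requested": environ.get("PATH_INFO"),      # the page you asked for
        "time": now.isoformat(),                    # when you asked for it
    }


sample_request = {
    "REMOTE_ADDR": "192.0.2.10",
    "HTTP_USER_AGENT": "ExampleBrowser/1.0",
    "PATH_INFO": "/index.html",
}
for item, value in visible_to_server(sample_request).items():
    print(item, "=", value)
```

Nothing in that list is a name, a postal address or a hobby: those appear only if you type them into a form.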

However, if you sign a guest book, or register with an online mall, you are truly identifying yourself, and it is at this point that you should exercise caution. Look through the web site, read the policies about the re-selling of information (if provided) and ask questions via e-mail before you subscribe if these policies are not provided.

A quick note about combined web browser/e-mail tools is in order. It is possible for some servers to read the e-mail address of web browsers that include e-mail functionality. This means that along with your IP address, your e-mail address can be discerned. If this is not something that you are comfortable with, use a separate e-mail application and do not include your e-mail information in your browser setup.

While all of this may seem scary at first, think back about your travel agent: he or she knows quite a lot about you. In fact, it is unlikely that you would do business with any professional who did not take the time to understand, or who could not remember, your specific likes and dislikes. Just as you trust the people with whom you choose to do business, you are being asked to trust the application -- more correctly the motives of the people responsible for that application. Before you divulge any personal data, think about how it might be used. If the web application does not clearly articulate its policy on reselling information, you may wish to look elsewhere for similar services. Generally, an e-mail to the web site administrator or Customer Service contact is sufficient to establish the answers to these types of questions.

An excellent discussion of cookies and potential issues can be found at the Whitehead Institute for Biomedical Research/MIT Center for Genome Research "The World Wide Web Security FAQ" web site, located at http://www.genome.wi.mit.edu/WWW/faqs/www-security-faq.html; the cookie discussion is located at http://www.genome.wi.mit.edu/WWW/faqs/wwwsf7.html#Q64.

Summary: It's About You, and it's About Technology

That the Internet provides an increasingly rich set of applications -- and an increasingly sophisticated set of less-well-intentioned applications -- is old news. As with any new technology, there are new opportunities for beneficial progress along with new traps for the unwary. The vast majority of web developers are building new applications in an attempt to deliver new services, new value and new products, and to bring existing products to people who lacked prior access.

It is essential that you take the time to learn about the technologies that you use, and that you take the time to learn about the organizations that you will hire or use to deliver them. Avoiding any specific technology, as a general rule, is about as effective in a technology-based society in ensuring that you are not harmed as is staying home alone all day. People get taken at home on the phone all the time. Understanding the beneficial aspects of a particular technology along with the risks that it carries and then using that information to make an informed choice is a far better strategy, a strategy that will increase your enjoyment of these newer technologies and decrease your likelihood of disaster.

By taking the time to read this paper, and to visit the reference sites contained herein, you have gone a long way to educating yourself and creating your framework for an informed, rational choice. On behalf of all web developers, we thank you for the opportunity to share this information, and for your time spent reading it.