Project Description

Connected: An Internet Encyclopedia

Political Statement

One of the key components in the Internet's success has been the public availability of its design documents. Many proprietary networking systems, such as SNA and IPX, have guarded their packet formats and details of protocol operation as trade secrets. Even "open" standards organizations, such as IEEE and ISO, sell their standards documents as a primary source of revenue. In contrast, Internet design documents, the "Request for Comments" (RFCs), have always been available for anyone to download and study. I believe that this policy, making it easy for the public to study the Internet and learn about it, has greatly contributed to the success of this exciting technology. A key requirement of the project is the continuation of this open policy.

The Problem

Most people who want to learn about the Internet start with either a book or a course. Many "Internet for Idiots" books exist, which aim to provide the reader with simple instructions for operating Internet tools, without extensive technical discussions. Likewise, there's plenty of end-user training available, covering roughly the same audience. While most people will be content with these offerings, some will want a more detailed understanding of Internet operation. This latter minority is my audience.

A brief review of the more technical material is in order. There are several excellent technical texts which explain overall Internet operation (Comer), or detail particular components (Rose). Unfortunately, few of these books are priced under $50, and building a library requires either blank checks or serious commitment. Likewise, more advanced courses are offered, but most of these come attached to some company's certification program. That means they are pricey, and more attention is given to product configuration than to fundamental concepts. Finally, a plethora of online documents range from "Internet for Idiots" to the exact protocol specifications found in the Request for Comments (RFCs).

The RFCs specify to bit-level precision almost every protocol that runs over the Internet. RFCs are tersely written, long (many in excess of 100 pages), formatted for line printers, and feature tables and graphics made out of text characters.

Table P-1

Typical RFC	Typical Web Document
large	small
no hyperlinks	many hyperlinks to related info
few graphics	GIF graphics
printer-oriented	interaction-oriented

Contrasting characteristics of RFCs and Web Documents

This isn't too surprising, since the RFC structure was developed two decades ago, during the Internet's formative years. At the time, there was no Web, no Netscape, no PCs. Unfortunately, these shortcomings of the 1970s are still apparent. While the RFCs that explain how the Internet works remain publicly available, they are some of the most difficult documents to access on-line. They take a long time to download (because of their size), lack a progressive range of complexity, are difficult to search topically, lack good graphics, and lack good hypertext links.

The Solution

My intent is to build a free, on-line reference that explains in detail how the Internet operates. The "TCP/IP Encyclopedia" will be Web-based, featuring topic-oriented pages that will break the technical muddle into small, easy-to-understand pieces. The presentation will be graphical and hyperlinked.

Requirements

The encyclopedia must:

Be accessible via the World Wide Web
At least for online documents, the Web is becoming the standard transport mechanism and HTML the standard presentation language.
Have a standardized page format, including navigation buttons
Web pages must have convenient width and height. The format should define the beginning and end of HTML pages. Hypertext navigation must be simple. Use an attractive and entertaining presentation.
Be designed for easy printing
If a document is spread over two dozen pages, it shouldn't take two dozen print commands to get a hardcopy of it. It would be nice to let the user download an arbitrary collection of pages in a single transfer.
Contain hypertext versions of RFCs
The RFCs remain the standard documents that describe how the Internet functions. Therefore, they must be included, but may be modified. If an RFC is modified, the original version must be available.
Facilitate both topical and keyword searches
A nice feature would be to support both topical and keyword searches by allowing keyword searches on arbitrary sub-hierarchies of the topical structure. So, for example, a user could go to the ICMP Protocol screen and conduct a keyword search on ping.
The simplest form of keyword searches would be full text searches. However, the existence of a core set of topic pages may permit search modes where page relevance is measured by its hypertext "distance" from other pages, particularly core topic pages.
Attention should also be paid to search result format. It should be possible to see the first few lines of each page returned as a search result.
Achieve both technical completeness and user comprehension
Adequate supporting material should be provided so that a literate novice can eventually be expected to understand any aspect of TCP/IP operation. Problems and exercises should be included, and self-guided instruction should be facilitated.
Respond to user requests in a timely fashion
Consider server and browser performance both separately and as a system. The total user response time is a function of the system, but browser creation or modification may be infeasible.
In a CD-ROM-based Pentium environment, response to nearly every function should be instantaneous. The only exception would be for searches, which may take longer but should present some initial results within five seconds.
In an Internet environment, system response time may be dominated by any of several factors: server load, network load, server efficiency, client efficiency.
For 4 out of 5 Internet users, server load should not be the dominant factor. In other words, the server should take no longer than the network to respond.
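The search requirement above (a keyword search restricted to a sub-hierarchy of the topical structure, with the first few lines of each hit shown in the results) can be sketched as follows. This is only an illustration under stated assumptions: the `Page` record, the slash-separated topic paths, and the sample pages are hypothetical and not part of the project's actual design.

```python
from dataclasses import dataclass


@dataclass
class Page:
    path: str   # hypothetical position in the topical hierarchy
    text: str   # plain-text body of the page


def search_subtree(pages, root, keyword, preview_lines=2):
    """Return (path, preview) pairs for pages at or below `root`.

    Matching is a simple case-insensitive full-text search; the preview
    is the first `preview_lines` lines of each matching page.
    """
    needle = keyword.lower()
    root = root.rstrip("/")
    prefix = root + "/"
    results = []
    for page in pages:
        in_scope = page.path == root or page.path.startswith(prefix)
        if in_scope and needle in page.text.lower():
            preview = "\n".join(page.text.splitlines()[:preview_lines])
            results.append((page.path, preview))
    return results


# Hypothetical sample pages, for illustration only.
PAGES = [
    Page("Protocols/ICMP",
         "ICMP Protocol\nCarries error and control messages.\nUsed by ping."),
    Page("Protocols/ICMP/Echo",
         "Echo Request and Reply\nThe messages behind ping."),
    Page("Protocols/TCP",
         "TCP Protocol\nReliable byte streams.\nNot what ping uses."),
]

# Search for "ping" starting from the ICMP Protocol screen.
hits = search_subtree(PAGES, "Protocols/ICMP", "ping")
for path, preview in hits:
    print(path)
    print(preview)
```

The TCP page also mentions ping, but it lies outside the ICMP sub-hierarchy and is therefore not returned.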

Architecture

The encyclopedia will be constructed in stages, roughly as follows:

Tool selection and construction
- Web browser/editor - Netscape
- Graphics construction - Fig, ImageMagick
- Automated document conversion - Emacs
- Revision control - RCS
Hypertext versions of the Internet RFCs
The topical core
Programmed instruction course, with exercises
Everything else

These are the key components of Connected:

Hypertext versions of the Internet RFCs
The Internet RFCs will be converted to a format designed for presentation over the Web. Depending on the original formatting of the text-based RFC, one of several Emacs Lisp scripts will be run to convert it into a rough hypertext format. The script should break the RFC down into chapters and sections. A person will then have to read and "touch up" each document. If the text RFC has a companion PostScript version, figures, graphs and other graphics will be extracted from the PostScript where possible and converted into GIFs for inclusion in the HTML version. Some packet formats and other diagrams will be replaced with GIFs constructed using XFig. References to other on-line documents will be tagged as hypertext links, making it easy for readers to follow a chain of references between authors. A person will need to follow up and "touch up" these on-line references, since the correct location in the target document can only be determined from context.
- Develop a bibliographic link style
- Develop a procedure for updating out-of-date RFCs.
- Develop a search engine
A topic list and topical index - the "topical core"
I've compiled a list of over 100 key Internet topics, ranging from the concept of an Address to MIT's Zephyr protocol. More items will be added as the project matures. Each of these topics will have a dedicated Web page, explaining the concept, program, or protocol. The most important and complex concepts will have multiple pages, and all the topic pages will feature hypertext links to relevant standard documents and related topics.
The topics will be organized in a tree hierarchy like Yahoo, which will be reflected on disk by organizing the document files in an identical directory hierarchy. This hierarchy must be constructed early on, so that each document can be placed correctly. Each directory must have an index.html file (or its equivalent), giving a table of contents for that directory, possibly with a textual discussion of the grouping's significance and hypertext links to related non-children.
TCP/IP programmed instruction course
An instruction course will be provided for those that want to read the material in a sequential manner, assuming little initial networking knowledge, and building up into more complex concepts. An initial outline will be developed early on. The course itself will be added as one of the last components of the project. Virtually all the material in the course will be available as part of the topical core. To build the course, I'll make copies of the relevant core information and add transition text.
Problems and exercises will be developed as part of the programmed instruction course. The preferred format will be that of programmed instruction: the user is given a question and asked to reply on a hypertext form with an answer. If many possible solutions are available (as in a routing metric problem), the user should be able to view the criterion for a correct answer, with a discussion of the rationale. In any case, if the answer is wrong, the user should be given constructive feedback and be allowed to view the correct answer. A database of problems can be used, and presented in pseudo-random order.
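The rough, automated pass of the RFC conversion described under "Hypertext versions of the Internet RFCs" can be illustrated with a short sketch. The project itself plans to use Emacs Lisp scripts; Python is used here only for illustration. The heading pattern, the `rfcNNN.html` link naming, and the sample text are all assumptions, and the output would still need the manual touch-up described above, since a link built this way points only at the top of the target document.

```python
import html
import re

# Assumption: a heading is an unindented line such as "3.1.  Header Format".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")
# Matches references such as "RFC 791" or "RFC-1122".
RFC_REF = re.compile(r"\bRFC[ -]?(\d+)\b")


def split_sections(text):
    """Split RFC text into (number, title, body) tuples at numbered headings."""
    sections = []
    number, title, body = None, None, []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if number is not None:
                sections.append((number, title, "\n".join(body).strip()))
            number, title, body = match.group(1), match.group(2).strip(), []
        elif number is not None:
            body.append(line)
    if number is not None:
        sections.append((number, title, "\n".join(body).strip()))
    return sections


def link_rfc_refs(body):
    """Escape text for HTML and turn RFC references into hypertext links."""
    escaped = html.escape(body)
    # Hypothetical file naming scheme: one page per RFC, named rfcNNN.html.
    return RFC_REF.sub(
        lambda m: f'<a href="rfc{m.group(1)}.html">{m.group(0)}</a>', escaped
    )


# Made-up sample text in the style of an RFC, for illustration only.
SAMPLE = """\
1. Introduction
   This memo extends the rules in RFC 791.

1.1. Scope
   Hosts only; see RFC-1122 for details.
"""

sections = split_sections(SAMPLE)
pages = [(num, title, link_rfc_refs(body)) for num, title, body in sections]
for num, title, body in pages:
    print(num, title)
    print(body)
```

Any unindented line that begins with a number would be treated as a heading, which is one reason a person still has to review each converted document.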

The encyclopedia will make use of the following practices:

Define a standard page format
Format must be adhered to by all pages in the system. All RFCs must be brought into conformance. No assumptions shall be made about the format of their contents, except that any hot areas such as hyperlinks or imagemaps must operate correctly. A header and/or footer will be attached to these pages, providing at least these functions:
- Page identification and bibliographic link
- Motion upwards in the hierarchy
- Motion to a tree navigator
Keyword Searches
Part of the standard page header will be a search form. The user should be able to conduct string, boolean, or regular expression searches. Basic options, such as case insensitivity, should be available.
The user should be able to set the search depth, the number of HREF links that are traversed during the search. A depth 0 search is only a find on the current page. A depth 1 search searches all pages referenced by the current page. For example, a depth 1 search on an RFC Table of Contents (the default) searches the entire RFC, because every page in the RFC is linked to by the ToC. A depth 2 search would search the RFC, as well as pages referenced by the RFC, and so on. This method allows the user to select a relevant page in the encyclopedia, then search "outward" from there.
Revision Control
Some encyclopedia documents may be tracked using RCS (or a similar system). For such documents, a hypertext revision history shall be available at the bottom of the document. Clicking on an old revision should check it out and display it, without disturbing the original. Hypertext links in old revisions should be functional.
Bibliographic database
Works cited as references in either the core material or RFCs should be listed in a bibliographic database. For work not part of the encyclopedia, it would be nice to have a hypertext link to an online book store.
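The search-depth rule described under "Keyword Searches" amounts to a breadth-first walk over hypertext links that stops after a fixed number of hops. The sketch below is one way to express it, assuming that a deeper search also includes every page reached at a shallower depth. The in-memory `PAGES` table and its page names are hypothetical stand-ins for real HTML files and their HREF links.

```python
from collections import deque


def search_outward(pages, start, keyword, depth):
    """Search `start` and every page within `depth` links of it.

    `pages` maps a page name to a (text, links) pair. Returns the names
    of matching pages in breadth-first order.
    """
    needle = keyword.lower()
    seen = {start}
    queue = deque([(start, 0)])
    matches = []
    while queue:
        name, dist = queue.popleft()
        text, links = pages[name]
        if needle in text.lower():
            matches.append(name)
        if dist < depth:
            for target in links:
                # Skip dangling links and pages that were already queued.
                if target in pages and target not in seen:
                    seen.add(target)
                    queue.append((target, dist + 1))
    return matches


# Hypothetical pages: an RFC table of contents, its sections, and one
# document referenced from inside the RFC.
PAGES = {
    "rfc-toc": ("Table of Contents", ["rfc-sec1", "rfc-sec2"]),
    "rfc-sec1": ("Echo messages are used by ping.", ["other-rfc"]),
    "rfc-sec2": ("Checksum computation.", []),
    "other-rfc": ("A ping implementation note.", []),
}

for depth in range(3):
    print(depth, search_outward(PAGES, "rfc-toc", "ping", depth))
```

Tracking the set of pages already seen keeps the walk from looping when pages link back to each other, which is common once every page carries navigation links.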

Project Timeline

1995 - Inception

Apr 1996 - Initial Release

2891 Web pages
25.2 MB
81 RFCs
Topical Core

Sep 1996 - Second Release

3297 Web pages
22.7 MB
88 RFCs
Added Five Section Programmed Instruction Course

Summer 1997 - Third Release

4000+ Web pages
120 MB
1623 RFCs
Added CNIDR Search Engine, complete RFC collection
Available on CD-ROM for first time
