4.2.2. Conventional Representation of Newlines

Connected: An Internet Encyclopedia

Up: RFC 1866

SGML specifies that a text entity is a sequence of records, each beginning with a record start character and ending with a record end character (code positions 10 and 13 respectively) (section 7.6.1, "Record Boundaries" in [SGML]).

[MIME] specifies that a body of type `text/*' is a sequence of lines, each terminated by CRLF, that is, octets 13, 10.
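As a minimal sketch, assuming the document is already decoded into a Python string, a sender could convert any of the common end-of-line conventions into the CRLF form that [MIME] expects. The names `EOL` and `to_crlf` are hypothetical and used only for illustration. Encoding the resulting string into octets with the document's charset is a separate step that is not shown here.

```python
import re

# Matches one end of line in any common convention. CR LF is listed first
# so that a CR LF pair is treated as a single line end, not two.
EOL = re.compile(r"\r\n|\r|\n")

def to_crlf(text: str) -> str:
    """Rewrite every CR, LF, or CR LF line end as CR LF (octets 13, 10)."""
    return EOL.sub("\r\n", text)

print(repr(to_crlf("a\nb\rc\r\nd")))
```

With mixed input such as `"a\nb\rc\r\nd"`, every line end comes out as `"\r\n"`.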

In practice, HTML documents are frequently represented and transmitted using an end-of-line convention that depends on the conventions of the document's source; that representation typically consists of CR only, LF only, or a CR LF sequence. Hence decoding the octets will often produce a text entity with some missing record start and record end characters.

Since there is no ambiguity about where each line ends, HTML user agents are encouraged to infer the missing record start and record end characters.
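One way to picture this inference is a minimal sketch that splits decoded text on any of the three conventions and then wraps each line in an explicit record start (code position 10) and record end (code position 13). The helper names `split_lines` and `to_records` are hypothetical. The sketch also assumes that a final line with no terminator still forms a complete record, and that a trailing terminator does not start an extra empty record.

```python
import re

# Any of the three common end-of-line conventions; CR LF first so it
# counts as one line end.
EOL = re.compile(r"\r\n|\r|\n")

RS = "\x0a"  # SGML record start character (code position 10)
RE = "\x0d"  # SGML record end character (code position 13)

def split_lines(text: str) -> list[str]:
    """Split decoded text into lines, accepting CR, LF, or CR LF."""
    if not text:
        return []
    lines = EOL.split(text)
    # A line end at the very end of the text leaves an empty string after
    # the split; drop it so that "a\n" is one line, not two.
    if lines[-1] == "":
        lines.pop()
    return lines

def to_records(text: str) -> str:
    """Rebuild the text entity with an explicit RS ... RE around each line."""
    return "".join(RS + line + RE for line in split_lines(text))
```

For example, `split_lines("a\rb\n")` gives `["a", "b"]`, whichever convention each line used.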

An HTML user agent should treat end of line in any of its variations as a word space in all contexts except preformatted text. Within preformatted text, an HTML user agent should treat any of the three common representations of end-of-line as starting a new line.
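A minimal sketch of this rule follows. It assumes the caller already knows whether a run of text lies inside preformatted text, for example inside a `PRE` element. The function name `normalize_eol` is hypothetical. Each end of line becomes a single word space in ordinary text and a new line in preformatted text. Any further collapsing of adjacent white space is a separate concern and is not shown.

```python
import re

# CR LF, CR only, or LF only; CR LF first so it is one line end.
EOL = re.compile(r"\r\n|\r|\n")

def normalize_eol(text: str, preformatted: bool) -> str:
    """Map every end of line to a new line (preformatted) or a word space."""
    if preformatted:
        return EOL.sub("\n", text)
    return EOL.sub(" ", text)
```

Given `"one\r\ntwo\rthree\nfour"`, ordinary text yields `"one two three four"`, while preformatted text yields four separate lines.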
