SGML specifies an abstract syntax and a reference concrete syntax. Aside from certain quantities and capacities (e.g. the limit on the length of a name), all HTML documents use the reference concrete syntax. In particular, all markup characters are in the repertoire of [ISO-646]. Data characters are drawn from the document character set (see 6, "Characters, Words, and Paragraphs").
A complete discussion of SGML parsing, i.e. the mapping of a sequence of characters to a sequence of tags and data, is left to the SGML standard [SGML]. This section is only a summary.
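
As a rough illustration of what "mapping a sequence of characters to a sequence of tags and data" means, the following sketch splits a string into start-tags, end-tags, and data. It is not a conforming SGML parser: it ignores attributes, entity references, comments, declarations, and markup minimization. The names `TAG` and `tokenize` are hypothetical and do not come from the SGML standard; the only assumptions taken from the reference concrete syntax are that a name begins with a letter, continues with letters, digits, "." or "-", and is not case sensitive.

```python
import re

# Simplified tag pattern (an assumption for illustration only):
# "<" or "</", then a name, then anything up to the closing ">".
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def tokenize(text):
    """Split text into ("start-tag", NAME), ("end-tag", NAME) and ("data", str)."""
    tokens = []
    pos = 0
    for match in TAG.finditer(text):
        # Any characters between the previous tag and this one are data.
        if match.start() > pos:
            tokens.append(("data", text[pos:match.start()]))
        kind = "end-tag" if match.group(1) else "start-tag"
        # Element names are not case sensitive, so fold them to upper case.
        tokens.append((kind, match.group(2).upper()))
        pos = match.end()
    # Trailing characters after the last tag are data as well.
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens


if __name__ == "__main__":
    for token in tokenize("<p>Hello <em>world</em></p>"):
        print(token)
```

Run on the string `<p>Hello <em>world</em></p>`, the sketch yields a start-tag P, the data "Hello ", a start-tag EM, the data "world", an end-tag EM, and an end-tag P. A real SGML parser does considerably more, which is why the details are left to [SGML].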