University of Glasgow
The answers to all of the above questions rely on the structuring of information, and in particular on the structuring of documents. For several of these problems, the presence of structure alone is not a complete solution, and some may require that structural information be present in specific forms. But for all of them, a prerequisite is the presence of structural data in a form which can be processed by an application program.
A description of the structure of an object identifies its component parts and the nature of the relationship between those parts. The identification of an object's component parts, both in terms of functional classification and as individual members of a class, is a prerequisite for being able to manipulate them. Like other artefacts, documents have structures which their creators and users identify and manipulate.
However, the structure of document-based information has traditionally not been explicit. Rather, it has been conveyed to human readers indirectly, through the use of presentational conventions. The changed IT environment has created new requirements for software tools to be able to manipulate the component parts of documents. For such software to be able to do so, structural information must be made explicit in a form which computer tools can access.
This can be achieved through descriptive markup, and an open standard exists for the design and use of descriptive markup, the Standard Generalized Markup Language (SGML). Considerable benefits can also be derived from the structuring of documents using techniques available within proprietary authoring environments.
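As a minimal sketch of what explicit structure in a machine-processable form can look like, the example below marks up a short memo descriptively and lets a program pick out its component parts by name. It assumes XML syntax (a subset of SGML, discussed later in this section), because Python's standard library has no SGML parser. The memo and its element names (`memo`, `to`, `from`, `subject`, `body`, `para`) are hypothetical, invented for illustration and not drawn from any standard document type.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo: the element names are illustrative only and do not
# come from any published document type definition.
MEMO = """<memo>
  <to>Records Office</to>
  <from>Archives</from>
  <subject>Retention schedule</subject>
  <body><para>First paragraph.</para><para>Second paragraph.</para></body>
</memo>"""


def outline(xml_text):
    """Return (element name, text) pairs for the top-level parts of a document."""
    root = ET.fromstring(xml_text)
    # Each child element is a named component part; its tag states its role.
    return [(child.tag, (child.text or "").strip()) for child in root]


def paragraphs(xml_text):
    """Return the text of every <para> element, wherever it is nested."""
    root = ET.fromstring(xml_text)
    return [para.text for para in root.iter("para")]


if __name__ == "__main__":
    for name, text in outline(MEMO):
        print(name, "->", text)
    print(paragraphs(MEMO))
```

Because the markup names each part by its function rather than by its appearance, the same source could be reformatted, indexed, or partially extracted without a program having to infer structure from layout.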
The Standard Generalized Markup Language, SGML, is an open, non-proprietary standard (ISO 8879:1986) that describes a generalised markup scheme for representing the logical structure of documents in a system-independent and platform-independent manner. SGML itself is not a markup language: rather, it provides a formal notation for the definition of markup languages for specific document types.
A comprehensive and regularly updated source of information on SGML, XML and related standards and their applications.
HyperText Markup Language, HTML, is a simple markup language for the creation of platform-independent hypertext documents: it is the publishing language of the World Wide Web.
HTML is a particular application of SGML to one document type, the hypertext document. In practice, proprietary extensions to the specification have resulted in the emergence of several versions of HTML, with different Web browser programs supporting different features. The references below are to the standard HTML specifications developed by the World Wide Web Consortium (W3C).
While HTML's ease of use has resulted in its huge popularity, it has many obvious shortcomings as a tool for the description of information structure. HTML is not extensible (i.e. it has a fixed element set). It describes structural relationships between elements only in a limited way, and, at least in HTML 3.2, mixes structural description with the description of presentational characteristics. HTML-encoded documents offer little scope for document reuse or for any sort of sophisticated processing of their information content.
HTML 4.0 enhances its capacity for structural description and introduces a welcome separation between the description of document structure (specified through the HTML markup) and of document presentation (specified through the Cascading Style Sheet (CSS) mechanism). However, HTML's intrinsic limitations for describing structure suggest that it is best regarded as one format for delivering a document to the reader, and indeed a delivery format designed for one particular output medium, that of screen display via a Web browser.
The Web Accessibility Initiative is the umbrella label for the area of W3C's work concerned with promoting a high degree of usability for people with disabilities. In pursuing accessibility of the Web, it promotes not only specialist technology for that end, but also the appropriate use of the existing standards. In this context, its guidelines for HTML authors are particularly useful.
Extensible Markup Language, XML, is a subset of SGML optimised for delivery via the World Wide Web. It aims to combine the extensibility of SGML with the ease of use and implementation of HTML.
In fact, the term "XML" is often used somewhat loosely to refer to a set of related standards:
The contents of the specifications are liable to change as they advance through the W3C Recommendation process. All efforts will be made to ensure that these links reflect the latest versions, but information on the current status of all W3C proposed specifications can be found at http://www.w3.org/TR/
The Document Object Model is an application programming interface (API) for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated.
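The following sketch illustrates accessing and manipulating a document through a DOM interface. It uses `xml.dom.minidom` from Python's standard library, which implements a subset of the W3C DOM. The sample document, its element names, and the helper functions `add_section` and `section_texts` are hypothetical, chosen only for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document: the element names are invented for illustration.
SOURCE = (
    "<report><title>Annual Review</title>"
    "<section>Introduction</section></report>"
)


def add_section(xml_text, heading):
    """Parse a document, append a new <section> element, and serialise it."""
    dom = parseString(xml_text)
    root = dom.documentElement
    # New nodes are created by the document object, then attached to the tree.
    section = dom.createElement("section")
    section.appendChild(dom.createTextNode(heading))
    root.appendChild(section)
    return root.toxml()


def section_texts(xml_text):
    """Return the text content of every <section> element in document order."""
    dom = parseString(xml_text)
    return [node.firstChild.data for node in dom.getElementsByTagName("section")]


if __name__ == "__main__":
    updated = add_section(SOURCE, "Conclusions")
    print(updated)
    print(section_texts(updated))
```

The calls used here (`createElement`, `createTextNode`, `appendChild`, `getElementsByTagName`) are defined by the DOM itself rather than by Python, so the same operations are available under the same names in other DOM implementations.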