University of Glasgow

Effective Records Management Project


Information Structure


The problems

The answers to all of the above questions rely on the structuring of information, and in particular on the structuring of documents. For several of these problems, the presence of structure alone is not a complete solution, and dealing with some of them may require that structural information be present in specific forms. But for all of them, a prerequisite for a solution is the presence of structural data in a form which can be processed by an application program.


Information structure

A description of the structure of an object identifies its component parts and the nature of the relationship between those parts. The identification of an object's component parts, both in terms of functional classification and as individual members of a class, is a prerequisite for being able to manipulate them. Like other artefacts, documents have structures which their creators and users identify and manipulate.

However, the structure of document-based information has traditionally not been explicit. Rather, it has been conveyed to human readers indirectly, through the use of presentational conventions. The changed IT environment has created new requirements for software tools to be able to manipulate the component parts of documents. For such software to be able to do so, structural information must be made explicit in a form which computer tools can access.

This can be achieved through descriptive markup, for which an open standard exists: the Standard Generalized Markup Language (SGML). Considerable benefits can also be derived from the structuring of documents using techniques available within proprietary authoring environments.
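
As a rough illustration of what "explicit in a form which computer tools can access" can mean in practice, the sketch below parses a small descriptively marked-up memo and picks out its parts by name. The memo and its element names (memo, to, from, subject, body, para) are invented for this example, and XML syntax is used only because a parser for it ships with Python; the same idea applies to SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo, marked up descriptively: each element name says what the
# part is (a subject, a paragraph), not how it should look on the page.
MEMO = """\
<memo>
  <to>Records Office</to>
  <from>Finance Department</from>
  <subject>Retention of invoices</subject>
  <body>
    <para>Invoices are kept for six years.</para>
    <para>They are then reviewed for disposal.</para>
  </body>
</memo>
"""


def summarise(memo_xml: str) -> dict:
    """Pull named components out of the memo using its explicit structure."""
    root = ET.fromstring(memo_xml)
    return {
        "subject": root.findtext("subject"),
        "recipient": root.findtext("to"),
        "paragraphs": len(root.findall("body/para")),
    }


summary = summarise(MEMO)
print(summary)
```

Because the subject is labelled as a subject, the program does not need to guess it from bold type or position on the page, which is what a human reader does with presentational conventions.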


Discussion Papers


Presentations


General Documents


SGML

The Standard Generalized Markup Language, SGML, is an open, non-proprietary standard (ISO 8879:1986) that describes a generalised markup scheme for representing the logical structure of documents in a system-independent and platform-independent manner. SGML itself is not a markup language: rather, it provides a formal notation for the definition of markup languages for specific document types.
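
The idea of defining a markup language for a document type can be sketched, very loosely, in Python. The code below is not SGML and does not use SGML's DTD notation: it is a toy analogue in which each element's permitted content is written as a regular expression over the names of its children. The element names and the CONTENT_MODELS table are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Toy "document type": for each element, a regular expression over the
# sequence of child element names (each name followed by one space).
# Elements not listed here are left unconstrained.
CONTENT_MODELS = {
    "memo": r"to from subject body ",  # exactly these four, in this order
    "body": r"(para )+",               # one or more paragraphs
}


def conforms(element: ET.Element) -> bool:
    """Check an element, and everything below it, against the toy models."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<memo><to/><from/><subject/><body><para/></body></memo>"
)
bad = ET.fromstring("<memo><to/><from/><body><para/></body></memo>")

good_ok = conforms(good)  # True: matches the declared structure
bad_ok = conforms(bad)    # False: the subject element is missing
```

A real SGML document type definition expresses the same kind of rule in its own declaration syntax and covers much more (attributes, entities, omitted tags), so this is only a picture of the principle.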

Introductory guides to SGML

General

SGML Software Tools


HTML

HyperText Markup Language, HTML, is a simple markup language for the creation of platform-independent hypertext documents: it is the publishing language of the World Wide Web.

HTML is a particular application of SGML to one document type, the hypertext document. In practice, proprietary extensions to the specification have resulted in the emergence of several versions of HTML, with different Web browser programs supporting different features. The references below are to the standard HTML specifications developed by the World Wide Web Consortium (W3C).

While HTML's ease of use has resulted in its huge popularity, it has many obvious shortcomings as a tool for the description of information structure. HTML is not extensible (i.e. it has a fixed element set). It describes structural relationships between elements only in a limited way, and, at least in HTML 3.2, mixes structural description with the description of presentational characteristics. HTML-encoded documents offer little scope for document reuse or for any sort of sophisticated processing of their information content.

HTML 4.0 enhances its capacity for structural description and introduces a welcome separation between the description of document structure (specified through the HTML markup) and of document presentation (specified through the Cascading Style Sheet (CSS) mechanism). However, HTML's intrinsic limitations for describing structure suggest that it is best regarded as one format for delivering a document to the reader, and indeed a delivery format designed for one particular output medium, that of screen display via a Web browser.
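
The limited structural description available in HTML can be seen by trying to recover a document outline from a page. The sketch below uses Python's standard html.parser module; the sample page and the names OutlineParser and extract_outline are invented for illustration.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect (level, text) pairs for the h1 to h6 headings in a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


def extract_outline(html_text: str) -> list:
    parser = OutlineParser()
    parser.feed(html_text)
    parser.close()
    return parser.outline


# Hypothetical page mixing structural and presentational markup.
PAGE = """<html><body>
<h1>Information Structure</h1>
<p>Some <font color="red">red</font> text.</p>
<h2>SGML</h2>
<h2>HTML</h2>
</body></html>"""

outline = extract_outline(PAGE)
```

The headings can be found, but nothing in the markup says where the section under each heading ends, or which paragraphs belong to it; a program has to infer that from heading levels and document order.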


Web Accessibility Initiative (WAI)

The Web Accessibility Initiative is the umbrella label for the area of W3C's work concerned with promoting a high degree of usability for people with disabilities. In pursuing accessibility of the Web, it promotes not only specialist technology for that end, but also the appropriate use of the existing standards. In this context, its guidelines for HTML authors are particularly useful.
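
One widely cited recommendation for HTML authors is that images should carry a text alternative in the alt attribute. As a minimal sketch of how such a check might be automated (it is not a WAI tool, and the sample markup and function names are invented here), the following lists images that lack the attribute.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Record the src of every img element that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src"))


def images_without_alt(html_text: str) -> list:
    checker = MissingAltChecker()
    checker.feed(html_text)
    checker.close()
    return checker.missing


# Hypothetical fragment: the first image has a text alternative, the second
# does not.
SAMPLE = (
    '<p><img src="crest.gif" alt="University crest">'
    '<img src="chart.gif"></p>'
)

missing = images_without_alt(SAMPLE)
```

A check like this only detects that the attribute is absent; whether an alt text that is present is actually helpful still needs a human reader.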


XML

Extensible Markup Language, XML, is a subset of SGML optimised for delivery via the World Wide Web. It aims to combine the extensibility of SGML with the ease of use and implementation of HTML.
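
To illustrate extensibility, the sketch below processes a record description written with an element set made up for this example (record, title, creator, retention). No fixed tag set is built into the parser: it only requires that the document be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical record description using an invented element set.
RECORD = """\
<record id="r-001">
  <title>Committee minutes</title>
  <creator>Senate Office</creator>
  <retention years="10"/>
</record>
"""


def retention_period(record_xml: str) -> tuple:
    """Return the record identifier and its retention period in years."""
    root = ET.fromstring(record_xml)
    return root.get("id"), int(root.find("retention").get("years"))


def is_well_formed(text: str) -> bool:
    """Report whether the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


period = retention_period(RECORD)
broken = is_well_formed("<record><title>Minutes</record>")  # mismatched tags
```

The strict well-formedness rule is part of what makes XML simpler to implement than full SGML: a parser can reject a document with mismatched tags without needing to know anything about its document type.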

In fact, the term "XML" is often used somewhat loosely to refer to a set of related standards:

Introductory articles on XML

General XML resources

XML Software Tools

Introductory articles on XSL

More detailed XSL tutorials/examples

General XSL resources

XSLT processors

Introductory articles on XLink and XPointer

General XLink and XPointer resources


Document Object Model : DOM

The Document Object Model is an application programming interface (API) for HTML and XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated.
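
A minimal sketch of DOM-style access and manipulation, using the xml.dom.minidom module from Python's standard library, which provides a lightweight implementation of the DOM interface. The sample document is invented for illustration.

```python
from xml.dom.minidom import parseString

# Parse a small document into a tree of DOM nodes.
doc = parseString("<list><item>SGML</item><item>HTML</item></list>")
root = doc.documentElement

# Manipulate the document: build a new element and attach it to the tree.
new_item = doc.createElement("item")
new_item.appendChild(doc.createTextNode("XML"))
root.appendChild(new_item)

# Access the document: read the text of every item element, in order.
items = [node.firstChild.data for node in doc.getElementsByTagName("item")]

# Serialise the modified tree back to markup.
serialised = root.toxml()
print(items)
print(serialised)
```

The method names used here (createElement, createTextNode, appendChild, getElementsByTagName) come from the DOM specification itself, so code written against other DOM implementations follows the same pattern.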

DOM examples/tutorials


Transforming and formatting SGML-encoded documents : DSSSL

Introductions to DSSSL

General DSSSL resources


Linking between objects : HyTime


Other resources


Effective Records Management Project home page
Page creator: Effective Records Management project erm@archives.gla.ac.uk
Page last updated: 16 November 1999