DRAFT

UNIVERSITY OF GLASGOW

Effective Records Management Project

Referencing Information in Electronic Documents

Introduction

In order for any collection of documents, whatever medium it is stored in, to be useful as an information base, units of information require clear, consistent and unambiguous identifiers which can be used subsequently to refer to those units of information. The units may be entire documents, collections of documents, or component parts of documents.

The informal and unstructured forms of identifier and reference often used by authors ("the report of the standards committee", "the minutes of the previous meeting") perform their function effectively only because:

  1. the "scope" of the reference - the range of documents from which the specific target is to be identified - may have been limited by the bringing together of a subset of documents into a single physical "file" or "folder", such as the traditional "brown envelope" of papers for a meeting.
  2. the human reader of the source document (the document containing the reference) performs a sophisticated interpretation of that reference in order to determine its intended target. They derive additional information from their understanding of its context of use - information which may not even be explicitly recorded - in order to construct a more precise definition of the target.

The use of a referencing system, in which each unit is given an identifier which conforms to an agreed standard and that identifier is used in subsequent references, removes, or at least reduces, the interpretative element involved in determining the intended target of a reference.

Referencing electronic documents

Computer programs are much less adept at executing the sort of interpretative task which human readers perform unconsciously, and the construction of relationships between units of information in the electronic domain requires a greater degree of precision. This need for precision is compounded by the fact that in the electronic domain a set of documents may be referred to logically as a group without ever being physically brought together, and so the "scope" of the reference is no longer limited to the set of documents in one folder.

When referencing electronic documents, the tendency has been to supply this required precision by specifying a physical file location for the target. This is the case, for example, in referencing documents delivered via the Web, through the use of a Uniform Resource Locator or URL (the "Web address" of a document, of the form "http://.....").

A URL is not some intrinsic property of the document, nor an identifier of its information content (although the judicious labelling of the filestore directories may indeed have the effect that a URL does give some indication of the content). The URL is simply a label of physical location, the equivalent of indicating the position of an item on a shelf in a repository. And like the shelf location of a paper document, the physical location of an electronic document can change. When electronic documents are no longer current, for example, those which are to be archived will not remain on the same filestore device as the current documents: their URL or physical file location will change. Similarly, developments in the allocation of filestore devices might require that documents are shifted to new locations by the filestore provider for "housekeeping" reasons which have nothing to do with the use or status of the documents themselves. Such a change should be transparent to the user of the document who wishes to reference it.

In short, precisely because it identifies physical location, the URL of a document is not sufficient as an identifier: although at any given time it is unique within a specified domain, it cannot be relied on to be persistent. What is required is a distinct "logical identifier" which remains with a document throughout its life cycle and which, when it is used as a reference, a computer program can use to determine the physical location of the target at that time. Such an identifier also, of course, allows document authors to create clear and unambiguous references to a target document in any medium.
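The separation between a persistent logical identifier and a changeable physical location might be sketched as follows. This is an illustration only, assuming a simple in-memory lookup table: the class name LocationRegistry, the identifier "ERM/1998/0042" and the example.org URLs are all hypothetical and are not taken from the project's own proposals.

```python
class LocationRegistry:
    """Hypothetical table mapping logical identifiers to current locations."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        # An identifier is assigned once, when the document is created.
        if identifier in self._locations:
            raise ValueError(f"identifier already in use: {identifier}")
        self._locations[identifier] = location

    def relocate(self, identifier, new_location):
        # Archiving or "housekeeping" changes the location, never the identifier.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        # Return the physical location of the target at the time of asking.
        return self._locations[identifier]


registry = LocationRegistry()
registry.register("ERM/1998/0042", "http://current.example.org/docs/minutes.html")
before = registry.resolve("ERM/1998/0042")

# The document is archived: its physical location changes.
registry.relocate("ERM/1998/0042", "http://archive.example.org/1998/minutes.html")
after = registry.resolve("ERM/1998/0042")
```

A reference written as "ERM/1998/0042" remains valid before and after the move; only the table entry is updated, so the change is transparent to anyone citing the document.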

Requirements

There are two separate problems to be addressed:

  1. The creation of a structure for an identifier, based on the principles that it should be easy for both a human author and a computer program to construct and use; it should give an indication of the origination and purpose of the document; and it should be easy to ensure that the identifier assigned to a document at the time of its creation has not already been used.
  2. The development of a mechanism to transform uses of such identifiers as references into pointers to the physical location of the target at the time of reading the document.
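The second problem, transforming references into pointers at the time of reading, might be approached along the following lines. This is a sketch under stated assumptions rather than the project's mechanism: the "[ref:...]" notation for marking a reference in the text, the function name resolve_references, the identifiers and the example.org URL are all invented for illustration.

```python
import re

# Hypothetical notation for a reference embedded in document text: [ref:IDENTIFIER]
REFERENCE = re.compile(r"\[ref:([A-Za-z0-9/_-]+)\]")


def resolve_references(text, locations):
    """Replace each marked reference with its identifier and current location.

    `locations` maps logical identifiers to physical locations as they
    stand at the time of reading. Unknown identifiers are left untouched
    so that an unresolved reference remains visible to the reader.
    """
    def substitute(match):
        identifier = match.group(1)
        location = locations.get(identifier)
        if location is None:
            return match.group(0)
        return f"{identifier} <{location}>"

    return REFERENCE.sub(substitute, text)


# Assumed state of the location table at the time of reading.
locations = {"ERM/1998/0042": "http://archive.example.org/1998/minutes.html"}

source = "See [ref:ERM/1998/0042] and [ref:ERM/1998/0099]."
resolved = resolve_references(source, locations)
```

Because the lookup is performed when the document is read, the source text never needs to be edited when a target moves; only the location table changes.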

A proposal to address the first of these problems is set out in the document Proposal for a Reference Code.



Prepared by: Pete Johnston, Julie Cargill and Claire Johnson (Effective Records Management Project)

Version: 3