DRAFT
For any collection of documents, in whatever medium it is stored, to be useful as an information base, its units of information require clear, consistent and unambiguous identifiers that can subsequently be used to refer to them. These units may be entire documents, collections of documents, or component parts of documents.
The informal and unstructured forms of identifier and reference often used by authors ("the report of the standards committee", "the minutes of the previous meeting") perform their function effectively only because human readers interpret them in the light of a shared context.
The use of a referencing system, in which each unit is given an identifier that conforms to an agreed standard and that identifier is then used in subsequent references, removes, or at least reduces, the interpretative element involved in determining the intended target of a reference.
Computer programs are much less adept than human readers at performing the kind of interpretative task those readers carry out unconsciously, so constructing relationships between units of information in the electronic domain requires a greater degree of precision. This need for precision is compounded by the fact that, in the electronic domain, a set of documents may be referred to logically as a group without ever being physically brought together, so the "scope" of a reference is no longer limited to the set of documents in one folder.
When electronic documents are referenced, this precision has tended to be supplied by specifying the physical file location of the target. This is the case, for example, for documents delivered via the Web, which are referenced by a Uniform Resource Locator or URL (the "Web address" of a document, of the form "http://.....").
A URL is not an intrinsic property of the document, nor an identifier of its information content (although judicious labelling of the filestore directories may mean that a URL does give some indication of the content). The URL is simply a label of physical location, the equivalent of indicating the position of an item on a shelf in a repository. And like the shelf location of a paper document, the physical location of an electronic document can change. When electronic documents are no longer current, for example, those which are to be archived will not remain on the same filestore device as the current documents: their URL or physical file location will change. Similarly, changes in the allocation of filestore devices might require the filestore provider to move documents to new locations for "housekeeping" reasons which have nothing to do with the use or status of the documents themselves. Such a change should be transparent to any user who wishes to reference the document.
In short, precisely because it identifies physical location, the URL of a document is not sufficient as an identifier: although at any given time it is unique within a specified domain, it cannot be relied on to be persistent. What is required is a distinct "logical identifier" that remains with a document throughout its life cycle and that, when used in a reference, allows a computer program to determine the physical location of the target at that time. Such an identifier also, of course, allows document authors to create clear and unambiguous references to a target document in any medium.
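The separation between a persistent logical identifier and a changeable physical location can be illustrated with a minimal sketch, assuming a simple in-memory lookup table. The class name Resolver, the identifier "DOC-0001" and the example.org URLs are hypothetical placeholders chosen for illustration; they are not the identifier format set out in "Proposal for a Reference Code". A reference stores only the logical identifier, and when a document is archived or moved, only the table entry changes.

```python
"""Sketch: resolving hypothetical logical identifiers to current locations.

All identifiers and URLs here are illustrative placeholders only.
"""


class Resolver:
    """Maps persistent logical identifiers to current physical locations."""

    def __init__(self) -> None:
        # logical identifier -> current physical location (e.g. a URL)
        self._locations: dict[str, str] = {}

    def register(self, logical_id: str, location: str) -> None:
        """Record where a document lives when it is first filed."""
        if logical_id in self._locations:
            # An identifier must stay unique, so it is never reassigned.
            raise ValueError(f"identifier already assigned: {logical_id}")
        self._locations[logical_id] = location

    def move(self, logical_id: str, new_location: str) -> None:
        """Update the location after archiving or housekeeping.

        The logical identifier itself is unchanged, so existing
        references to it remain valid.
        """
        if logical_id not in self._locations:
            raise KeyError(logical_id)
        self._locations[logical_id] = new_location

    def resolve(self, logical_id: str) -> str:
        """Return the physical location of the target at the time of the call."""
        return self._locations[logical_id]


if __name__ == "__main__":
    resolver = Resolver()
    # An author's reference holds only the logical identifier.
    reference = "DOC-0001"
    resolver.register(reference, "http://current.example.org/minutes/0001.html")
    print(resolver.resolve(reference))

    # The document is archived: its URL changes, but the reference does not.
    resolver.move(reference, "http://archive.example.org/1999/0001.html")
    print(resolver.resolve(reference))
```

In practice such a table would need to be persistent and kept up to date by whoever moves the files; the sketch shows only that references carry the identifier, not the location.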
There are two separate problems to be addressed:
A proposal to address the first of these problems is set out in the document "Proposal for a Reference Code".
Prepared by:
Version: 3