Getting a handle on URNs

It is extraordinary how in just over a decade Uniform Resource Locators (URLs) have entered everyday life to such an extent that they are now found practically everywhere - from the side of buses to the back of cornflake packets. But this universality tends to mask the fact that they suffer from a serious defect.

Everyone has encountered the problem, which manifests itself as the dreaded "404 Not Found" error. The trouble is that changes in site design, file directories and domain names can easily make a URL obsolete, with no means of automatically redirecting visitors to the resource's new Internet location (where one exists). What is needed is a standard way of permanently naming a digital resource, similar to the International Standard Book Number (ISBN) for printed books.

The solution is to move from URLs to URNs: Uniform Resource Names. The important thing about URNs is that they do not point directly to an Internet resource. Instead, they are persistent names that a resolution service translates into the resource's current location and other metadata. This means that the URN does not need to change if the URL does: it is enough to update the record held by the resolution service.
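The indirection that makes a URN persistent can be illustrated with a toy resolver. This is a minimal sketch, not how any real URN, PURL or handle service is implemented. The Resolver class is hypothetical, the mapping lives in memory rather than on a networked service, and the name uses the "urn:example:" namespace that is reserved for documentation. The point it shows is that when a file moves, only the resolver's entry changes, never the name.

```python
# Hypothetical in-memory resolver illustrating URN-style indirection.
# Real services keep this name -> location mapping on networked servers.


class Resolver:
    """Maps persistent names to their current locations (illustrative only)."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def register(self, name: str, url: str) -> None:
        # Record the current location of a named resource.
        self._locations[name] = url

    def update(self, name: str, new_url: str) -> None:
        # The resource has moved: change its location, never its name.
        if name not in self._locations:
            raise KeyError(f"unknown name: {name}")
        self._locations[name] = new_url

    def resolve(self, name: str) -> str:
        # Look up where the persistent name currently points.
        return self._locations[name]


resolver = Resolver()
resolver.register("urn:example:report-2003", "http://www.example.org/docs/report.html")
# A site redesign moves the file; only the resolver entry is updated.
resolver.update("urn:example:report-2003", "http://www.example.org/archive/2003/report.html")
print(resolver.resolve("urn:example:report-2003"))
```

Anyone who cited "urn:example:report-2003" before the move still reaches the document afterwards, which is exactly what a bare URL cannot guarantee.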

URNs sound great in theory. Unfortunately, progress towards realising them has been slow. One attempt to address what is sometimes called linkrot is the use of PURLs: Persistent URLs. A PURL points to an intermediate resolution service, which uses HTTP redirection to send the browser on to the resource's current URL. This solves the problem of changes in directory structure, but a PURL is basically still an adaptation of the URL. More thoroughgoing in its attempt to create full URNs is the Handle system.

This was devised by Robert Kahn, co-inventor of the TCP/IP protocols, and currently President of the Corporation for National Research Initiatives (CNRI). The CNRI site has plenty of information on handles, including a FAQ, articles, papers, full documentation and three related RFCs (3650, 3651, 3652). CNRI also runs a free public handle service for those who wish to try out the system before installing the free server software locally. There is also client software that lets Windows browsers resolve handles directly, and some examples of what handles look like in practice.

Two of the latter take the form doi:10.xxxx/, which refers to the digital object identifier (DOI) system. As a detailed review indicates, this is by far the most successful application of handles so far: by the middle of 2003, over 10 million DOIs had been assigned. The DOI system adds a number of notable features to the basic handle approach, including an infrastructure designed to ensure persistence, and semantic interoperability - the use of a consistent metadata scheme to describe what the DOI handles refer to.
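The doi:10.xxxx/ form is itself a handle. Under the Handle System syntax of RFC 3650, a handle consists of a naming-authority prefix, a slash, and a local name. DOI prefixes always begin with "10.". The sketch below splits a DOI into those two parts and builds a link through an HTTP proxy. Several details are assumptions for illustration: the function names, the example DOI "10.9999/example-2003/001" (hypothetical), and the proxy base URL https://doi.org/ (the DOI Foundation's current resolver, which may differ from what was in use when this column was written).

```python
# Sketch: split a DOI into handle prefix and suffix, then build a proxy link.
# Assumes RFC 3650 "prefix/suffix" syntax and DOI prefixes starting with "10.".
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Return (naming-authority prefix, local-name suffix) for a DOI."""
    # Accept the "doi:" form often used in printed citations.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # A handle splits at the FIRST slash; the suffix may contain more slashes.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix:
        raise ValueError(f"not of the form prefix/suffix: {doi!r}")
    if not prefix.startswith("10."):
        raise ValueError(f"DOI prefixes start with '10.': {doi!r}")
    return prefix, suffix


def doi_proxy_url(doi: str, proxy: str = "https://doi.org/") -> str:
    # The proxy redirects to whatever location is currently registered.
    prefix, suffix = split_doi(doi)
    # Percent-encode characters in the suffix that are unsafe in a URL.
    return proxy + prefix + "/" + quote(suffix, safe="/")


print(split_doi("doi:10.9999/example-2003/001"))
print(doi_proxy_url("doi:10.9999/example-2003/001"))
```

Splitting only at the first slash matters because publishers are free to put further slashes in the local name. Only the prefix identifies the naming authority that the Handle System consults.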

The kind of metadata that is included reflects the origins of DOI in the publishing world. As the introduction to the DOI system explains, it was designed "for persistent identification and interoperable exchange of intellectual property on digital networks". In other words, one of its aims is to make digital rights management easier by assigning unchanging identifiers.

A page of sample DOIs includes a paper from the leading scientific journal Nature, and several items from the UK's largest publisher by output, The Stationery Office (TSO). TSO has set up a site where organisations can create DOIs for their information. It has produced a useful report on various applications of the system, and recently announced that it is supplying thousands of DOIs to the Organisation for Economic Co-operation and Development (OECD). DOIs and handles are clearly beginning to catch on, and with them the idea of replacing ephemeral URLs with perdurable URNs.

Glyn Moody welcomes your comments.