Monday, 22 August 2011

Costs & Benefits

A key question arose throughout the project, not least in the two archivists' focus groups - is Linked Data worth the input of professional staff time? From the front-end perspective, are the improvements for users - enriched catalogues published more speedily and improved, automated linking with external services - sufficient to justify the extra effort required from staff? Aren't Google, Autonomy/HP and other large corporations that manage huge quantities of data doing enough already, or won't they be very soon (a Google 'Linked Data' button, anyone)? A fundamental point is that archivists are under enormous pressure to justify and quantify the potential benefits of Linked Data to senior management, by simplifying often confusing and obscure terminology and by using exemplars and online test areas.

The OMP project showed that initial professional scepticism can be overcome if Linked Data can be simply defined and the benefits clearly set out. Archivists will use Linked Data if a service or services are provided that automate or simplify mark-up, or the semantic process more generally, and embed it within existing cataloguing workstreams. Ideally, these can be built out of trusted aggregations, authorities or cataloguing systems such as the Hub, AIM25, TNA, CALM or ATOM. Archivists are less likely to use Linked Data if it is perceived to be a complex, though potentially useful, add-on requiring detailed specialist knowledge and delivered without support or guidance ('build it and they will come'). The ability to retroconvert legacy catalogues and CLDs with Linked Data through automation against OpenCalais and other engines will help sell Linked Data more effectively, as will validation of metadata created out of mass digitisations and OCR.

The OMP project has underlined the value of Linked Data in a number of ways:

  • Increased access and discovery
  • Increased use and return on investment in cataloguing (speeding up cataloguing, enabling tools for tasks that require an archivist to locate and link information - for example indexing, finding already-existing authority records and linking to them; finding suitable subject terms; locating places from GeoNames or similar)
  • Enhanced ability to justify expenditure on services and resource development (improved web-hits and connecting with heavily used services)
  • Exposure of information to novel and different uses (Combining ALM collections for the delivery of services, including commercial services - apps, exhibitions, mapping, new tools etc)

The specific benefits, as demonstrated via AIM25, are:

Updated workflow interface including:
  • Reduction of the requirement for archivists to input HTML
  • Reduction of the on-screen size of the form
  • Integration of the process of selecting access points
  • Automatic semantic annotation to aid selection of classifying terms
  • Authority lookup (internal and external - UKAT, GeoNames, etc) to improve rigour of metadata
Semantic rendering of the classification terms used by AIM25 (separate from the AIM25 access-points records):
  • SKOS representation of AIM25-UKAT data
  • RDF for AIM25 people, families and corporate names
  • GeoNames representation of AIM25 place data
Use of RDFa where available to enhance the public interface of AIM25:
  • Semantic lookup allowing users to further explore definitions and instances of terms based on the properties defined during the workflow process.
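
To give a flavour of what a SKOS representation of a thesaurus term involves, here is a minimal sketch in Python that writes one subject term out as a SKOS concept in Turtle. It is an illustration only, not the code used by AIM25: the base URI, the term identifiers and the function name are all hypothetical, and a production service would more likely use an RDF library than string formatting.

```python
# SKOS namespace as published by the W3C.
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def turtle_literal(text, lang="en"):
    """Quote a label as a language-tagged Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_concept_turtle(base_uri, term_id, pref_label,
                        alt_labels=(), broader_ids=()):
    """Render one thesaurus term as a SKOS concept in Turtle.

    base_uri and the identifiers are hypothetical placeholders,
    not real UKAT or AIM25 URIs.
    """
    subject = f"<{base_uri}{term_id}>"
    pairs = [("a", "skos:Concept"),
             ("skos:prefLabel", turtle_literal(pref_label))]
    # Non-preferred ("use for") terms become alternative labels.
    pairs += [("skos:altLabel", turtle_literal(alt)) for alt in alt_labels]
    # Broader terms are links to other concepts in the same scheme.
    pairs += [("skos:broader", f"<{base_uri}{b}>") for b in broader_ids]
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"{subject}\n    {body} ."


if __name__ == "__main__":
    print(SKOS_PREFIX)
    print(skos_concept_turtle(
        "http://example.org/thesaurus/",  # hypothetical base URI
        "1234",                           # hypothetical term identifier
        "Hospitals",
        alt_labels=["Infirmaries"],
        broader_ids=["1200"],
    ))
```

The preferred term, its non-preferred variants and its broader term map onto skos:prefLabel, skos:altLabel and skos:broader respectively, which is why a thesaurus such as UKAT lends itself to this kind of rendering.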

The main business case is two-fold: adding value and boosting efficiency. Archivists are very attracted to the idea of enabling UKAT in Linked Data, but as an active service like OpenCalais rather than as a look-up. AIM25 has developed a SKOS version of UKAT and a workflow tool that would link from a revised AIM25 data entry template to an LD UKAT.

Of place, personal name, corporate name and subject, subject terms are arguably the most subjective, requiring the archivist to exercise judgment on the preferred term with the collection and potential users in mind. OMP has shown that subject terms throw up the least accurate semantic returns from a linguistic analysis service such as OpenCalais (places can often be matched with absolute precision, as can personal names). OMP has improved professional efficiency by developing a hover tool to enable the archivist to select a preferred subject term from UKAT or via connecting to LD versions of LCSH/NRA and to add this term or terms to their new catalogue/CLD.
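
The step of moving from a machine-suggested term to a preferred thesaurus term can be sketched roughly as follows. This is a hedged illustration rather than the OMP hover tool itself: the tiny vocabulary, the function names and the similarity cutoff are assumptions made for the example, and the matching uses Python's standard difflib module rather than any UKAT or OpenCalais service.

```python
import difflib

# Hypothetical mini-vocabulary: preferred term -> non-preferred labels.
# A real tool would draw these from UKAT or a similar authority.
VOCABULARY = {
    "Hospitals": ["Infirmaries", "Clinics"],
    "Medical education": ["Medical training"],
    "Nursing": ["Nurses"],
}


def build_label_index(vocabulary):
    """Map every label (preferred or not), lower-cased, to its preferred term."""
    index = {}
    for preferred, alternatives in vocabulary.items():
        index[preferred.lower()] = preferred
        for alt in alternatives:
            index[alt.lower()] = preferred
    return index


def suggest_preferred_terms(candidate, vocabulary,
                            max_suggestions=3, cutoff=0.8):
    """Suggest preferred terms for a candidate term from text analysis.

    An exact (case-insensitive) label match wins outright; otherwise
    near-matches are offered so the archivist can make the final choice.
    """
    index = build_label_index(vocabulary)
    key = candidate.strip().lower()
    if key in index:
        return [index[key]]
    close = difflib.get_close_matches(
        key, list(index), n=max_suggestions, cutoff=cutoff
    )
    suggestions = []
    for label in close:
        preferred = index[label]
        if preferred not in suggestions:  # avoid offering a term twice
            suggestions.append(preferred)
    return suggestions


if __name__ == "__main__":
    for term in ["infirmaries", "Hospitls", "Steam engines"]:
        print(term, "->", suggest_preferred_terms(term, VOCABULARY))
```

The function only ever returns suggestions; the decision about which term, if any, goes into the catalogue stays with the archivist, in keeping with the point above that subject indexing calls for professional judgment.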

Without such automation, Linked Data won't be embedded, or the data linked will be limited in scope. Flexibility is key. Focus group archivists concluded that they need the ability to analyse as much or as little of a description as they need, and to reach that faceting decision as speedily as possible - selecting the most important entities that require linking in any body of text, and the fields to analyse (just 'creator', 'institution' etc, or terms within Scope and Content or Admin/Biographical?). The value of broader authority data was reiterated by the archivists - analysis should not be limited to Scope and Content. A fundamental point is that back-end Linked Data enhancement works best when it works with the grain of professional practice - pragmatically and speedily.

The OMP approach is innovative in that it offers further exposure of data - and all AIM25 data has been processed as part of the project. Sustainability will be maintained going forward either by periodic manual data dumps into OpenCalais or by automated calendared refreshes - the same approach could be envisaged for LD UKAT as a national service plugged into local systems such as CALM. Improving the OpenCalais vocabulary by importing archive-specific terms is crucial to the success of mark-up. Analysis of the catalogue data is only valuable if OpenCalais learns from archivists. Until this happens, the breadth of vocabulary will limit the scope of the mark-up. It is also worth putting pressure on the main suppliers of archival cataloguing software to encourage them to embed support in periodic upgrades.

Experimentation with NRA data is ongoing - this will test how difficult it would be to build an authorities service off the NRA/ARCHON. The results will be described in a separate blog post.