Friday 23 March 2012

UKAD Conference

I gave a talk with Robert Baxter from Cumbria Archive Service at the annual UKAD conference at The National Archives on 21 March. The talk explored the potential of Linked Data in the archives, libraries and museums sector, focusing on the experience of Step change.

Key lessons/challenges from Step change and from other JISC projects King's College Archives are working on (Trenches to Triples and World War One Research) mentioned in the talk include:

  • Defining/setting up/maintaining APIs - this is potentially challenging and time-consuming
  • Need for URI definitions/syntax across the archives, libraries and museums sector - this discussion was started by LOCAH and is ongoing. A wiki will be launched soon by ULCC inviting feedback from information professionals on these definitions, with the aim of reaching some consensus in the coming months
  • Place name vocabularies are a particular challenge. Step change archivists will potentially have access to some four or five sets of data about similar places - for example Geonames, English Place Names, AIM25-UKAT, GoGeo, and a local CALM place dataset. How will they ensure consistency, or that terms found across the datasets are actually talking about the same place?
  • Linked Data analysis exposes poor quality and inconsistent existing metadata. Step change is partly about providing tools that will identify discrepancies and make metadata input more consistent but the funding and management challenges of this laundry operation remain considerable
  • Establishing and supporting new live LOD services beyond existing JISC funding will be a challenge. Services go down - how will data retrieval cope with this fact of life?
  • Visualisation - this poses multiple challenges. How much information is too much information for users? How do we maintain relevancy - can the users decide themselves to some extent?
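
On the point above about services going down, one common approach is to keep the last good response locally and fall back to it during an outage. The following is a minimal sketch in Python, not a description of how Step change handles this: `CachedLookup` and the `fetch` callable are hypothetical names standing in for whatever live LOD lookup is actually used.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedLookup:
    """Wrap a remote lookup so the last good answer survives an outage."""

    def __init__(self, fetch: Callable[[str], dict], max_age_seconds: float = 86400.0):
        self._fetch = fetch              # hypothetical remote call, e.g. an HTTP GET
        self._max_age = max_age_seconds  # how long a cached copy counts as fresh
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def get(self, uri: str) -> Optional[dict]:
        cached = self._cache.get(uri)
        if cached is not None and time.time() - cached[0] < self._max_age:
            return cached[1]             # fresh copy, no network call needed
        try:
            data = self._fetch(uri)
        except Exception:
            # Service is down: fall back to the stale copy if one exists.
            return cached[1] if cached is not None else None
        self._cache[uri] = (time.time(), data)
        return data
```

A stale answer is returned in preference to no answer at all; whether that trade-off is acceptable would depend on how quickly the underlying vocabulary changes.
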
Development of the workflow tool (Alicat) and the LOD version of UKAT is well under way (Workpackages 2-3). This work is informing the redesign currently under way at Axiell. A meeting is planned on 29 March to review CALM development to date, before Robert begins analysing Cumbria test catalogues using the new tools in a CALM development environment.

Friday 2 March 2012

Progress report

The Step Change project is well under way and there is lots to report.

Rory at ULCC has been working hard on creating a SKOS version of AIM25-UKAT, rolled out as a service. The redesign of the workflow tool, by which archivists can interrogate and analyse finding aids using semantic tools such as Open Calais, is well under way. Careful note was made of the findings on the design and its usability from the professional survey panels convened to look at the provisional tool as part of the OMP project in 2011, and also from professionals present at a meeting convened by Jane Stevenson at JISC on Linked Data and archives on 7 February, at which the first design of the tool was showcased to a wider audience by Rory. Feedback from that meeting included the need for better faceting of results, faster processing speeds, and more relevant choices available to archivists to validate the processed entities ('This is Winston Churchill, not the Churchill tank').

The current redesign is aimed at producing a cleaner, streamlined tool for processing not only ISAD(G) records, but also more detailed and granular catalogue entries, down to single lines of image metadata, a refinement required as part of the related JISC-funded Trenches to Triples project that allows for semantic processing of digital asset management system metadata. A design meeting is scheduled with CALM for March to refine the adaptations to the CALM user interface necessary to incorporate the workflow tool. A working version of the tool and redesigned CALM system will be road tested at Cumbria and with London members of the CALM User Group once the initial design phase is completed. Chris Hilton of the CALM User Group is helping with this evaluation.

A data exchange schema has been drawn up by ULCC and CALM and a preliminary design document circulated to steering panel members. CALM backend and front end redesign work has begun.

Considerable progress has been made with Historypin to enable placenames held in AIM25-UKAT, and their corresponding collection descriptions, to be displayed in a modified tab in Historypin corresponding to a broad neighbourhood such as a parish or similarly sized administrative unit. This should provide additional contextual AIM25 catalogue information to users of Historypin, and vice versa, once the service goes live ('Interested in these historical photographs? To learn more about parallel collections that may be of use, click this tab for archive/record office descriptions'). A similar read-across will be possible for the Cumbria instance of CALM, to demonstrate the value for both archives and Historypin of sharing data. Feedback from record office users (often a different audience from university archives) will determine the utility of this approach in the local setting.

This phase of the work posed a variety of challenges familiar to projects using geo-data. Latitude and longitude information needed to be generated from the placenames in order to utilise the Google Maps API used by Historypin. Place name information in AIM25-UKAT was often too broad, or too specific, to be meaningful when translated into Historypin. For example, a collection indexed with the term 'London' but actually concerning papers about Wandsworth would resolve to a point near Charing Cross in Google Maps - misleading for Historypin users expecting to find related information in the Wandsworth part of the map. This example highlights the discrepancy between the granularity needed for adequate geo-location and that of indexing intended for collection level only, which is necessarily and intentionally broader.
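
To give a sense of the scale of the mismatch in the 'London'/Wandsworth example, the distance between the two points can be estimated with the haversine formula. This is an illustrative sketch only: the coordinates below are approximate values supplied for the example, not figures taken from AIM25-UKAT or Historypin.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate coordinates, assumed here for illustration only.
CHARING_CROSS = (51.5074, -0.1278)  # where a bare 'London' term tends to resolve
WANDSWORTH = (51.4571, -0.1818)     # the area the papers actually concern

offset_km = haversine_km(*CHARING_CROSS, *WANDSWORTH)
print(f"Pin placed roughly {offset_km:.1f} km from the area the papers concern")
```

A gap of several kilometres is large relative to a parish-sized neighbourhood, which is why a collection-level index term alone is a poor basis for placing a pin.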

Historypin require very accurate scope and content information about a place and another problem to emerge was 'mixed' collection level scope and content descriptions containing references to papers about widely dispersed geographical locations. This is often the case with records reflecting lengthy and varied careers, including those of military officers posted around the globe or scientists on botanical or other expeditions. In these cases, each scope and content paragraph might read across and be pinned to sometimes wildly divergent parts of the globe. A user excited by the 'other useful information' tab for images pinned on Historypin to Oxford Street, say, might start reading a paragraph of catalogue information beginning with a description of an expedition to Borneo, and only later going on to describe Oxford Street. One possible solution for this problem is to allow archivists to highlight, select and save components of scope and content paragraphs and corresponding placenames in the index, so only the appropriate information - and only this information - is displayed in the 'other useful information' tab. This brings its own data complications, however, particularly of storage, retrieval and update of catalogue information. The project team is currently exploring work-arounds and solutions to these data accuracy problems.
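
The 'highlight, select and save' idea can be pictured as storing each scope and content description as a list of segments, each tagged with the placenames it concerns. The sketch below is hypothetical: the class names, field names and sample reference code are invented for illustration and do not reflect the CALM or AIM25 data models.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScopeSegment:
    """A passage of scope and content text plus the placenames an archivist linked to it."""
    text: str
    placenames: List[str] = field(default_factory=list)


@dataclass
class CollectionDescription:
    reference: str
    segments: List[ScopeSegment]

    def segments_for(self, placename: str) -> List[str]:
        """Return only the passages explicitly tagged with the given placename."""
        wanted = placename.casefold()
        return [
            segment.text
            for segment in self.segments
            if any(name.casefold() == wanted for name in segment.placenames)
        ]


# Invented sample record for illustration.
papers = CollectionDescription(
    reference="EXAMPLE/1",
    segments=[
        ScopeSegment("Journals of an expedition to Borneo.", ["Borneo"]),
        ScopeSegment("Correspondence concerning premises in Oxford Street.", ["Oxford Street"]),
    ],
)

print(papers.segments_for("Oxford Street"))
```

Storing segments separately from the full paragraph is what introduces the storage and update complications mentioned above: if the catalogue text is later revised, the saved segments have to be kept in step with it.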

Overlap with other projects

In January, JISC awarded a substantial grant to the DEEP project, based at the Department of Digital Humanities at King's College London but involving input from several universities. This project - Digital Exposure of English Place Names (http://www.jisc.ac.uk/media/documents/programmes/digitisation/econtent/econtent11_13/englishplacenamesprojectplan.pdf) - will publish a Linked Data version of the English Place-Name Society's corpus on a county by county basis, and generate a rich, historical hierarchy of names to complement services such as Vision of Britain. Step change is exploring the possibility of using relevant Cumbria/London place name data to enhance the accuracy of placename indexing via the workflow tool. An archivist would be able to interrogate the new database and select a more historically accurate and appropriate term, with its accompanying URI, for the catalogue entry they are working on, such as the title deeds of an individual property. The archivist might also have a range of alternatives to draw on - a detailed and locally-specific placename list maintained in CALM, a Geonames alternative and the AIM25-UKAT placenames index, for example. Potential pitfalls here include the danger of inappropriately mixed data points (the same place might be described in very different ways across the datasets, or the same name in two or more sets might actually refer to different places), the use of local variants and nicknames, not to mention licensing concerns for component placename lists. The user would also not necessarily notice any improvement in the front end catalogue site unless the places and their URIs are actually connected to real services delivering some added functionality.
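
One of the pitfalls above, the same name in two datasets pointing at different places, lends itself to a simple automated check before an archivist chooses a term. The sketch below is an assumption-laden illustration: the record structure, dataset labels, threshold and coordinates are all invented, and it does not address the opposite problem of one place appearing under different names.

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple


class PlaceRecord(NamedTuple):
    source: str  # hypothetical dataset label, e.g. "geonames" or "local-calm"
    name: str
    lat: float
    lon: float


def distance_km(a: PlaceRecord, b: PlaceRecord) -> float:
    """Equirectangular approximation; adequate for a rough same-place check."""
    mean_lat = math.radians((a.lat + b.lat) / 2)
    dx = math.radians(b.lon - a.lon) * math.cos(mean_lat)
    dy = math.radians(b.lat - a.lat)
    return 6371.0 * math.hypot(dx, dy)


def flag_conflicts(records: List[PlaceRecord], threshold_km: float = 5.0) -> Dict[str, List[PlaceRecord]]:
    """Return same-named records whose coordinates disagree by more than the threshold."""
    by_name: Dict[str, List[PlaceRecord]] = defaultdict(list)
    for record in records:
        by_name[record.name.strip().casefold()].append(record)
    conflicts: Dict[str, List[PlaceRecord]] = {}
    for name, group in by_name.items():
        pairs = ((a, b) for i, a in enumerate(group) for b in group[i + 1:])
        if any(distance_km(a, b) > threshold_km for a, b in pairs):
            conflicts[name] = group
    return conflicts


# Invented sample data; coordinates are illustrative, not taken from any dataset.
sample = [
    PlaceRecord("geonames", "Newton", 51.50, -0.10),
    PlaceRecord("local-calm", "Newton", 54.60, -3.10),
    PlaceRecord("geonames", "Kendal", 54.328, -2.746),
    PlaceRecord("local-calm", "kendal", 54.330, -2.750),
]

conflicts = flag_conflicts(sample)
print(sorted(conflicts))
```

Names flagged in this way would still need a human decision; the check only narrows down where an archivist's attention is most needed.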

Step change has met with the JISC-funded M25 Search25 service, which is looking to create Linked Data bibliographic tools useful to London's research libraries. We explored possible avenues of collaboration involving mixing bibliographic information with archive data in London ('View these descriptions about Winston Churchill...view these book titles'). Discussion centred around using RDF versions of LCSH and MARC records, as demonstrated by the recent BL Talis project. Ideas include mixing contact/repository information such as ARCHON with library equivalents, and subject-specific read-across for sub-sets of books and archives. Discussions have taken place with TNA on data exchange and experimentation using TNA datasets.

Step change overlaps with Trenches to Triples (http://www.jisc.ac.uk/whatwedo/programmes/di_informationandlibraries/resourcediscovery/trenchestotriples.aspx), a new JISC-funded project being managed by several members of the Step change team. T3, which runs until the end of July 2012, will include the adaptation of the Step change workflow tool to enable the analysis of detailed catalogue entries and the publication of the semantic output as RDF, the creation of an API for the catalogues of the Liddell Hart Centre for Military Archives, and a link between First World War-related metadata and images from the JISC-funded Serving Soldier project and catalogue entries. This link will provide a granular read-across between different hierarchical representations of the same collections: collection level, detailed file level, and item/piece level from the image metadata. The project will also involve the creation of an enriched corpus of World War One terminology for insertion in UKAT, available across JISC's suite of Great War projects via the Step change AIM25-UKAT API.

Broader discussions are under way between archive, library and museum professionals on a URI definition directory to enable a cross-sectoral Linked Data data model and minimise duplication of effort. A wiki will be created to communicate any outcomes to developers and members of the wider LODLAM community internationally.