Linked Data and Libraries: Creating a Linked Data version of the BNB

Neil Wilson from the British Library (BL) is giving this talk.

Government has been pushing to open up data for a while. This has started to change some expectations around publishing of ‘publicly owned’ data.

BL wanted to start to develop an Open Metadata Strategy. They wanted to:

Break away from library-specific formats and use more cross-domain, XML-based standards – while continuing to serve libraries not engaged in cutting-edge work
Develop the new formats with communities using the metadata
Get some form of attribution while also adopting a licensing model appropriate to the widest re-use of the metadata
Adopt a multi-track approach to all this

So first steps were:
Develop the capability to supply metadata using RDF/XML
Work with a variety of communities and organisations etc…

Current status:
Created a new enquiry point for BL metadata issues
Signed up c. 400 organisations to the free MARC21 Z39.50 service
Worked with JISC, Talis and other linked data implementers on technical challenges, standards and licensing issues
Begun to offer sets of RDF/XML to various projects etc.

Some of the differences between traditional library metadata and linked data:
Traditional library metadata uses a self-contained, proprietary, document-based model
Linked data uses a more dynamic, data-based model to establish relationships between data
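
To make the contrast concrete, here is a minimal sketch (mine, not from the talk) of the same description held first as a self-contained record and then as separate statements. The record contents, the `BASE` and `VOCAB` URIs and the `record_to_triples` helper are all hypothetical, chosen only for illustration.

```python
# A self-contained "document" record: everything lives inside one record
# and the values are plain strings (hypothetical example data).
record = {
    "id": "rec1",
    "title": "An Example Title",
    "author": "Example, Author",
    "subject": "Cookery",
}

# Hypothetical base URIs, used only for illustration.
BASE = "http://example.org/resource/"
VOCAB = "http://example.org/vocab/"


def record_to_triples(rec):
    """Turn one flat record into (subject, predicate, object) statements."""
    subject = BASE + rec["id"]
    triples = []
    for field, value in rec.items():
        if field == "id":
            continue  # the id becomes the subject URI, not a statement
        triples.append((subject, VOCAB + field, value))
    return triples


triples = record_to_triples(record)
for triple in triples:
    print(triple)
```

Once each statement stands alone, any object that is still a literal string (such as the author name) can later be swapped for a URI pointing at someone else's data, which is where the relationships come from.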

By migrating from traditional models libraries could begin to:

Integrate their resources in the web
Increase visibility, reach new users
Offer users a richer resource discovery experience
Move from niche, costly specialist technologies and suppliers to more ‘standard’ and widely adopted approaches

BL wanted to offer data allowing useful experimentation and advancing discussion from theory to practice. BNB (British National Bibliography) has lots of advantages – general database of published output – not just ‘BL stuff’; reasonably consistent; good identifiers.

Wanted to undertake the work as an extension of existing activities – wanted to develop local expertise, using standard hardware for conversion. Starting point was library MARC21 data. Wanted to focus on data issues, not building infrastructure, and also on linking to stuff.

First challenge – how to migrate the metadata:
Staff training in linked data – modelling concepts and increased familiarisation with RDF and XML concepts
Experience working with JISC Open Bibliography Project and others
Feedback on MARC to XML conversion

Incremental approach adopted – with several iterations around data and data model.

Wanted to place library data in a wider context and supplement or replace literal values in records. Linked to both library sites:
Dewey Info
LCSH SKOS
VIAF

but also non library sites:
GeoNames
…

Three main approaches:
Automatic Generation of URIs from elements in records (e.g. DDC)
Matching of text in records with linked data dumps – e.g. personal names to VIAF
Two stage crosswalking [? missed this]
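
A rough sketch of the first two approaches, under stated assumptions: the `DDC_BASE` URI pattern, the dump rows and all of the function names are hypothetical, and this is not the BL's actual matching logic. The idea is simply that a classification number can be turned into a URI mechanically, while a name has to be normalised and looked up in a dump, accepting only unambiguous matches.

```python
import re
import unicodedata

# Assumed URI pattern, for illustration only; the real scheme of any
# target service should be taken from its own documentation.
DDC_BASE = "http://example.org/ddc/class/"


def ddc_uri(ddc_number):
    """Build a URI directly from a classification number found in a record."""
    cleaned = ddc_number.strip().replace("/", "")  # drop any '/' segmentation marks
    if not re.fullmatch(r"\d{3}(\.\d+)?", cleaned):
        return None  # keep the literal if the value looks malformed
    return DDC_BASE + cleaned


def normalise(name):
    """Reduce a name heading to a comparison key (case, accents, punctuation)."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters_only = re.sub(r"[^a-z0-9 ]", "", no_accents.lower())
    return re.sub(r" +", " ", letters_only).strip()


def build_index(dump_rows):
    """Index (name, uri) rows from a linked data dump by normalised name."""
    index = {}
    for name, uri in dump_rows:
        index.setdefault(normalise(name), []).append(uri)
    return index


def match_name(name, index):
    """Return a URI only when exactly one candidate matches, else None."""
    candidates = index.get(normalise(name), [])
    return candidates[0] if len(candidates) == 1 else None


# Hypothetical dump rows; a real dump would be far larger.
dump = [
    ("Brontë, Charlotte", "http://example.org/authority/1"),
    ("Smith, John", "http://example.org/authority/2"),
    ("Smith, John.", "http://example.org/authority/3"),
]
index = build_index(dump)
```

Returning `None` for ambiguous names (the two "Smith, John" rows here) leaves the literal value in the record rather than risking a wrong link; a real workflow would presumably use dates or other fields to disambiguate.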

Lots of preprocessing of MARC records before tackling the transform to RDF/XML using XSLT
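
The BL transform was done in XSLT; purely to show the shape of the output, here is a small Python sketch that serialises one already-preprocessed record as RDF/XML using the standard library. The `to_rdfxml` function, the example URIs and the choice of Dublin Core properties are my assumptions, not the BL data model (see the PDF linked below for that).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialised XML uses rdf:, dct: and dc:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dct", DCT_NS)
ET.register_namespace("dc", DC_NS)


def to_rdfxml(resource_uri, title, creator_uri=None, creator_name=None):
    """Serialise one description as RDF/XML (hypothetical property choices)."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    ET.SubElement(desc, f"{{{DCT_NS}}}title").text = title
    if creator_uri:
        # A matched name becomes a link to another resource.
        ET.SubElement(
            desc, f"{{{DCT_NS}}}creator", {f"{{{RDF_NS}}}resource": creator_uri}
        )
    elif creator_name:
        # An unmatched name stays as a literal string.
        ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator_name
    return ET.tostring(root, encoding="unicode")


xml_out = to_rdfxml(
    "http://example.org/resource/rec1",
    "An Example Title",
    creator_uri="http://example.org/authority/1",
)
print(xml_out)
```

The branch on `creator_uri` mirrors the "supplement or replace literal values" idea: where a link was found it is emitted as `rdf:resource`, otherwise the original string is kept.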

Can see the data model at http://www.bl.uk/bibliographic/pdfs/british_library_data_model_v1-00.pdf and more information – http://www.bl.uk/bibliographic/datafree.html

Next steps:
Staged release over coming months for books, serials, multi-parts
Monthly updates [I think?]
New data sets being thought about

Lessons learnt…
It is a new way of thinking – legacy data wasn’t designed for this purpose
There are many opinions out there, but few real certainties – your opinion may well be as valid as anyone else’s – especially when it’s your data
Don’t reinvent the wheel – there are tools and experience you can use – start simple and develop in line with evolving staff expertise
Reality check by offering samples for feedback to wider groups
Be prepared for some technical criticism in addition to positive feedback and improve in response
Conversion inevitably identifies hidden data issues – and creates new ones
But better to release an imperfect something than a perfect nothing

There is a steep learning curve – but look for training opportunities for staff and develop skills; cultivate a culture of enquiry and innovation among staff to widen perspectives on new possibilities

It’s never going to be perfect first time – we expect to make mistakes – have to make sure we learn from them and ensure that everyone benefits from the experience. So if anyone is thinking of undertaking a similar journey – Just do it!

Q: How much of the pipeline will you ‘open source’?
A: Quite a few of the tools are ‘off the shelf’ (not clear if open source or not?). The BL-written utilities could be released in theory – but would need work (not compiled with open source compilers at the moment) – so this will be looked at…

Overdue Ideas

Ideas linking Libraries, Computing, E-learning, and anything else that springs to mind.
