This session was led by Chris Gutteridge from the School of Electronics and Computer Science (ECS) at the University of Southampton.
ECS have just published all their public data as open Linked Data, so Chris was able to share his knowledge of how to publish Linked Data, why to do it, and how not to do it. Chris gave a relatively gentle introduction to Linked Data and RDF. RDF is a pretty simple way of expressing information. At its heart is the idea of a ‘triple’ made up of a ‘subject’, a ‘predicate’ and an ‘object’. A typical example, for a book, is:
‘The Hobbit’ (subject) ‘has a creator’ (predicate) ‘J.R.R. Tolkien’ (object)
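To make the idea concrete, here is a minimal sketch (not taken from Chris's talk) of that triple in Python, written out as a single line of N-Triples, a simple line-based RDF syntax. It assumes the predicate is Dublin Core's `http://purl.org/dc/terms/creator`; the `example.org` URIs for the book and for Tolkien are hypothetical placeholders.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # 'object' in RDF terms


DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"

hobbit = Triple(
    subject="http://example.org/id/book/the-hobbit",    # hypothetical URI
    predicate=DCTERMS_CREATOR,                          # "has a creator"
    obj="http://example.org/id/person/jrr-tolkien",     # hypothetical URI
)


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(hobbit))
```

Using a URI for Tolkien rather than the string "J.R.R. Tolkien" is what lets other datasets say more about the same person.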
Chris described this as ‘really really simple’. When I tweeted this I got a lot of sceptical responses, but I think Chris’s point was that, as a concept, the idea behind RDF is not complex. However, several responses on Twitter suggested that it may be simple in theory, but when you start using it to describe real things it can quickly get complex.
Chris went on to describe the concept of ‘cool URIs’ (Tim Berners-Lee said “A cool URI is one which does not change”), how RDF uses URIs to identify things, and how Linked Data principles say that when you follow a URI it should ‘resolve’ to some useful information. Chris mentioned a few of the issues, especially the difference between a resource (the thing itself, such as a person or a building) and a document about that resource. Chris also mentioned another problem: ‘blank nodes’. RDF can be visualised as a ‘graph’ – a diagram of connected dots, with each dot representing a subject or object, and the lines between the dots representing the predicate relationships. Sometimes when you design RDF, you can end up with ‘blank nodes’ (a.k.a. anonymous nodes or bnodes), which are nodes in an RDF graph that are not identified by a URI and are not literals.
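A minimal sketch of the graph view, assuming a hypothetical building whose address is modelled as a blank node (the `example.org` URIs and vocabulary are placeholders, not Southampton's actual data). URIs are written `<...>`, literals `"..."` and blank nodes `_:label`. Because a blank-node label only means something inside its own graph, merging two copies of the same data produces two separate address nodes, whereas the building, which has a URI, is recognised as one thing.

```python
import itertools

EX = "http://example.org/ns/"                      # hypothetical vocabulary
BUILDING = "<http://example.org/id/building/32>"   # hypothetical resource URI

# Each triple is (subject, predicate, object). Nodes are URIs "<...>",
# literals '"..."', or blank nodes "_:label".
graph = [
    (BUILDING, f"<{EX}address>", "_:a1"),
    ("_:a1", f"<{EX}street>", '"University Road"'),
    ("_:a1", f"<{EX}city>", '"Southampton"'),
]


def nodes(g):
    """The 'dots' of the graph: every subject and every object."""
    return {s for s, _, _ in g} | {o for _, _, o in g}


def blank_nodes(g):
    """Nodes that are neither URIs nor literals."""
    return {n for n in nodes(g) if n.startswith("_:")}


def merge(*graphs):
    """Merge graphs, giving each graph's blank nodes fresh labels, since
    bnode labels are local to the graph they appear in."""
    counter = itertools.count()
    merged = []
    for g in graphs:
        fresh = {}

        def relabel(n):
            if not n.startswith("_:"):
                return n  # URIs and literals are global: keep them as-is
            if n not in fresh:
                fresh[n] = f"_:b{next(counter)}"
            return fresh[n]

        merged.extend((relabel(s), p, relabel(o)) for s, p, o in g)
    return merged


twice = merge(graph, graph)
print(len(nodes(graph)), len(nodes(twice)))  # 4 5
print(sorted(blank_nodes(twice)))            # ['_:b0', '_:b1']
```

This illustrates one reason for avoiding blank nodes: nothing outside the graph can link to `_:a1`, and a consumer cannot tell whether two bnodes describe the same thing.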
Having described some of the basics of Linked Data, Chris got us to brainstorm information in our institutions that might be ready to be published openly, and shared his own brainstorm of this in the form of a mindmap. Chris described some of the thinking that had happened at Southampton, and also some of the mistakes they had made. He mentioned several useful resources (see link below) and also had some ‘take-aways’:
- get your URIs right
- don’t use anonymous (blank) nodes unless you really have to
- start with easy stuff – take an incremental approach with an eye on the future. (Chris says N.B. RDFa is not easy stuff! Get to grips with RDF first, as this is the basis for RDFa anyway)
- aim to publish as RDF but don’t underestimate simpler data formats like CSV – these can enable you to get data published quickly in the interim
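As a rough sketch of how CSV can act as a stepping stone to RDF, the following converts each CSV row into N-Triples, minting a subject URI from a key column and one predicate per remaining column. The base URI, the vocabulary namespace, the column names and the assumption that column headers are URI-safe are all hypothetical; a real conversion would map columns onto established vocabularies.

```python
import csv
import io

BASE = "http://example.org/id/room/"   # hypothetical URI pattern for rooms
NS = "http://example.org/ns/"          # hypothetical vocabulary namespace


def escape_literal(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text: str, key: str = "id") -> list[str]:
    """One subject per row (from the key column); one triple per other
    non-empty cell. Assumes column headers are safe to use in URIs."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}{row[key]}>"
        for column, value in row.items():
            if column == key or not value:
                continue
            lines.append(f'{subject} <{NS}{column}> "{escape_literal(value)}" .')
    return lines


rooms_csv = '''id,name,building
1001,Seminar Room,32
1002,"Lab ""A""",59
'''

for line in csv_to_ntriples(rooms_csv):
    print(line)
```

Keeping the CSV as the source of truth means the quick interim publication and the later RDF version can stay in step.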
There was quite a bit of content in this session. While it covered some of the basics of RDF and Linked Data, and was definitely aimed at those not that experienced with them, it was by its nature quite a technical session with quite a few concepts to get your head round. However, I found it useful, and it offered some worthwhile hints and tips for those looking at publishing institutional Linked Data, especially some of the things that Southampton have learnt the hard way, so we don’t have to.
Chris has helpfully published a page with all the relevant links from his talk (though not the mindmap of institutional data that might be published), and has also blogged about the Linked Data session.