Putting Warwickshire Libraries on the map

I was very pleased to see earlier this week that Warwickshire County Council had started to release data sets openly on the web. The data is released under a Creative Commons Attribution-ShareAlike license, and I guess it builds on the central government data.gov.uk initiative. While at the moment there are only a few datasets, the blog promises more in the very near future. You can see the data that has been released so far at http://opendata.warwickshire.gov.uk/categories

Today I saw an announcement from their @wccopendata twitter feed that they had put an xml feed of Warwickshire libraries on the site. I took a quick look and, seeing that it included location coordinates, thought I'd do a really quick map mashup. This is entirely based on something I'd seen Tony Hirst do a while back, which he blogged at http://arcadiamashups.blogspot.com/2009/11/open-library-training-materials-and.html.

The mashup uses Yahoo Pipes, and makes use of the fact that if you include location information in the right place in the Yahoo Pipes output it will automatically show a map of the results.

The first thing you need to do is get the xml data using the Pipes ‘Fetch Data’ module – this just needs the URL of the xml file:

I've also had to fill in the 'Path to item list' – in this case, if you look at the xml file, you can see the structure is something like:

<libraries>
<library>
<… name, address, stuff …>
</library>
<library>
<… name of second library, address, stuff …>
</library>
</libraries>

Since the details of each library are within the ‘library’ tag, and I want each library to appear as an individual item in the list, this is what goes in the ‘Path to item list’.

An important aspect of using Pipes is that to get the output to display as you want, you have to put the relevant information in specific fields (or give the relevant fields the right names). In this case, I want the library name to appear as the main heading in the output – which means it has to be in the 'title' field. In the original XML file, the name of the library is in a <name> tag, so this needs to be renamed – and Pipes provides a module that does this:

There are two bits of ‘location’ information in the original feed – the address (including postcode) and some ‘coordinates’. I guess these are OS coordinates, but I haven’t really checked – luckily Pipes is cleverer than me, and has a way of automatically understanding some types of ‘location’ information. In this case I can just push the coordinates through a ‘location builder’:

The place you ‘assign results’ to is important – it is putting the location information in this field that makes the output automatically appear as a map. This was a trick copied directly from Tony’s pipe.

Finally I noticed there was a link to an image for each library in the xml, and thought it would be nice to include this in the output. I knew that this would need to go as an HTML Image tag in a ‘description’ field, so I used a loop and ‘string builder’ function to do this:

The first and last lines put the image tag markup in, and item.image pulls the link from the XML file.

That’s it – the whole pipe:

If you look at the results of the pipe you can see that because all the data is in the right fields, the results automatically appear on a map:
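
As an aside, the same pipeline is easy to sketch in plain code – here in Ruby with REXML – though you lose the automatic map display that Pipes gives you. This is purely illustrative: the feed URL is a placeholder, and only the <library> and <name> element names are confirmed above, so the <coordinates> and <image> names are my assumptions:

require 'net/http'
require 'uri'
require 'rexml/document'

feed_url = "http://opendata.warwickshire.gov.uk/..."   # placeholder for the libraries xml feed
doc = REXML::Document.new(Net::HTTP.get(URI.parse(feed_url)))

REXML::XPath.each(doc, "//library") do |lib|
  title       = lib.elements["name"].text                        # becomes the item 'title'
  coordinates = lib.elements["coordinates"].text                 # assumed element name
  image_tag   = "<img src=\"#{lib.elements["image"].text}\" />"  # assumed element name
  puts [title, coordinates, image_tag].join(" | ")
end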

I’m hoping that I might get a chance to play a bit more with this data – perhaps at the upcoming Mashed Library event ‘Liver and Mash’. If anyone is interested in a ‘take away’ exercise, you could try to do the same thing with the Warwickshire Museums and Art Galleries data 🙂

Something kind of OO

My first experience of programming was on a BBC B (for those old enough to remember), using BBC Basic. I didn’t do computing/computer science at school, but I was interested enough to move from typing in program listings from books and magazines to writing some basic programs myself – the one I remember particularly would tell you which day of the week any given date fell on.

Some years later my first 'real world' experience of any kind of programming was writing macros in WordPerfect – macros are a way of automating a set of commands or key strokes (often still very useful – I'd really recommend looking at tools like AppleScript, AutoHotkey and MacroExpress, as they can really help simplify tasks and sometimes deliver significant time savings). Most macro languages also support some kind of 'logic', allowing you to carry out parts of the macro only when certain conditions are true.

After another gap, my next step on the ladder was using Perl. I initially picked this up because I was working with applications that were written in Perl, and as I started to use it, it felt very familiar – taking me back to my experience on the BBC. I also found the active community around Perl meant that when I hit problems, there was almost certainly help on hand. Dealing with XML especially I found there was a good tool set already available for me to pick up and start using.

By this point all the programming I'd done was procedural. In procedural programming you write a set of 'procedures' or 'routines' which are (to a large extent) self-contained sets of instructions. As you go through a program you can 'call' a procedure whenever it is needed – once all the code in the procedure has run, the program picks up again from the point at which you called the procedure.
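
Just to illustrate the idea (in Ruby syntax, since that's where this story ends up – any procedural language looks much the same), here's a tiny 'procedure' in the spirit of my old day-of-the-week program:

require 'date'

# a 'procedure' (in Ruby, a method) - a self-contained set of instructions
def day_of_week(date_string)
  puts Date.parse(date_string).strftime("%A")
end

day_of_week("2010-04-01")   # control returns here once the procedure has run
day_of_week("1983-06-15")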

It was probably around this time that I thought I ought to really apply myself to learning a programming language properly, and so I picked up a few books on C and C++, and started to read about an alternative to procedural programming – 'Object Oriented Programming' (OOP). Although OOP had been around for a while, it was probably in the mid 90s that it started to become widely used.

To be honest, I struggled. I couldn’t get my head around the OOP concept, and at the same time C and C++ were much more difficult to get to grips with than the languages I’d previously used. I didn’t get anywhere, eventually gave up, and stuck with Perl.

Although Perl is often used for procedural programming, it can also be used with an object-oriented (OO) approach, and where I was working with code that others had written I did sometimes use an OO approach – but without properly understanding what I was doing, and relying on copying examples and practice from others.

As I started to do jobs that were less ‘hands on’, I hardly got time to do any programming, and it wasn’t until last year that I decided I’d find myself some ‘hobby’ projects I could do for fun. Having done a few of these (e.g. Read to Learn and What to Watch) in Perl, I thought it might be time to try something new again. Rather than heading back to C++ or Java, I decided I’d try to take (what I hoped would be) a smallish step – and was left choosing between two languages – Ruby and Python. Both had a reputation for being relatively easy to pick up, and also for enabling you to get stuff done quickly (I really liked this idea).

Having looked at both, and kicked their tyres, I eventually opted for Ruby. I didn’t have very strong feelings about which way to go, but my initial look suggested to me that I’d find Ruby easier – it looked a bit like Perl to me, whereas Python reminded me more of C (not that I’m shallow and go just by looks), a few people recommended it to me, and there was an active community – including people using it for library type stuff (Blacklight is written in Ruby). I hope that at some point I might have a closer look at Python – one thing that did appeal was the fact that the Google App Engine supports Python, making it possible to launch a Python based app without needing to host it on a server somewhere.

The other thing about Ruby is it is often described as ‘completely Object Oriented’ – I was never entirely clear what was meant by this, but as one of my aims was to get to grips with the concept of OOP, it seemed like this was a good place to start.

Having decided to go with Ruby I found a couple of online tutorials (http://tryruby.org/ lets you actually do some Ruby live online straight away, while Ruby in 20 minutes talks you through the basics) and worked my way through them to get the hang of the basics. I also invested in O’Reilly’s “The Ruby Programming Language” on my iPhone – at £2.99 (compared to an RRP for the print edition of £30.99, and currently on Amazon at £18.99!) I think this is really good value, and although I am limited to using it on the iPhone, in this case I’m generally using it like a reference work, and it’s quite nice to use alongside my laptop.

I’ve always found that the only way I really engage with a programming language is to try to use it in reality – tutorials are fine for basic familiarity, but I’m much happier when I’m trying to solve my own problems – and also doing a representative project means I focus on the parts of the language that are really useful to me. So having recently written What to Watch in Perl, I thought a nice easy exercise for me would be to rewrite it in Ruby – it’s only a couple of hundred lines of code but does several tasks I’m likely to do in other places such as retrieve data from web services in XML format and output RSS.

One of the first things I realised was that although Ruby is an OO language, for a simple script such as the one I was writing it would be perfectly possible to take a very procedural approach. The question of whether you use an OO or procedural approach is really about how you think about what you are doing, and how you model your data.

Up until this point I’d been familiar with two types of ‘data structure’ – ways of storing data within a program. These were Arrays and Hashes. Arrays are simple lists of things, whereas Hashes are lists of pairs – each pair consisting of a key and a value. The idea of a hash is that you can lookup a value for any given key.

Just for illustration, if you wanted to store a list of ISBNs in a program, you could do this as an Array, which would look pretty much as you'd expect – e.g. (9780671746728, 9780671742515, 9780517226957).

On the other hand, if you wanted to describe a book you might do this using a hash – looking something like:

{ author => "Adams, Douglas", title => "Dirk Gently's Holistic Detective Agency", ISBN => 9780671746728 }

You can create more complex structures by mixing and matching these – for example you could have an array of hashes to represent a list of books with detailed metadata, and within this you might even have some of the hash values as arrays – e.g. to represent a list of authors. You can imagine that this can quickly get confusing!
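
For illustration, here's how those structures look in Ruby (using the same made-up book details as above):

isbns = [9780671746728, 9780671742515, 9780517226957]    # an Array - a simple list

book = {                                                 # a Hash - key/value pairs
  "author" => "Adams, Douglas",
  "title"  => "Dirk Gently's Holistic Detective Agency",
  "ISBN"   => 9780671746728
}

books = [                                                # an Array of Hashes...
  book,
  { "authors" => ["Adams, Douglas"],                     # ...with an Array as one value
    "title"   => "Long Dark Teatime of the Soul",
    "ISBN"    => 9780330309554 }
]

puts books[1]["authors"].first                           # => Adams, Douglas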

The thing about this approach is that it is easy to evolve these structures as you go along – there is nothing to stop you adding a new ISBN to the list in the Array, or adding a new key/value pair to the hash – if you wanted to record the publisher for example. This is also a problem, as it means you can easily lose track of what you are storing where, or do nonsensical things (e.g. add a ‘director’ key to the hash which is meant to describe books rather than films).

Ruby (like other OO languages) doesn't abandon the concepts of arrays and hashes – it supports both. However, at the heart of an object-oriented approach is the idea of an 'object'. The big realisation for me was that an Object provides a new kind of data structure tied together with various ways of manipulating that data (there is a short paragraph on Wikipedia comparing procedural programming with OOP).

Where I would have previously (for example) used a hash to store the details of a book, I can now define a type of object called a 'book', and in that definition I can set up a set of properties that a book has – such as Author, Title, ISBN. This formalises something that would have been much more informal had I just used a hash to store this information, as described above.

As well as having a data structure, Objects also have 'methods' – things that they can do. In practical terms a method is a (generally) self-contained piece of code that does something – not totally unlike a procedure as I described earlier. However, 'methods' are linked specifically to objects – so you can restrict the types of thing you can do to an object by only defining the relevant methods.

The terminology around this can get a bit confusing – a quick summary:

  • Class – this is a ‘type of object’ – a general definition which says what properties and methods are linked to an object – so you might define a class of ‘book’
  • Object – a specific instance of a class – that is, if you had a ‘book’ class, any particular book would be described by an object
  • Method – an action tied to a class of object
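
A minimal Ruby sketch pulling these three together (the 'Book' class and its properties are just my illustration):

class Book                                  # a Class - a type of object
  attr_accessor :author, :title, :isbn     # properties every book object will have

  def initialize(author, title, isbn)
    @author, @title, @isbn = author, title, isbn
  end

  def describe                              # a Method - an action tied to the Book class
    "#{title} by #{author} (ISBN #{isbn})"
  end
end

# an Object - one specific instance of the Book class
dirk = Book.new("Adams, Douglas", "Dirk Gently's Holistic Detective Agency", 9780671746728)
puts dirk.describe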

Thinking of sensible examples is always difficult (for me) and I'm not sure the following stands up to closer scrutiny, but I hope it demonstrates the ideas ok. Let's say you have a library with books you can loan, and reference books that you can't loan. In a procedural language you might achieve this by having a hash that stored the details of a book, perhaps including a 'reference' value – so you could store loanable and reference books like this:

{ author => "Adams, Douglas", title => "Dirk Gently's Holistic Detective Agency", ISBN => 9780671746728, Reference => "no" }

{ author => "Adams, Douglas", title => "Long Dark Teatime of the Soul", ISBN => 9780330309554, Reference => "yes" }

You could then write a procedure that loaned the book by linking the hash describing the book to a description of a library patron. You'd then have to add in some kind of test to check the value of the 'Reference' key in any book hash before you ran the 'loan' procedure. If you forgot to run this check at any point then, since to all other intents and purposes a loanable book and a reference book are the same, running the 'loan' procedure on a reference book would simply result in the reference book being loaned – there would be nothing else to stop this happening.

If we look at an Object Oriented approach to this, instead of having a hash to store the information about each book, we would have 'objects' to do this. We could have one type of object (class) for loanable books, and another for reference books. We wouldn't need the extra 'Reference' value as in the hash above, because you could easily tell which was a reference book – it would belong to a different Class of object. In addition, because any 'methods' you can use are linked to the type of object (Class), you would simply define the 'loan' method (which would do a very similar thing to the 'loan' procedure above) linked to the 'loanable book' class only. You would then literally be unable to loan a 'reference book' type object – it would simply result in an error.

So taking an object oriented approach can really help in keeping control of your code, and help in making bugs more obvious and easier to track down (or avoid completely). There is an overview of OO thinking in this Ruby user's guide which I think is useful, and I also found this OOP tutorial really helpful (although the examples are in C++ and Java, rather than Ruby).

As I started to grapple with these issues I quickly realised that using objects formalises what you are doing much more, and makes you think a lot harder about what you are trying to do and how you are going to do it right at the start of a project. It also forces you to think through how you are modelling your data much more carefully. Going back to the previous example, you probably don't want two completely separate classes for loanable and reference books – they are both books after all, and will have a lot in common with each other – the only difference being you can loan one and not the other. OOP allows for this by supporting the idea of 'classes' and 'subclasses' – you can have a general class – let's say 'book' – with the relevant properties and methods attached, but only properties and methods that would apply to any book – so not (in this example) the 'loan' method. You can then have two subclasses – the 'loanable book' and 'reference book' classes – which 'inherit' all the properties and methods from the more general 'book' class. You would then add an additional method to the 'loanable book' class to enable it to be loaned – and obviously you would not add this to the 'reference book' class.
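
Sketching this in Ruby (the class and method names are my illustration only, and the 'loan' method is obviously a stand-in for something more useful):

class Book                                  # the general class
  attr_reader :title

  def initialize(title)
    @title = title
  end
end

class LoanableBook < Book                   # subclass - inherits everything from Book
  def loan(patron)
    puts "#{title} loaned to #{patron}"
  end
end

class ReferenceBook < Book                  # subclass with no 'loan' method
end

LoanableBook.new("Dirk Gently's Holistic Detective Agency").loan("A. Reader")
ReferenceBook.new("Long Dark Teatime of the Soul").loan("A. Reader")
# the second call fails with NoMethodError - you simply can't loan a reference book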

This approach forces you to think through exactly what things you might want to do to different classes of object right from the start, and what properties you need any particular object to have. I found it made me think more 'abstractly' about the types of data I was dealing with. For example, if you were looking at library data, you might start thinking about books, and define classes as I've just described. However, then you realise that you also have DVDs you want to loan out – and that while DVDs share some things in common with books (they have titles, they can be loaned), they also have a number of different traits (they have directors). So you might start to model your library with a more abstract class (e.g. 'Library Item') which might set up some basic properties and methods (e.g. they have a title, they can be added to a location), and then have more specific classes for 'books' and 'DVDs' – and if a new item type was added to your stock at a later date (e.g. 'journals') you could add a new subclass as appropriate.

This type of modelling is hard work! Even with a relatively simple task I found thinking about this took up a lot of time – and really needed to be done before I could make much of a start on actually coding. This problem of modelling brings me back to my recent post What’s so hard about Linked Data – how data is structured, and how it behaves, is at the heart of this – no matter whether you do this in software or in data schemas.

Finally, where next with my programming journey? I really enjoyed starting to learn Ruby, and I think I'll try to do some more work with it – I'm especially interested in looking at 'Rails', a 'web application framework' designed to make it easier to develop web applications quickly and easily (and which brings another new concept for me to get my head round – MVC, or Model-View-Controller).

Discussion panel

Panel is:

  • Peter McDonald (PM)
  • Marianne Talbot (MT)
  • David Robertson (DR)
  • Andy Lane (AL)
  • Fred Mednick (FM)

Q: Did you have hopes/thoughts about reuse when you did a podcast?

MT: No! But have since been approached by a US company interested in doing some lectures for them – so definitely a way of self promotion

DR: Hoped to see use on iTunesU to compare with high quality content from US sites

PM: Public good – I’m publicly funded, and feel it is right to do it

Q: Business models related to Open Content? (e.g. micropayments)

AL: No different to other industries – e.g. the music industry – 'freely available' content (whether legal or not) but still a need to generate income. May be an option for micropayments for 'value added' services – e.g. providing a printed, bound version of content. At the moment the OU is looking at how much they can afford to spend on the content and how it is classified – is it 'outreach', 'marketing', 'recruitment', 'teaching' etc.? OER may provide cost savings – not just about income. If OER is an 'add-on' or 'nice to have' it will fail – it has to be a central part of the institution.

PM: Personal perspective.  Could be part of institutional model to get funding for research etc. – make an OER output a requirement on funding

DR: Would like this type of activity embedded more. e.g. it was discovered that certain ‘reading lists’ were available outside the university Intranet – some academics horrified – DR says he sees his lectures as the property of ‘the world’ (without wanting to be pretentious) – not just for those in Oxford

MT: Perhaps add a request for donations at the end of each podcast [this makes me think of This American Life and public radio in the US]

AL: No reason we shouldn't charge for some things – but it's about understanding what people will pay for and who your audience is

Comment from Sarah from Strategic Content Alliance: Have to understand both your audience and your costs – need to get this clear before you think about a revenue model

Q: What is the single greatest challenge for OER?

FM: The plethora of organisations – would be nice if we had interoperable organisations!

AL: Will have succeeded when we stop talking about Open Educational Resources and start talking about Education – OERs are just a means to an end.

PM: How to change thinking. We end up ‘translating’ between the old way and new way of doing things – rather than changing the way we think to deal with the new way of doing things.

MT: Not about challenges – but a worry – will opportunities for new lecturers be curtailed as institutions reuse captured content instead – why have a new lecture when you can re-run an old one?

Q: Publishers don’t know if you use diagrams in lectures – but if you do it on camera you have to clear copyright. Can we get agreement from publishers for non-profit reuse?

AL: Very good point. Early days for publishers as much as it is for HEIs. Some work done – e.g. MIT have agreement with Elsevier that they can include up to 3 diagrams from Elsevier content in a piece of OCW (think I got this right).

Comment from Marion Manton (MM) MOSAIC project (Oxford reuse one, not Library data one): Teachers continually find stuff on the web which they use in their teaching – until OERs are part of this landscape we won’t have success. If you can’t find it via Google, most academics won’t find it – no good locked away in repositories. Are OERs really very different to using books, articles, etc.? Just because it’s a podcast, why should we think about this any differently?

DR: Quality and provenance problems with things on the open web

MM: Yes – but the skills to assess quality and provenance of material don't change – and these are skills we need to be fostering anyway

AL: This [OERs, reuse, Open Education] is going to take 10-20 years to shakeout – it isn’t going to happen quickly

FM: Nice story – rewarding attendance of a 'Learning Ambassador' course in Nigeria through an agreement with the driving licensing organisation to issue (for a small fee) a personalised licence plate – which made crossing borders etc. easier – there are ways of making this stuff sustainable.

OER and ICT for development

Tim Unwin asks – why are OERs not more widely used by people in Sub-Saharan Africa (excluding South Africa), when intuitively they would deliver huge value?

I'm afraid I missed documenting much of this talk. Tim challenged the OER model – it isn't working (in this geographic area) – why not? Is OER essentially 'imperialist'? Those involved are generally white, male, and older. Many OERs are not high quality – even flagship efforts like MIT OCW often have only very basic material available – e.g. just course outlines or basic powerpoint slides.

Biggest challenges:

  • Changes in personnel
  • Funding mechanism diversity
  • Time commitments
  • Failure to understand ‘meanings’ – ICT4D (ICT for development) more than just computers in labs

Practical Realities

  • Structure and financing of African Universities – and now agendas around new private universities
  • Traditional didactic model of teaching – counter to participatory models
  • Role and 'income' of university teachers
  • Intellectual elitism – are African universities really serving their peoples’ development needs?
  • Dependent mentalities – 'where is the next grant coming from?'
  • Limited human capacity – but some outstanding individuals
  • Dominance of individualism – idea that HE is about individual benefits and gain, not about community

Implications/Questions for 'us' (i.e. Europe/US)

  • Fundamental challenge of education as a public or private good
  • How much do we really use OERs in our own work?
  • Can we afford the time to help African academics achieve their ambitions?

OpenSpires

The OpenSpires project (http://openspires.oucs.ox.ac.uk) at Oxford is about making recordings of talks and lectures available for free in a sustainable way.

Now 280 recordings – approximately 160 hours – with over 130 academics contributing lectures and items. OpenSpires built on the success of using iTunesU (http://itunes.ox.ac.uk) to make podcasts available – over 1630 items, with >3 million downloads – licensed for personal use only, so not OERs from an institutional perspective.

Nice quote from a contributor noting with amazement that their lecture on philosophy was being downloaded 18,000 times per week – my paraphrasing: "so I knew being 'number one' meant more than 20 downloads a week, but I'd no idea beyond that"

They’ve supported a ‘devolved’ model for contributions – departments can provide audio/video recordings to the central service – who can deal with legal stuff etc. Then the central service can ‘gap fill’.

Creative Commons gave a way of licensing material.

Benefits to the institution:

  • Accessibility
  • Outreach
  • Use of technology that reflects what is unique about Oxford
  • High calibre material of global importance
  • Fits with institutional strategic mission

Tried to make sure that the amount of extra time needed from academic/lecturer is minimal – shouldn’t be more effort than giving the talk in the first place.

Syndication using RSS – makes it very easy to distribute and enables reuse. (potential) types of reuse:

  • Website widget
  • Institutional portal
  • National portal
  • VLE/CLE
  • Subject centres

Communities add value – e.g. translating content into different languages.

Now getting academics to share experience – interesting to note the experience is about individuals appreciating it – fanmail etc. – not other institutions/academics using it? Does this matter?

One academic suggested a change to iTunesU contract – and got it accepted – the part in brackets below:

2.1 The Content. University hereby grants to Apple a nonexclusive, royalty-free right and license to use, reproduce, modify the format and display of Content (not the substance of any Content) …

He says – read contracts before you sign them, and make amendments if necessary! (parallels to the need for academics to look at the rights they sign away to publishers of research)

Q & A and comments from floor:

Q: What about institutional reuse as opposed to individual consumption – and also the use of non-commercial licensing?

A: Early days – proved interest, excited to see how others may join together content into 'courses'. Despite the licensing, people don't really seem to have realised yet that the licenses really do mean that you can use this stuff!

Comment: Sustainability will only come as we change our attitudes towards teaching and value it as it should be.

Comment: In medicine a lot of content can't be published as patients are involved, and they are happy for material to be used in medical education but not more generally.

Comment: Making available as a podcast allows students to ‘timeshift’ lectures – some worry that this will lead to students not coming to lectures (although commenter not convinced this is a problem)

Giving Knowledge for Free

Jan Hylen (previously at the OECD) presenting via video link for this session.

Despite a trend of growing competition where learning resources are often considered as key intellectual property, there is still much sharing of content between academics and institutions. There seems to be a new culture of openness in HE – Open Source Software, Open Access, Open Educational Resources – content made available over the internet for free and licensed for reuse.

OECD/CERI study setup to look at 4 main issues:

  • IPR issues
  • How to develop sustainable business models
  • Incentives and barriers to producing, using and delivering open resources
  • How to improve access to and usefulness of resources

Firstly a definition – what is an OER?

OER are digitized material offered freely and openly for educators, students and self-learners to use and re-use for teaching, learning and research (UNESCO 2002)

Four areas of development driving OER:

  • Technological (improved access, better software)
  • Social (increased IT skills, expectations of ‘free’)
  • Economical (lower costs, new business models)
  • Legal (new licensing – rethinking IP)

Mapping the OER movement is challenging – it's a global movement with a growing number of initiatives and resources. Also, to remove barriers to access, OER initiatives tend not to require registration – and so usage statistics are poor.

Different types of initiatives:

  • Publicly or institutionally backed programmes – e.g. OpenLearn, OpenSpires, OpenCourseWare (MIT)
  • Community approach – Open Course, Common Content, Free Curricula Center
  • Mixed models – MERLOT, Connexions, ARIADNE

A followup study in 2008 found that the number of resources in 6 major OER initiatives had increased between 30% and 300%; still a large amount in English, but more in other languages; a move from text content to audio-visual and multimedia content (podcasts, video etc.)

A move from the community approach to institutionally supported approach – most initiatives now have institutional support.

According to MIT and Tufts, users of OpenCourseWare are typically well educated – already holding a degree – mostly North American based (although this may have changed since), and self-learners (i.e. not using the material in other institutions).

Teachers who were asked said they tended to use OERs as a supplement to other materials – generally as smaller chunks. Barriers to using OERs were lack of time, skills and reward systems.

Motivations for producing and sharing OERs:

Governments

  • Expanded access to learning
  • Bridge gap between informal and formal learning
  • Promote lifelong learning

Institutions

  • Altruism
  • Leverage on taxpayers' money
  • “What you give you receive back improved”
  • Good PR and shop window
  • Growing competition – new cost recovery models needed
  • Stimulate internal improvement, innovation and reuse

Individuals

  • Altruistic or community supportive reasons
  • Personal non-monetary gain – ego-boost
  • Commercial reasons
  • It is not worth the effort to keep the resource closed

OECD report “Giving Knowledge for Free”

During Q & A Andy Lane makes the point that you get waves of interest in specific areas – e.g. Darwin bicentenary – but this interest drops off quickly.

Content, Collaboration and Innovation

Today I'm at the Beyond Borders event in Oxford, in the very nicely equipped Said Business School. After a welcome from Melissa Highton, first up is Andy Lane talking about 'OpenLearn' at the Open University.

Andy first asks 'why make educational resources open?' There was growing momentum behind OER worldwide (led by MIT), and the emergence of Creative Commons licenses made it possible to clearly state how materials could be used/reused. The idea of Open Educational Resources fitted well with the OU's commitment to social justice and widening participation – as well as the opportunity to build markets and reputation.

It was hoped that OERs might bridge the divide between formal and informal learning. It costs a lot to create good content – so any opportunity to reuse content frees up time to be spent in areas where more value can be added – e.g. personal support.

OpenLearn is in the process of moving to more 'short form' content – bringing in content previously hosted on open2.net. This short form content might be delivered via a number of routes – YouTube, iTunes, etc. At the same time there will be long form content for both learners (in the 'LearningSpace') and for educators ('LabSpace'). This will be complemented by OLNet – focused on researchers.

LearningSpace (long form content) is delivered using the Moodle VLE. It is not just a way of delivering open resources, but also somewhere that some experimentation can take place in terms of content format, content creation tools, delivery methods etc. – some of which will feed back into the OU's core VLE product.

The OU believes this approach helps bridge informal and formal learning – the learner comes first, content is the hook, and it delivers flexibility with a mix and match approach and self pacing. Only about 126,000 people have registered – many fewer than the number of people browsing the site.

It is a huge challenge to understand how people are using the material. Example of Daniel Conn from the Times. On OpenLearn they are seeing both 'volunteer students' and 'social learners'.

Andy now talking about LabSpace – examples of teachers collaborating on aspects of creating educational resources – e.g.:

  • Preparation
  • Curriculum extension
  • Professional development
  • Share materials

Example of pushing learning content into a WordPress Blog (example of course on Hume – more information on how this was done at http://jimgroom.umwblogs.org/2008/02/17/proud-spammer-of-open-university-courses/, and thoughts from Tony Hirst at http://ouseful.open.ac.uk/blogarchive/013251.html)

Q & A:

Q: What kind of pressure is there to show a link between publishing OERs and bringing students in to the Open University? What evidence is there?

A: Yes – those questions have been asked. It was an institutional action research project with buy-in from the top and external funding. Benefits are not just in terms of how many students come in through this process – there are many other aspects: use in the widening participation strategy – a way of dealing with hard to reach groups and bringing them in; being used by the marketing department; being used as part of the student registration process; used to work with regional funding bodies (in Scotland and Wales). Andy stresses all aspects need to be considered when looking at benefits

What’s so hard about Linked Data?

My post on Linked Data from a couple of weeks ago got some good comments and definitely helped me in exploring my own understanding of the area. The 4 Principles of Linked Data as laid out by Tim Berners-Lee seem relatively straightforward, and although there are some things that you need to get your head around (some terminology, some concepts) the basic principles don’t seem that difficult.

So what is difficult about Linked Data (and what is not)?

Data Modelling

Data Modelling is “a method used to define and analyze data requirements needed to support the business processes of an organization“. The problem is that the real world is messy, and describing it in a way that can be manipulated by computers is always problematic.

Basically data modelling is difficult. It is probably true of any sector, but anyone working in libraries who has looked at how we represent bibliographic and related data, and library processes, in our systems will know it gets complicated extremely quickly. With library data you can easily get bogged down in philosophical questions (what is a book?, how do you represent an ‘idea’?).

This is not a problem unique to Linked Data – modelling is hard however you approach it, but my suspicion is that using a Linked Data approach brings these questions to the fore. I’m not entirely sure about this, but my guess is that if you store your data in a relational database, the model is much more in the software that you build on top of this than in the database structure. With Linked Data I think there is a tendency to try to build better models in the inherent data structure (because you can?), leaving less of the modelling decisions to the software implementation.

If I'm right about this, it means Linked Data forces you to think more carefully about the data model at a much earlier point in the process of designing and developing systems. It also means that anyone interacting with your Linked Data (consumers) needs to understand not just your data, but also your model – which can be challenging.

I'd recommend having a look at various presentations/articles/posts by those involved in implementing Linked Data for parts of the BBC website – e.g. this presentation on How the BBC make Websites from IWMW2009.

Also to see (or contribute to) the thought processes behind building a Linked Data model, have a look at this work in progress on modelling Science Museum data/collections by Mia Ridge.

Reuse

One of the concepts with Linked Data is that you don't invent new identifiers, models and vocabularies if someone else has already done it. This sounds great, and is one of the promises that open Linked Data brings – if the BBC have already established an 'identifier' for the common Kingfisher species, then I shouldn't need to do this again. Similarly, if someone else has already established a Linked Data vocabulary for describing people, and I want to describe a person, I can simply use this existing vocabulary. More than this, I can mix and match existing elements in new models – so if I want to describe books about wildlife, and their authors, I can use the BBC wildlife identifiers when I want to show a book is about a particular species, and I can use the FOAF vocabulary (linked above) to describe the authors.

This all sounds great – and given that I've said modelling data is difficult, the idea that someone else may have done the hard work for you, and that you can just pick up their model, is very attractive. Unfortunately I think that reuse is actually much more difficult than it sounds.

First you've got to find the existing identifier/vocabulary, then you've got to decide if it does what you need it to do, and you may have to make some judgements about the provenance and long-term prospects of those things you are going to reuse. If you use the BBC URI for Kingfishers, are you sure they are talking about the same thing you are? If so, how much do you trust that URI to be there in a year? In 5 years? In 10 years? (my books are highly likely to be around for 10 years).

I would guess reuse will get easier as Linked Data becomes more established (assuming it does). The recently established Schemapedia looks like a good starting point for discovering existing vocabularies you may be able to reuse, while Sameas.org is a good place to find existing Linked Data identifiers. This is also an area where communities of practice are going to be very important. For libraries it isn’t too hard to imagine a collaborative effort to establish common Linked Data identifiers for authors (VIAF as Linked Data looks like it could well be a viable starting point for this)

RDF and SPARQL

In my previous post I questioned the mention of RDF and SPARQL as part of the Linked Data principles. I don't particularly have an issue with RDF and SPARQL as such – but my perception is that others do. Recently Mike Ellis laid down a challenge to the Linked Data community in which he says "How I should do this [publish linked data], and easily. If you need to use the word "ontology" or "triple" or make me understand the deepest horrors of RDF, consider your approach a failed approach", which suggests that RDF is difficult, or at the least, complicated.

I'm not going to defend RDF as uncomplicated, but I do think it is an area of Linked Data that attracts some bad press, which is probably unwarranted. My argument is that RDF isn't the difficult bit – it's the data modelling that gets represented in RDF that is difficult. This is echoed by the comment in the Nodalities article from Tom Scott and Michael Smethurst of the BBC:

The trick here isn’t the RDF mapping – it’s having a well thought through and well expressed domain model. And if you’re serious about building web sites that’s something you need anyway. Using this ontology we began to add RDF views to /programmes (e.g. www.bbc.co.uk/programmes/b00f91wz.rdf). Again the work needed was minimal.

So for those considering the Linked Data approach we’d say that 95% of the work is work you should be doing just to build for the (non-semantic) web. Get the fundamentals right and the leap to the Semantic Web is really more of a hop.

I do think that we are still lacking any close to decent consumer facing tools for interacting with RDF (although I’d be really happy to be shown some good examples). When I played around with an RDF representation of characters from Middlemarch I authored the RDF by hand, having failed to find an authoring tool I could use. I found a few more tools that were OK to use for visualising and exploring the data I created – but to be honest most of these seemed buggy or flaky in some way.

I have to admit that I haven't got my head around SPARQL in any meaningful way yet, and I'm not convinced it deserves the prominence it seems to be currently getting in the Linked Data world. SPARQL is a language for querying RDF, and as such is clearly going to be an essential tool for those using and manipulating RDF. However, you could say the same about SQL (a language for querying data stored as tables with rows and columns) in relation to traditional databases – but most people neither know, nor care, about SQL.

Tony Hirst often mentions how widespread the use of spreadsheets to store tabular data is, and that this enables basic data manipulation to happen on the desktop. Many people are comfortable with representing sets of data as tables – and I suspect this is embedded strongly in our culture. It may be we will see tools that start to bridge this divide – I was very very impressed by the demonstration videos of the Gridworks tool posted on the Freebase blog recently, and I'm really looking forward to playing with it when it is made publicly available.

Conclusion

I’m not sure I have a strong conclusion – sorry! What I am aware of is a shift in my thinking. I used to think the technical aspects of Linked Data were the hard bits – RDF, SPARQL, and a whole load of stuff I haven’t mentioned. While there is no doubt that these things are complicated, and complex, I now believe the really difficult bits are the modelling and reuse aspects. I also think that there is an overlap here with the areas where domain experts need to have an understanding of ‘computing’ concepts, and computing experts need to understand the domain – and this kind of crossover is always difficult.

What to Watch

TV Reviews have always (in my mind at least) been a bit of an oddity – if you watched the programme, what is there to tell you? And if you missed it, why are you interested? (although despite this I’ve always enjoyed reading TV reviews – especially Nancy Banks-Smith)

However, with the availability of TV ‘catchup’ services online – particularly the iPlayer – the TV review can not just be an amusing piece of writing, but actually help you decide whether the programme is worth watching on catchup. With this in mind, it struck me that it would be nice to link from reviews to the relevant catchup service.

The Guardian TV reviews were an obvious starting place for me for two reasons. Firstly I’m a big fan of the Guardian, and secondly their ‘Open Platform‘ experiment enables people like me to grab and republish their content (with appropriate attribution). The BBC iPlayer catchup service was also an obvious starting point, partly because it is the most popular catchup service in the UK, and again because the BBC already provide some level of structured access to their data, providing much of their programme information as ‘linked data’. So I decided to try to mashup the Guardian TV reviews with the BBC iPlayer service.

Unfortunately, although the Guardian provide a lot of structured metadata with their articles via the Open Platform, the TV programmes mentioned are not part of the structured metadata – so I was left having to scrape the names of programmes from the title or body of the article somehow. Here I came across a discrepancy – in the RSS feed for "Last night's TV", each programme name within a review was surrounded by a <strong> tag – but this wasn't the case in the version of the reviews on the Open Platform, which was missing this tag.

On the support forum for the Guardian Open Platform I got some really helpful (and prompt) responses from Matt Mcallister including:

The style formatting that you’re looking for isn’t available in the current version of the API.  But we’ve had similar feedback from others and have included that feature request in our to-do list.

Because of this, I ended up using the RSS feeds to grab the programme names. The channel the programme was on always followed the programme name in brackets, so I was able to grab this reasonably easily at the same time using a regular expression:

m/<strong>(.*?)<\/strong>\s?\((.*?)\)/g

(I’m not a reg exp expert, so any better version of this welcome, but it does the job)
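
For what it's worth, the equivalent in Ruby (the language I used when I later rewrote this app – see 'Something kind of OO' above) is something like the following, where review_html is just a stand-in for the content of an RSS item:

review_html = "...including <strong>Richard Hammond's Invisible Worlds</strong> (BBC1)..."

review_html.scan(%r{<strong>(.*?)</strong>\s?\((.*?)\)}) do |programme, channel|
  puts "#{programme} on #{channel}"   # => Richard Hammond's Invisible Worlds on BBC1
end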

Because the content in the RSS feeds is intended for personal use only, I can't republish content from here – but luckily the RSS feed includes an 'item id' which can be used to look up the content on the Open Platform – so I combine the programme names with the text of the article and other information from the Open Platform, and I've got my list of programmes with the full-text of the reviews attached.

Now to mash up with the BBC content. The biggest problem is going from the programme name to the information the BBC can provide, which is identified by a unique ID. For example the URI for the series “Richard Hammond’s Invisible Worlds” is http://www.bbc.co.uk/programmes/b00rmrmm, but from the review all I get is the name of the programme as a text string. I started to play around with simply throwing the programme name at the BBC search engine, but then @moustaki (Yves Raimond) came to my rescue by letting me know you could simply construct a URL:

http://www.bbc.co.uk/programmes/title of your programme [with spaces removed]

and it would automatically try to match (generally exactly, but with some 'added heuristics'). So I was able to construct a URL like this, and then grab the final destination page from the response – so:

http://www.bbc.co.uk/programmes/richardhammond’sinvisibleworlds

redirects to

http://www.bbc.co.uk/programmes/b00rmrmm
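
A minimal Ruby sketch of that lookup (the helper name is mine, and I'm assuming the Location header of the redirect holds the full destination URL):

require 'net/http'
require 'uri'

def bbc_programme_url(title)
  slug = title.downcase.delete(" ")          # "richardhammond'sinvisibleworlds"
  response = Net::HTTP.get_response(URI.parse("http://www.bbc.co.uk/programmes/#{slug}"))
  # a successful match comes back as a redirect to the canonical programme page
  response.is_a?(Net::HTTPRedirection) ? response["location"] : nil
end

puts bbc_programme_url("Richard Hammond's Invisible Worlds")
# => http://www.bbc.co.uk/programmes/b00rmrmm (if the BBC finds a match)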

This doesn't work in 100% of cases – the main problem I've come across is that when the Guardian reviews a programme from a documentary strand (e.g. Time Shift or Storyville), they often just use the title of the episode, and omit the name of the strand. Unfortunately the BBC linking doesn't pick this up – so, for example:

http://www.bbc.co.uk/programmes/bread:aloafaffair

doesn’t pick up this Time Shift episode:

http://www.bbc.co.uk/programmes/b00rm508

Overall this approach gives a relatively good hit rate. At the moment, if the lookup is unsuccessful I just offer a link to a BBC search for the programme title – though I could probably add some code to improve the match rate.

The next problem was how to get the iPlayer details for the programme. Luckily the BBC expose a lot of their programme data in a structured way. I had expected the iPlayer data to be available as RDF, as this is how the BBC has been exposing a lot of their data (there is lots written about this – see this Nodalities article for example) – but it looks like the iPlayer information is still on the edges of this – however, there is a standard way of retrieving iPlayer data which is documented on the BBC Backstage site. This allows you to construct a URI using a ‘groupID’ (that is an ID which represents the group which owns the programme – this is usually the ‘series’ ID) – so for the Richard Hammond series we can use the following URI:

http://www.bbc.co.uk/programmes/b00rmrmm/episodes/player

This returns some XML including the availability of the episodes on the iPlayer. I then integrate this with the Guardian data, and I have my final set of data that I’m ready to publish – a TV review, and links to episodes of programmes mentioned in that review on the BBC iPlayer service.
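
Fetching and picking apart that XML in Ruby looks something like this – though note that I'm not reproducing the real element names from the BBC feed, so the XPath below is a placeholder you'd adjust to the actual structure documented on Backstage:

require 'net/http'
require 'uri'
require 'rexml/document'

xml = Net::HTTP.get(URI.parse("http://www.bbc.co.uk/programmes/b00rmrmm/episodes/player"))
doc = REXML::Document.new(xml)

# placeholder XPath - check the feed itself for the real element names
REXML::XPath.each(doc, "//episode/title") do |title|
  puts title.text
end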

The next step was to publish this on the web somewhere. Now, this is where my skills fell down badly. Basically I’m not a designer, and making stuff look good is really not my forte. To quote my art teacher in a school report about my skills at art and handwriting:

Art: Tries hard with some success

Handwriting: Tries hard

So, my first attempt was as simple HTML as you could get, and (to put it bluntly) as ugly as sin (anyone who has looked at my ReadtoLearn app will know the score). I was left wondering how I could deliver something that looked nice but didn't require me to magically gain design skills, and I had a sudden inspiration: WordPress. The reason this site looks good (I hope) despite my lack of design skills is that I use WordPress, and one of the many, many 'themes' available – so I thought if I could squeeze the data I had into a WordPress installation I'd have a nice looking web interface for free.

Some further thought and investigation, and I realised that the easiest way (I could see) of achieving this was to publish my application as an RSS feed (no need to worry about the formatting – just the content), and then use one of the WordPress 'syndication' plugins to scoop up the RSS items and republish them as WordPress blog posts. The use of WordPress syndication was something I originally picked up from Tony Hirst, who pointed at a blog post by Jim Groom describing exactly how to do it.
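
The original code was Perl, but as a sketch in Ruby, the standard library's RSS module makes this sort of output straightforward (all the feed details below are placeholders):

require 'rss'

feed = RSS::Maker.make("2.0") do |maker|
  maker.channel.title       = "What to Watch"
  maker.channel.link        = "http://example.com/whattowatch"    # placeholder
  maker.channel.description = "Guardian TV reviews linked to BBC iPlayer"

  maker.items.new_item do |item|
    item.title = "Last night's TV"                                # placeholder
    item.link  = "http://example.com/whattowatch/1"               # placeholder
    # most of the useful information goes in the description, as basic HTML
    item.description = "<p>Review text...</p><p><a href=\"http://www.bbc.co.uk/programmes/b00rmrmm\">Watch on iPlayer</a></p>"
  end
end

puts feed   # the RSS XML, ready for WordPress to syndicate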

So, some tweaking of my code to output RSS (with most of the useful information in the description tag in basic HTML), a 'one click' install of WordPress on my website (I use Dreamhost as my web site host, who offer a number of 'one-click' installs including WordPress), a little while experimenting with different themes to see which one I liked best for this particular app (I went with Boumatic by Allan Cole), and I had transformed my ugly HTML into something altogether more elegant and polished.

The Guardian Open Platform terms and conditions mean that you have to refresh your data at least every 24 hours (I’m guessing this is in case there are any corrections or take-down notices on any of their content), so I added another WordPress plugin which deletes posts automatically after 1 day, and then I had my “What to Watch” application ready to go. Not only that, but adding the WPTouch plugin means the site also works nicely on a range of handheld devices – no extra effort on my part.

There’s still some work to do, and I’ve got some ideas for improvements, but for now I’m pretty happy both with the mashup, and the way I’ve managed to publish it via WordPress – but as always suggestions for different approaches, or improvements to the app are very welcome. Have a look at What to Watch and tell me what you think 🙂

Linked Data

Linked Data is getting a lot of press at the moment – perhaps most notably last week Gordon Brown (the UK Prime Minister) said:

Underpinning the digital transformation that we are likely to see over the coming decade is the creation of the next generation of the web – what is called the semantic web, or the web of linked data.

This statement was part of a speech at “Building Britain’s Digital Future” (#bbdf) (for more on the context of this statement, see David Flanders ‘eye witness’ account of the speech, and his thoughts)

Last week I attended a 'Platform Open Day' at Talis, which was about Linked Data and related technologies, so I thought I'd try to get my thoughts in order. I may well have misunderstood bits and pieces here and there, but I'm pretty sure that the gist of what I'm saying here is right (and feel free to post comments or clarifications if I've got anything wrong).

I’m going to start with considering what Linked Data is…

The principles of Linked Data are stated by Tim Berners-Lee as:

  1. Use URIs as names for things
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
  4. Include links to other URIs. so that they can discover more things.

What does this mean?

While most people are familiar with URLs, the concept of a URI is less well known. A URL is a resource locator – if you know the URL, you can locate the resource. A URI is a resource identifier – it simply identifies the resource. In fact, URLs are a special kind of URI – that is any URL is also a URI in that a URL both identifies and locates a resource. So – all URLs are also URIs, but not vice versa. You can read more about URIs on Wikipedia.

Further to this, an ‘HTTP URI’ is a URL as we are used to using on the web.

This means that the first two principles together basically say you should identify things using web addresses. This sounds reasonably straightforward. Unfortunately there is some quite tricky stuff hidden behind these straightforward principles, which basically comes down to the fact that you have to be very careful and clear about what any particular http URI identifies.

For example this URI:

http://www.amazon.co.uk/Pride-Prejudice-Penguin-Classics-Austen/dp/0141439513/ref=sr_1_9?ie=UTF8&s=books&qid=1269423132&sr=8-9

Doesn’t identify (as you might expect) Pride and Prejudice, but rather identifies the Amazon web page that describes the Penguin Classics edition of Pride and Prejudice. This may seem like splitting hairs, but if you want to start to make statements about things using their identifiers it is very important. I might want to state that the author of Pride and Prejudice is Jane Austen. If I say:

http://www.amazon.co.uk/Pride-Prejudice-Penguin-Classics-Austen/dp/0141439513/ref=sr_1_9?ie=UTF8&s=books&qid=1269423132&sr=8-9 is authored by Jane Austen, then strictly I’m saying Jane Austen wrote the web page, rather than the book described by the web page.

Moving on to principle 3, things get a little more controversial. I'm going to break this down into two parts. Firstly "When someone looks up a URI, provide useful information". Probably the key thing to note here is that when you identify things with an http URI (as per principles 1 and 2), you are often going to be identifying things that can't be delivered online. If I identify a physical copy of a book (for example, my copy of Pride and Prejudice, sitting on my bookshelf), I can give it an http URI to identify it, but if you type that URI into a web browser, or in some other way try to 'retrieve' that URI, you aren't going to get the physical item appearing before you – so if you look up that URI the third principle says that you should get some 'useful information' – for example, you might get a description of my copy of Pride and Prejudice. There are some technical implications of this, as I have to make sure that you get some useful information about the item (e.g. a description), while still being clear that the URI identifies the physical item, rather than identifying the description of the physical item – but I'm not going to worry too much about this now.

The second part of principle 3 is where we move into territory which tends to set off heated debate. This says "using the standards (RDF, SPARQL)". Firstly it invokes 'standards', and secondly it lists two specific standards. I feel that the wording isn't very helpful. It does make it clear that Linked Data is about doing things in a standardised way – this is clearly important, and yet also very difficult – as anyone who has worked with bibliographic metadata will appreciate, achieving standards even across a relatively small and tight-knit community such as librarians is difficult enough – getting standardisation across larger, disparate communities is very challenging indeed.

What I don't think the principle makes very clear is what standards are being used – it lists two (RDF and SPARQL), but as far as I can tell most people would agree RDF is actually the key thing here, making this list of two misleading however you read it. I'm not going to describe RDF or SPARQL here, but may come back to them in future posts. In short, RDF provides a structured way of making assertions about resources – there is a simple introduction in my slideshare presentation on the Semantic Web. SPARQL is a language for querying RDF.

There is quite a bit of discussion about whether RDF is essential to ‘Linked Data’ including Andy Powell on eFoundations, Ian Davis on Internet Alchemy, and Paul Miller on Cloud of Data.

So finally, on to principle 4: "Include links to other URIs. so that they can discover more things.". The first three principles are concerned with making your data linkable – i.e. making it possible for people to link to your data in meaningful ways. The fourth principle says you should link from your data to other things. For my example of representing my own copy of Pride and Prejudice, that could include linking to information about the book in a more general sense – rather than record the full details myself, I could (for example) link to an OpenLibrary record for the book. Supporting both inbound and outbound links is key to making a rich, interconnected set of data, enabling the 'browsing' of data in the same way we currently 'browse' webpages.

I was originally intending to explore some of the arguments I’ve come across recently about ‘Linked Data’ – I especially wanted to tackle some of the issues raised by Mike Ellis in his ‘challenge’, but I think that this post is quite long enough already, so I’ll leave that for another time.