TV Reviews have always (in my mind at least) been a bit of an oddity – if you watched the programme, what is there to tell you? And if you missed it, why are you interested? (although despite this I’ve always enjoyed reading TV reviews – especially Nancy Banks-Smith)
However, with the availability of TV ‘catchup’ services online – particularly the iPlayer – the TV review can be more than an amusing piece of writing: it can actually help you decide whether the programme is worth watching on catchup. With this in mind, it struck me that it would be nice to link from reviews to the relevant catchup service.
The Guardian TV reviews were an obvious starting place for me for two reasons. Firstly I’m a big fan of the Guardian, and secondly their ‘Open Platform’ experiment enables people like me to grab and republish their content (with appropriate attribution). The BBC iPlayer catchup service was also an obvious starting point, partly because it is the most popular catchup service in the UK, and again because the BBC already provide some level of structured access to their data, providing much of their programme information as ‘linked data’. So I decided to try to mash up the Guardian TV reviews with the BBC iPlayer service.
Unfortunately, although the Guardian provide a lot of structured metadata with their articles via the Open Platform, the TV programmes mentioned are not part of that metadata – so I was left having to scrape the names of programmes from the title or body of the article somehow. Here I came across a discrepancy – in the RSS feed for “Last night’s TV”, each programme name within a review was surrounded by a <strong> tag – but this wasn’t the case in the version of the reviews on the Open Platform, which were missing this tag.
On the support forum for the Guardian Open Platform I got some really helpful (and prompt) responses from Matt Mcallister including:
The style formatting that you’re looking for isn’t available in the current version of the API. But we’ve had similar feedback from others and have included that feature request in our to-do list.
Because of this, I ended up using the RSS feeds to grab the programme names. The channel the programme was on always followed the programme name in brackets, so I was able to grab this reasonably easily at the same time using a regular expression:
m/<strong>(.*?)<\/strong>\s?\((.*?)\)/g
(I’m not a reg exp expert, so any better version of this welcome, but it does the job)
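For illustration, here is the same pattern in Python rather than Perl. This is only a sketch: the function name and the sample HTML fragment are made up for the example, and the pattern assumes (as described above) that the channel always follows the programme name in brackets.

```python
import re

# Same pattern as the Perl expression above: group 1 is the programme name
# inside <strong>...</strong>, group 2 is the channel in the brackets after it.
PROGRAMME_RE = re.compile(r"<strong>(.*?)</strong>\s?\((.*?)\)")


def extract_programmes(html):
    """Return a list of (programme, channel) pairs found in a review's HTML."""
    return PROGRAMME_RE.findall(html)


# Made-up review fragment, for illustration only.
sample = (
    "<p><strong>Richard Hammond's Invisible Worlds</strong> (BBC1) was fun, "
    "but <strong>Bread: A Loaf Affair</strong> (BBC4) was better.</p>"
)

for programme, channel in extract_programmes(sample):
    print(programme, "on", channel)
```

The non-greedy `(.*?)` groups matter here: a greedy match would run from the first `<strong>` to the last `</strong>` in a review that mentions several programmes.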
Because the content in the RSS feeds is intended for personal use only, I can’t republish content from here – but luckily the RSS feed includes an ‘item id’ which can be used to look up the content on the Open Platform – so I combine the programme names with the text of the article and other information from the Open Platform – and I’ve got my list of programmes with the full text of the reviews attached.
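The join itself might look something like the Python sketch below. It assumes the two sources have already been fetched and reduced to dictionaries keyed by item id; the field names (`headline`, `body`), the function name and the sample id are all hypothetical, and nothing here reflects the actual Open Platform response format.

```python
def merge_reviews(rss_programmes, platform_articles):
    """Attach programme lists (from RSS) to republishable articles (from the Open Platform).

    rss_programmes:    {item_id: [(programme, channel), ...]}
    platform_articles: {item_id: {"headline": ..., "body": ...}}  (hypothetical fields)
    """
    merged = []
    for item_id, programmes in rss_programmes.items():
        article = platform_articles.get(item_id)
        if article is None:
            # No Open Platform copy found, so there is nothing we may republish.
            continue
        merged.append(
            {
                "id": item_id,
                "headline": article["headline"],
                "body": article["body"],
                "programmes": programmes,
            }
        )
    return merged


# Made-up data for illustration only.
rss_programmes = {"item-1": [("Bread: A Loaf Affair", "BBC4")], "item-2": []}
platform_articles = {"item-1": {"headline": "Last night's TV", "body": "Review text"}}

print(merge_reviews(rss_programmes, platform_articles))
```

Only text that came from the Open Platform ends up in the merged record; the RSS feed contributes the item id and the programme names, not the article body.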
Now to mash up with the BBC content. The biggest problem is going from the programme name to the information the BBC can provide, which is identified by a unique ID. For example the URI for the series “Richard Hammond’s Invisible Worlds” is http://www.bbc.co.uk/programmes/b00rmrmm, but from the review all I get is the name of the programme as a text string. I started to play around with simply throwing the programme name at the BBC search engine, but then @moustaki (Yves Raimond) came to my rescue by letting me know you could simply construct a URL:
http://www.bbc.co.uk/programmes/title of your programme [with spaces removed]
and it would automatically try to match (generally exactly, but with some ‘added heuristics’). So I was able to construct a URL like this, and then from the response I grabbed the final destination page so:
http://www.bbc.co.uk/programmes/richardhammond’sinvisibleworlds
redirects to
http://www.bbc.co.uk/programmes/b00rmrmm
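A minimal Python sketch of this lookup is below. The lower-casing is an assumption taken from the example URL above (only the space removal was described to me), the function names are made up, and `resolve_programme` needs network access and depends on the BBC site still behaving as it did when this was written.

```python
import urllib.parse
import urllib.request

BASE = "http://www.bbc.co.uk/programmes/"


def title_slug(title):
    """Lower-case the title and remove all whitespace (lower-casing is an assumption)."""
    return "".join(title.lower().split())


def lookup_url(title):
    """Build the URL shown in the example above, before any percent-encoding."""
    return BASE + title_slug(title)


def resolve_programme(title):
    """Request the constructed URL and return the final URL after redirects.

    Needs network access. urlopen follows redirects itself and raises
    urllib.error.HTTPError if the site finds no match.
    """
    url = BASE + urllib.parse.quote(title_slug(title))
    with urllib.request.urlopen(url) as response:
        return response.geturl()


print(lookup_url("Richard Hammond’s Invisible Worlds"))
```

The title is percent-encoded only at request time, since characters such as the curly apostrophe or a colon cannot be sent in a URL path as they are.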
This doesn’t work in 100% of cases – the main problem I’ve come across is when the Guardian reviews a programme from a documentary strand (e.g. Time Shift or Storyville), they often just use the title of the episode, and omit the name of the strand. Unfortunately the BBC linking doesn’t pick this up – so for example:
http://www.bbc.co.uk/programmes/bread:aloafaffair
doesn’t pick up this Time Shift episode:
http://www.bbc.co.uk/programmes/b00rm508
Overall this approach gives a relatively good hit rate. At the moment, if the lookup is unsuccessful I just offer a link to a BBC search for the programme title – I could probably handle some of these failures with extra code to improve the match rate.
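The fallback can be sketched in Python as follows. The search URL is a placeholder assumption rather than a documented BBC endpoint, and `resolver` stands for any function that takes a title and returns the programme URL (or raises an error when the lookup fails); both names are hypothetical.

```python
from urllib.parse import quote_plus

# Placeholder assumption: the real BBC search URL may differ.
SEARCH_URL = "http://www.bbc.co.uk/search?q="


def link_for(title, resolver):
    """Return the programme page if the lookup works, otherwise a search link."""
    try:
        url = resolver(title)
    except OSError:
        # urllib's URLError and HTTPError are both subclasses of OSError.
        url = None
    if url:
        return url
    return SEARCH_URL + quote_plus(title)


# Stand-in resolvers so the example runs without network access.
def always_found(title):
    return "http://www.bbc.co.uk/programmes/b00rmrmm"


def never_found(title):
    raise OSError("no match")


print(link_for("Richard Hammond's Invisible Worlds", always_found))
print(link_for("Bread: A Loaf Affair", never_found))
```

Passing the resolver in as an argument keeps the fallback logic testable without touching the network.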
The next problem was how to get the iPlayer details for the programme. Luckily the BBC expose a lot of their programme data in a structured way. I had expected the iPlayer data to be available as RDF, as this is how the BBC has been exposing a lot of their data (there is lots written about this – see this Nodalities article for example) – but it looks like the iPlayer information is still on the edges of this. However, there is a standard way of retrieving iPlayer data which is documented on the BBC Backstage site. This allows you to construct a URI using a ‘groupID’ (that is, an ID which represents the group which owns the programme – this is usually the ‘series’ ID) – so for the Richard Hammond series we can use the following URI:
http://www.bbc.co.uk/programmes/b00rmrmm/episodes/player
This returns some XML including the availability of the episodes on the iPlayer. I then integrate this with the Guardian data, and I have my final set of data that I’m ready to publish – a TV review, and links to episodes of programmes mentioned in that review on the BBC iPlayer service.
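Building that URI from a resolved programme page is a small string operation, sketched here in Python with made-up function names. It assumes the groupID is simply the last path segment of the programme URL, as in the Richard Hammond example; parsing the returned XML is left out because its format isn't described here.

```python
from urllib.parse import urlparse


def group_id_from_url(programme_url):
    """Take the last path segment of a /programmes/ URL as the groupID (an assumption)."""
    path = urlparse(programme_url).path.rstrip("/")
    return path.rsplit("/", 1)[-1]


def episodes_player_url(group_id):
    """Build the iPlayer availability URI for a groupID, following the example above."""
    return "http://www.bbc.co.uk/programmes/" + group_id + "/episodes/player"


pid = group_id_from_url("http://www.bbc.co.uk/programmes/b00rmrmm")
print(episodes_player_url(pid))
```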
The next step was to publish this on the web somewhere. Now, this is where my skills fell down badly. Basically I’m not a designer, and making stuff look good is really not my forte. To quote my art teacher in a school report about my skills at art and handwriting:
Art: Tries hard with some success
Handwriting: Tries hard
So, my first attempt was as simple HTML as you could get, and (to put it bluntly) as ugly as sin (anyone who has looked at my ReadtoLearn app will know the score). I was left wondering how I could deliver something that looked nice, but didn’t require me to magically gain design skills, and I had a sudden inspiration: WordPress. The reason this site looks good (I hope) despite my lack of design skills is that I use WordPress, and one of the many, many ‘themes’ available – so I thought if I could squeeze the data I had into a WordPress installation I’d have a nice looking web interface for free.
Some further thought and investigation and I realised that the easiest way (I could see) of achieving this was to publish my application as an RSS feed (no need to worry about the formatting – just the content), and then use one of the WordPress ‘syndication’ plugins to scoop up the RSS items and republish them as WordPress blog posts. The use of WordPress syndication was something I originally picked up from Tony Hirst, who pointed at a blog post by Jim Groom describing exactly how to do it.
So, some tweaking of my code to output RSS (with most of the useful information in the description tag in basic HTML), a ‘one click’ install of WordPress on my website (I use Dreamhost as my web host, who offer a number of ‘one-click’ installs including WordPress), a little while experimenting with different themes to see which one I liked best for this particular app (I went with Boumatic by Allan Cole), and I had transformed my ugly HTML into something altogether more elegant and polished.
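As a rough idea of what the RSS output step involves, here is a Python sketch using the standard library’s ElementTree. The function name, the item fields (`title`, `link`, `html`), the channel description and the example.org link are all made up for the example, and this is not the Perl code behind the app.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document; each item carries basic HTML in <description>."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "TV reviews with links to catch-up"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # ElementTree escapes the markup, so the HTML travels as text
        # inside <description> and is unescaped again by the feed reader.
        ET.SubElement(node, "description").text = item["html"]
    return ET.tostring(rss, encoding="unicode")


# Made-up item for illustration only.
items = [
    {
        "title": "Richard Hammond's Invisible Worlds",
        "link": "http://www.bbc.co.uk/programmes/b00rmrmm",
        "html": "<p>Review text with an <a href='#'>iPlayer link</a></p>",
    }
]
print(build_rss("What to Watch", "http://example.org/", items))
```

Because the feed only carries content, all of the presentation is left to whatever consumes it, which is what lets a WordPress theme do the design work.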
The Guardian Open Platform terms and conditions mean that you have to refresh your data at least every 24 hours (I’m guessing this is in case there are any corrections or take-down notices on any of their content), so I added another WordPress plugin which deletes posts automatically after 1 day, and then I had my “What to Watch” application ready to go. Not only that, but adding the WPTouch plugin means the site also works nicely on a range of handheld devices – no extra effort on my part.
There’s still some work to do, and I’ve got some ideas for improvements, but for now I’m pretty happy both with the mashup, and the way I’ve managed to publish it via WordPress – but as always suggestions for different approaches, or improvements to the app are very welcome. Have a look at What to Watch and tell me what you think 🙂
Nice blog post explaining the process, really easy to follow. Nice idea as well. And I didn’t realise wp could consume rss feeds, WP just goes up in my estimation.
Which language did you code it in? Perl?
Yes – I coded in Perl as I already had a lot of the relevant code from previous projects, and I’m relatively happy working with XML in Perl. The code isn’t that complicated, and I’m thinking of re-coding in Ruby as an exercise to help me learn some Ruby 🙂
I believe that the original plugins for WP to consume and republish RSS came out of attempts to steal content by spammers (grab good content, republish with spammy links) – but for me this suddenly becomes a really powerful way to publish stuff.
One issue I meant to mention in the blog post but forgot. I’m not able to retrieve iPlayer information for programmes that don’t have a parent ‘series’. If you try getting the iPlayer information for a single episode it just fails. I’m told that some improvements are in the works and this should become possible in the near future.
Thanks Tony – yes I do remember seeing the OU/BBC progs when you published it (it is possible this was what triggered my idea somewhere along the line!). On Twitter you also mentioned the Games Reviews mashup you did from the Guardian Open Platform – I’d completely forgotten about this, but now you mention it, it comes back to me – very cool: http://ouseful.wordpress.com/2009/03/19/guardian-game-reviews-with-video-trailers/