The twelve days of ChrisMash

Just a little message for ChrisMash:

On the first day of ChrisMash my true love sent to me, some cake and some coffee.

On the second day of ChrisMash my true love sent to me, two APIs and some cake and some coffee.

On the third day of ChrisMash my true love sent to me, three Google Maps, two APIs and some cake and some coffee.

On the fourth day of ChrisMash my true love sent to me, four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the fifth day of ChrisMash my true love sent to me, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the sixth day of ChrisMash my true love sent to me, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the seventh day of ChrisMash my true love sent to me, seven homemade badgers, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the eighth day of ChrisMash my true love sent to me, eight mashers mashing, seven homemade badgers, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the ninth day of ChrisMash my true love sent to me, nine SPARQL queries, eight mashers mashing, seven homemade badgers, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the tenth day of ChrisMash my true love sent to me, ten QR codes, nine SPARQL queries, eight mashers mashing, seven homemade badgers, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the eleventh day of ChrisMash my true love sent to me, eleven MARC records, ten QR codes, nine SPARQL queries, eight mashers mashing, seven homemade badgers, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

On the twelfth day of ChrisMash my true love sent to me, twelve fancy cocktails, eleven MARC records, ten QR codes, nine SPARQL queries, eight mashers mashing, seven homemade badgers, six RSS feeds, five Yahoo Pipes. Four RPCs, three Google Maps, two APIs and some cake and some coffee.

JISC Mobile Infrastructure programme

Today I’m at the kick-off meeting for the JISC Mobile Infrastructure programme. There are 5+1 projects funded in this strand, as detailed by the programme manager, Ben Showers, in this blog post.

I’m working with Evidence Base on the ‘+1’ project which is a support project looking to establish a ‘mobile library community’ and ways of supporting projects/libraries/people working in this area. The first step on this path is the m-libraries support website http://m-libraries.info where @joeyanne has posted some introductory material outlining what we mean by ‘mobile libraries’ in this context (this is about use of mobile devices by and for library services, rather than about ‘collections on wheels’).

Today is an opportunity to hear about some of the other projects and meet the people involved. However, the aim of our support project is not just to support the current projects, but to start a platform for a growing community.

Some quick introductions to the projects:

Phonebooth
Twitter: @jiscphonebooth
Lead: London School of Economics
Partners: Edina
Summary (from http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/): PhoneBooth will repurpose the Charles Booth Maps, Descriptive of London Poverty and selected police notebooks, which record eye-witness descriptions of London street-by-street, for delivery to mobile devices. The project will enhance the current online delivery by enabling content to be delivered directly to the location to which it refers.

Introducing this is Ed Fay (@digitalfay). The existing online resource – the Charles Booth maps (http://booth.lse.ac.uk/) – includes maps, classifications and notebooks at a street-by-street level. They will keep the backend infrastructure, but put a new mobile client interface on it.

Accessing data at a street level is something that already occurs in the teaching of a specific course – but there is a lot of paper-based use at the moment. The mobile delivery fits really well into this teaching. They also expect interest from schools, genealogists etc., and have talked to staff at the Museum of London (who hold some of the Booth maps).

Ed stressed that the focus is delivery of library content on mobile – not delivering a ‘teaching app’.

Going to be an open web app – more sustainable

‘Support of new mobile devices’ is written into new LSE library strategy.

M-Biblio
Hashtag: #mbiblio
Lead: University of Bristol
Summary (from http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/): The project will enhance the learning and research activities of the University of Bristol’s academic community by developing a mobile application that can record and organise references to books, journals and other resources. These references can be added actively by scanning barcodes and QR codes, or passively by automatically recording RFID tags in items being used for study and research.

Mike Jones (http://www.bris.ac.uk/ilrt/people/mike-a-jones/overview.html) introducing M-Biblio. They want to develop a mobile application – and, with permission, collect user activity data.

Hope that the library gets useful data – for resources that might not usually be borrowed, like journals, theses and other “reference only” resources – and that staff and students get a useful tool. They can also trial the ‘near field communication’ (NFC) capabilities of newer Android phones to read RFID tags in books (I wonder whether NFC is compatible with the RFID tags libraries use). Maybe other technologies too…

Will use a web service as a ‘broker’, connecting phone clients to bibliographic sources (e.g. their library catalogue – Aleph) and to stats collection.
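
As an aside, a minimal sketch of how such a broker might look (entirely illustrative – the endpoint name, stub functions and the use of Flask are my assumptions, not project details):

```python
# Hypothetical broker: a thin web service between mobile clients and
# bibliographic back-ends that also records usage stats.
from flask import Flask, jsonify, request

app = Flask(__name__)

def lookup_in_catalogue(barcode):
    # Stub: a real broker would query the catalogue here
    # (e.g. Aleph, or an SRU interface to it).
    return {"barcode": barcode, "title": "Stub record"}

def record_usage(barcode, source):
    # Stub for the stats-collection side of the broker.
    app.logger.info("usage: %s scanned via %s", barcode, source)

@app.route("/resolve")
def resolve():
    barcode = request.args.get("barcode", "")
    record_usage(barcode, request.args.get("source", "unknown"))
    return jsonify(lookup_in_catalogue(barcode))

if __name__ == "__main__":
    app.run()
```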

Employing two User Experience and User Interface Design experts to help with those aspects, and engaging users – staff and students – in the process.

MACON
Hashtag: #oumacon
Lead: The Open University
Partners: EBSCO
Summary (from http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/): MACON will address challenges involved in delivering quality academic content to mobile devices in a seamless and user-friendly manner. The project will work with EBSCO, a major content and systems provider in order to prototype a mobile friendly resource discovery interface which will discover and expose quality academic content from both third party & local collections.

Introduced by Keren Mills from the OU. The aim is to create a mobile interface to the EBSCO discovery tool – and they are interested in whether people use it, and how they want to use it. Do they want to read on the mobile device? Do they want to bookmark or save stuff for later? Probably a mix.

Library users are not necessarily (or usually!) expert searchers – so they need to look at how they can take basic queries and still return useful and relevant results.

Already know that authentication can be an issue on mobile devices – often get bounced around authentication systems via redirects – and after a certain number of redirects the mobile browser can give up (more quickly than on a desktop). Thinking about ways of storing some local user information (via a bookmarklet?) to shortcut some of this and improve user experience.
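
As a rough illustration of the problem (my sketch, not from the talk – the URL is a placeholder), you can see how long an authentication chain is by walking its redirects:

```python
# Count the redirect hops a link goes through before the final page;
# mobile browsers tend to give up on long chains sooner than desktop ones.
import requests

def count_redirects(url):
    resp = requests.get(url, allow_redirects=True, timeout=10)
    for hop in resp.history:  # each intermediate redirect response
        print(hop.status_code, hop.url)
    print("final:", resp.status_code, resp.url)
    return len(resp.history)

# count_redirects("https://login.example.ac.uk/ezproxy?url=https://journal.example.org/")
```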

Outcomes they are looking for:

  • Prototype discovery tool for delivery of academic content to mobile devices
  • Document detailed user requirements
  • Report on user feedback and usability, mapped to type of device used
  • Release the code through a suitable code repository

Want to also look at possible delivery of audio-visual content (locally held material)

Want to avoid native apps – users might well be starting from the web (e.g. via the Open University VLE etc.)

Also want to make sure that if the user clicks on a link to a publisher site, they aren’t suddenly in a non-mobile friendly interface – that is, use a publisher’s mobile interface where possible.

Two further projects have been funded that aren’t represented here today, both based at City University:

MoPED
Lead: City University, London
Summary (from http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/): The project will develop the MoPED system, which will combine mobile phone interaction with a public display in City University’s Main Library. The aim of the project is to investigate how to encourage the adoption of mobile services through a two-fold strategy: first, a strong, user-centred design process, commencing with an investigation of which mobile services are most likely to be beneficial; second, using an in-situ public display to promote (and assist getting access to) the library’s mobile services and to connect online services to the space of the library itself.

Learnmore
Lead: City University, London
Summary (from http://infteam.jiscinvolve.org/wp/2011/10/11/mobile-infrastructure-for-libraries-new-projects/): The project will develop the Learnmore Mobile Application using a user-centred design process. Building on the current ‘desktop’ Learnmore content, the interface and content will be tailored to the actual needs of students using mobile devices, with considerations including the preferred media, topic and content size for mobile consumption.


Does size matter?
Some discussion of ‘what is mobile’ – the OU is thinking of devices with screens smaller than 10″, as with iPads and similar devices the desktop interfaces work OK.

The PhoneBooth app is just not going to be usable below a certain size…

I suspect that the question of what aspects of a service or device make it specifically ‘mobile’ is something that will come up again…

Openly Connect

“Openly Connect” was the title of a talk I gave at Internet Library International 2011 (tipping my hat slightly to Only Connect, the BBC4 quiz show). I’ve been wondering about the best way of sharing the presentation online, and decided that really blogging the ideas is much more useful than just dumping the slides somewhere.

I believe that libraries, museums and archives are not getting the most out of the data about their collections, because they aren’t publishing in ways that enable or encourage others to take the data and use it in new, innovative (or even boring), ways. I think we need to offer data more ‘openly’.

Being open

More mixed messages
(Image courtesy of withassociates, CC-BY-SA)

But what does ‘open’ mean in this context? For me, this is not a simple binary open/closed… but rather a continuum. There are a range of factors that affect whether others can easily take, and reuse, your data. But it is easy to focus on a single factor when talking about ‘openness’ – especially to focus on ‘rights’ to reuse data – copyright, database rights, licensing, terms and conditions etc. While these are an important factor, they are not the only one.

Paul Walk puts it better than me in this slidedeck when he argues we need a ‘richer understanding of openness’ which encompasses not just permissive licensing but, more broadly, the ease with which data can be used, taking into consideration aspects such as format and access mechanisms.

Friction

I’ve started to think about factors affecting reuse as being causes of friction (an idea I’m pretty sure I got from Tony Hirst). This may not be an exhaustive list, but the things I can see that create friction in the reuse of data are:

  • Explicit restrictions on reuse
  • Uncertainty about possible restrictions on reuse
  • Unusual or unfamiliar interfaces and formats (if you don’t work in the library world, you’ve probably never heard of Z39.50, and yet this is a standard machine to machine interface supported by many library systems)
  • Lack of information about what data exists and where it is available

Sometimes you might deliberately introduce friction – perhaps you don’t want your data to be reused by just anyone, for any purpose. I don’t see friction as bad per se – we just need to be aware of it, and especially avoid introducing friction when we don’t mean to.

Oiling the wheels

There are clear steps that a library, archive or museum can take to ensure there is no unwanted ‘friction’ in the reuse of their data.

1. Apply clear licensing or terms on reuse.

As a signatory of the Discovery Open Metadata Principles, I believe descriptive metadata, such as that in library catalogue records, should be licensed as ‘public domain’ data (using CC0 or ODC-PDDL or equivalent).

However, if reuse is restricted for some reason, be clear about what those restrictions are. Commercial services like Twitter offer clear terms of use on their APIs – these are restrictive, but clear. Similarly, Wired magazine’s recent decision to offer images under Creative Commons BY-NC, while falling short of ‘open’, offers some level of clarity. In the latter case, the use of the ‘NC’ (Non-commercial) clause can lead to uncertainty about rights for reuse – as noted in this article.

The JISC Guide to Open Bibliographic Data might help inform decisions about licensing metadata, as may the Discovery licensing guide.

2. Adopt widely used (machine) interfaces and formats for data

While any access to machine-readable data increases the opportunities for reuse, it helps to adopt widely used interfaces and formats – ones for which a wide range of code libraries and tools are available, and with which the development community will be familiar. Currently this often boils down to offering an interface that delivers data in XML or JSON format over HTTP. Sometimes the term ‘RESTful API’ is used to describe this kind of interface, although it should be noted that in reality providing a RESTful interface involves a bit more than just XML/JSON over HTTP. This article tries to explain more specifically what REST is.
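
From the consumer’s side, ‘JSON over HTTP’ really is as simple as this sketch (the endpoint and parameter names are invented for illustration):

```python
# One GET request, one JSON parse - the low-friction baseline.
import requests

resp = requests.get(
    "https://catalogue.example.org/api/search",  # hypothetical endpoint
    params={"q": "austen", "format": "json"},
)
resp.raise_for_status()
for record in resp.json().get("records", []):
    print(record.get("title"))
```

A truly RESTful design adds more on top of this (resource URIs, appropriate HTTP verbs and so on), but even this baseline dramatically lowers the barrier for reuse.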

3. Document your APIs and your data

Whatever interfaces/APIs and data formats you support, leaving them undocumented immediately increases friction on reuse. Many of the systems libraries, museums and archives use provide some API, but these are very rarely clearly documented by the organisations using the systems. Without documentation, it’s a huge amount of work for a developer to work out how to interface with the system.

For example, my local public library uses the Aquabrowser interface to their catalogue, which supports a couple of APIs – but in order to use these I had to find out the details of the API from the University of Cambridge documentation, and then apply the details to the public library system. Even just pointing to documentation held elsewhere helps – and sends the message ‘we want you to use this API’ – and without this, the API will be left unused.

The data we deal with in libraries, museums and archives is specialist, and often confusing to those not familiar with the details – so document not just the APIs available, but also the data available via those APIs (this is also a reason to offer simple representations of the data, as well as fuller, more complex expressions, as appropriate).

Finally, data needs to be ‘findable’ – how would a prospective user of your data know what data you have, and where to find an API for it? In Australia the Museum Metadata Exchange is an interesting model for making this information available, but there are also more general tools/sites like http://thedatahub.org/ and http://getthedata.org/.

4. Use common identifiers

This probably seems less fundamental than the points above, but for me it is absolutely key. The point here is that if anyone wants to combine data together, common identifiers across data sets are what they will be looking for – and I’d argue this is going to be a pretty common use case for your data, or anyone else’s, by a third-party developer.

While it is possible to write code that tries to match strings like “Austen, Jane” in your data to http://viaf.org/viaf/102333412/, this is much more effort and much less precise than if a shared identifier was used from the start. It’s no surprise that if you look at many mashups created using bibliographic data they rely on the ISBN to match across different data sources (for example, pulling in cover images from Amazon, LibraryThing, Google Books or Open Library).
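
To make this concrete, here is a minimal sketch of the ISBN-as-shared-identifier pattern, using Open Library’s cover API (which is keyed by ISBN; the ISBN below is one printing of Pride and Prejudice):

```python
# Given an ISBN from one data source, fetch a cover image from another -
# no fuzzy matching on author/title strings needed.
import requests

def cover_url(isbn, size="M"):  # size: S, M or L
    return "https://covers.openlibrary.org/b/isbn/%s-%s.jpg" % (isbn, size)

resp = requests.get(cover_url("9780141439518"))
if resp.ok and resp.headers.get("content-type", "").startswith("image"):
    with open("cover.jpg", "wb") as f:
        f.write(resp.content)
```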

Supporting Discovery

Much of my thinking in this area has been informed by my work with the ‘Resource Discovery Taskforce’ and with the Discovery initiative that followed the work of the taskforce. Discovery is an initiative to improve resource discovery by establishing a clear set of principles and practices for the publication and aggregation of open, reusable metadata. So far Discovery has published a set of Open Metadata Principles and a set of draft Technical Principles, as well as running several events and a developer competition.

There will be a lot more coming out of the Discovery initiative over the next few months, and you can follow these via the Discovery Blog (which I occasionally write for).

Outcomes of Open

Examples

Rufus Pollock, the Director of the Open Knowledge Foundation, said “The coolest thing to do with your data will be thought of by someone else” – but is this true? Perhaps obviously, it isn’t a given that anything will happen when you publish your data for reuse. However, there are now plenty of examples of interesting applications being built on data that has been published with reuse in mind. To just pick a few examples:

This iPhone app to search Cambridge University Library was developed by a postgraduate student – just because they wanted to learn how to develop an app using JSON, and found the API documentation published by the library.

This app allows the user to take a picture of a work of art using their smartphone, and then retrieves information about the item from Europeana – it was built as part of a ‘hackday’ for Europeana.

This novel interface to pictures from the National Archive was built as part of the Discovery Developer competition.

This map brings together information from English Heritage and the British National Bibliography to display location specific information.

… and finally, to blow my own trumpet, this bookmarklet I’ve already written about.

Supporting developments

Something I don’t feel I really understand yet is how data suppliers can best engage with developers who might build on their data. Emma Mulqueeny (@hubmum) has written eloquently about engaging developers, but I’m still not sure I fully understand the best way that an organisation such as a museum, library or archive can engage with the development community.

Except for the Cambridge University Library iPhone app, all the examples above are the result of some explicit stimulus – a competition or hackday. I don’t think any of them can be described as ‘production level’ – they are, in general, proofs of concept. If publishing data is going to result in sustainable developments, we need to consider how this is supported – should organisations ‘adopt’ applications or developers? Should they work with relevant organisations to realise some commercial benefit for the developers? Are there other approaches?

I’d say, at the least, provide somewhere for developers, and potential developers, to talk to you, ask you questions and get permission to try stuff out – that dialogue is at least the first step towards something more sustainable.

Take action

After my presentation at ILI 2011, which covered much of the same ground as this blog post, I felt that perhaps I’d missed a key point, and an opportunity while I had an audience – the question of what they should do in light of what I was saying. So, not wanting to make the same mistake again, I would encourage, even exhort, you to take the following actions:

  1. Explicitly license your data – whatever it is, put a license on it, be clear about what people can or can’t do with the data, and publish those details on your website
  2. Find out about, and document, any APIs you already have to your data – it might be Z39.50, it might be SRU/SRW, it might be some RSS feeds – whatever it is, write a short page that says where the API/data can be accessed and gives some basic instructions on how to use it (a worked example follows below). Be clear what you expect from people interacting with your data (both in terms of licensing – point 1 – and anything else, like “please don’t kill our servers”)
  3. Create a place for developers to communicate with you (or hang out somewhere that you can communicate with them)
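
For point 2, the promised worked example: if your system already exposes SRU, documentation can start with the base URL plus one sample query like this sketch (the endpoint here is hypothetical; the parameters are standard SRU):

```python
# A standard SRU searchRetrieve request: an HTTP GET carrying a CQL query,
# returning XML records (e.g. MARCXML or Dublin Core).
import requests

resp = requests.get(
    "https://library.example.org/sru",  # your SRU base URL goes here
    params={
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": 'dc.title = "austen"',  # a CQL query
        "maximumRecords": "5",
    },
)
print(resp.text)
```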

If you can’t do any of these things yourself, find out who can answer the questions, or make this happen – find out if they are interested, and if not, why not and what the barriers are (and then let me know!)

Overcoming information overload

This morning’s keynote came from Kevin Anderson (@kevglobal) and Suw Charman-Anderson (@suw) – journalists and technologists (http://charman-anderson.com/).

Kevin kicks off: Journalists and librarians dealing with many of the same issues – helping people navigate, interpret and understand information. Going to talk about some of the challenges in this area. First playing Xerox video on ‘information overload’ – http://www.youtube.com/watch?v=CXFEBbPIEOI

Eric Schmidt noted that we are now creating huge amounts of information (5 exabytes every 2 days is the quote, but see disagreement with this figure at http://www.readwriteweb.com/cloud/2011/02/are-we-really-creating-as-much.php)

The amount of time people spend on Facebook is massively more than they spend on newspaper web sites. There is evidence that people are having problems reaching conclusions on complex stories – people move to simple narratives instead – Kevin says this equals “car crashes and celebrities”.

Social media offers opportunity to re-engage people and help them navigate information.

We are moving from “mass” to “relevance” – e.g. it is not about how many followers you have on Twitter, but about the relevance of what you post. Try to move from information overload (a ‘mass’ problem) to filtered, relevant information (a ‘relevance’ solution).

Social media provides a way of filtering information. But social media has to be ‘social’ – you need people at the heart of this.

Examples of crowdsourcing – Guardian analysis of MP expenses (http://mps-expenses.guardian.co.uk/), Ushahidi crowdsourcing crisis information (http://www.ushahidi.com/).

Kevin also mentions ‘entity extraction’ – he uses Calais as an example.
Dewey D. – an iPhone app to manage a ‘reading list’ (not in the academic sense) which pulls in stories from the New York Times.

Poligraft – analyses the funding of political campaigns. You can post URLs (of political stories) to Poligraft and it goes through, identifies politicians and organisations, and shows you how politicians get campaign funding etc. It tells you about the major industries funding politicians – giving context to a political story and helping make sense of it.

We (journalists & librarians) have hundreds of years of doing things in a certain way – changing culture is incredibly difficult. If you have more than 5 people in the room, inertia hits …

Now Suw taking the floor… to talk crowdsourcing – breaking large tasks into smaller chunks that individuals can do. Suitable tasks fall into two types: computational tasks and ‘human’ tasks.

Computational tasks = large datasets or computations that can be split into smaller datasets or computations – e.g. SETI@Home – this is about the ‘spare cycles’ that individuals’ computers can contribute to the overall computing power.

Human tasks = tasks that humans find easy but computers find difficult; brain-driven; uses participants’ spare time; individual errors are averaged away by having the same task completed by many people.

Types of human tasks:

  • Recognising and describing things in images
  • Reading and transcribing writing
  • Applying expertise to identify, sort and catalogue
  • Collecting data
  • Manipulating models

Examples …

PCF oil paintings tagger – http://tagger.thepcf.org.uk/

  • Public Catalogue Foundation, BBC
  • Digitising pictures
  • Getting people to tag content with metadata – describe what is in the painting

“You don’t have to be an expert to take part”

Old Weather – http://www.oldweather.org/
Transcribing ships’ logs – contributes to historic data on climate, as well as other historical background

Ancient Lives – http://ancientlives.org/
Papyrus fragments – transcribe, measure, etc.

Having multiple people do each task gives you confidence when there is agreement across the results.
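
A tiny sketch of how that works in practice (my illustration, not from the talk): take the majority answer, and treat the level of agreement as a confidence score.

```python
# Majority vote over several volunteers' answers to the same task.
from collections import Counter

def consensus(answers):
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)  # (majority answer, agreement ratio)

print(consensus(["1851", "1851", "1831", "1851", "1851"]))
# -> ('1851', 0.8): high agreement, so a trustworthy transcription
```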

Herbaria@Home – http://herbariaunited.org/atHome/

What’s the Score – http://www.bodleian.ox.ac.uk/bodley/library/specialcollections/projects/whats-the-score
Digitised musical score collection from the Bodleian – will be starting the crowdsourcing part of the project soon

Why crowdsource?

  • Provide opportunities for education and knowledge maintenance
  • Most projects don’t require prior knowledge but people often enjoy learning more about a subject
  • Improve accessibility through addition of new metadata or improvement of existing metadata – create data for research
  • Even when digitised, collections are hard to search/comprehend

Galaxy Zoo shows the public were as good as, or better than, professionals at classifying galaxies.
FoldIt found gamers could solve the structure of a protein that causes AIDS in rhesus monkeys in three weeks.

Are your projects suitable?

  • Can the original material be digitised?
  • Can the task be broken down into small chunks?
  • Can those chunks be done by humans or their computers?

It also helps if…

  • There is a benefit for the public – example of Google buying out an image tagging game, which then died
  • People feel part of a community
  • There are measurable goals and targets

Zooniverse are crowdsourcing gurus…
Citizen Science Alliance – “Science” doesn’t just mean science – they are looking for projects at the moment…
Events – e.g. the Citizen Cyberscience Summit

Q & A:
A failure of crowdsourcing: NASA mapping craters on Mars in the mid-80s – it failed to collect the data in a useful way, so there were issues around using the data.
Wikitorial – not enough community; hurdles to participation are not necessarily a bad thing

Samarcande

A Belgian ‘meta-union’ catalogue. There were real problems with sharing metadata across regions – political interference meant that not all regions/libraries were included.

Wanted a ‘next gen’ OPAC – various reasons:

  • Users like mouse, not keyboard
  • Surveys show satisfaction is higher than with a traditional OPAC … and
  • We ‘love’ New York Public Library’s OPAC

W3Line developed Samarcande. From a technical point of view…

Challenges:

  • High volumes of bibliographic data coming from several origins
  • Create an intelligent database with FRBR scheme
  • Search functionality: advanced search, facets, tags
  • Social network services (web 2.0)
  • Provide internal and external services

Samarcande is a catalogue of catalogues – 7 partners

  • 6 union catalogues
  • Plus a database of journal article references
  • Variety of bibliographic description
  • Lack of shared rules or authorities (except for subject headings)
Totally impossible to do a virtual search – they have to aggregate records into a single database.
  • Detect identical references – keeping local information
  • Keep the best of each reference (summaries, subject headings)
  • Keep all identifiers in order to propose return links to the original catalogue
  • Develop connectors to import and index the data
  • Get data from web services
  • Answer SRU/SRW requests
Includes search functionalities:
  • Advanced search / autocompletion
  • Did you mean
  • Results by relevance
  • Facets
  • Tag cloud
  • Search history, reference basket, results by email
  • Search profiles, bibliographies
FRBR
  • Gather editions of the same publication
  • based on an author-title key, in the absence of any other shared identifier (a sketch of such a key follows below)
  • Social network content attaches at the ‘work’ level
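
A sketch of what an author-title key might look like (the normalisation details are my assumption; the talk only said such a key is used):

```python
# Normalise author and title aggressively so near-identical records
# for the same work collapse onto the same key.
import re
import unicodedata

def work_key(author, title):
    def norm(s):
        s = unicodedata.normalize("NFKD", s)
        s = "".join(c for c in s if not unicodedata.combining(c))  # strip accents
        s = re.sub(r"[^a-z0-9 ]", "", s.lower())  # drop case and punctuation
        return " ".join(s.split())
    return norm(author) + "|" + norm(title)

print(work_key("Hugo, Victor", "Les Misérables"))  # hugo victor|les miserables
print(work_key("HUGO Victor", "Les misérables."))  # same key -> same 'work'
```
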
Samarcande built on
  • MySQL, jQuery, PHP, Solr
  • moccam for ILL

Meaning-based computing

This session by Terence Huwe

What is meaning-based computing (MBC)?

Importance of forecasting probability – ‘how should we modify our beliefs in the light of new information?’ – see “The Theory That Would Not Die” by Sharon Bertsch McGrayne (http://www.librarything.com/work/11186931)

Based on Bayesian analysis.
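
For reference (my addition rather than a slide), the belief update at the heart of Bayesian analysis:

```latex
% Revising a prior belief P(H) in the light of new evidence E:
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
```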

What are the applications and potential uses of meaning based computing? Used for code breaking, handwriting and speech analysis etc. Approach commercialised by Michael Lynch – in the form of Autonomy (now acquired by Hewlett-Packard) – applied to ‘enterprise search’. 80% of a firm’s info assets are unstructured and thus hard to retrieve conventionally.

Two events furthered the growth of MBC – in 2007 the US federal rules of civil procedure made all data forms admissible for litigation – seen in the Enron case. The explosion in social media has created new challenges for firms – meaning they need to track huge amounts of unstructured information.

So – enterprise search is booming – MBC thrives in commercial and pure research settings. Autonomy’s MBC-based tools:

  • Implicit Query – hotkey to related information without leaving a primary task
  • Hyperlinking – live links, diverse sources
  • Smart or Active Folders
  • Automatic Taxonomy Generation
  • Sentiment Analysis
  • Automatic clustering of all data types
What is the impact on Information professions?
Starting to see some ‘seeping’ from enterprise search world:
  • “meaning based healthcare”
  • Universities use it at the enterprise level
  • Consulting
  • Telecommunications
One clear example of use in the library domain is the ACM, which uses it for search of their digital publications.
Potential applications:

  • Turbo-charged meta-search
  • Effective search of unstructured data
  • Establish relationships between structured data (libraries etc.) and unstructured data
Taxonomy and MBC solutions might co-exist – why? Because MBC can manage social media categorization as an automated process. For this to happen, (library) developers need to get involved.
Pattern recognition is practiced at the reference desk – MBC proves that it is a high-level skill. More machine assistance going to be a good thing – we (information professionals) need to find a place at the table.
Forecasts:
  • Academic-based digital library developers may take an interest
  • Vendors might explore MBC as a meta-search tool
  • Repositories may get a boost
  • The practice of reference librarianship would benefit from this kind of tool
Conclusions
  • Need to be aware of MBC
  • Should analyse its, as yet unknown, potential for search and discovery within our digital libraries

Cheapskates guide to Linked Open Data

Rurik Greenall (@brinxmat) with a ‘Cheapskates guide to linked open data’. Using ‘Gunnerus Library’ special collections as an example. Wanted to remove the ‘fear experienced when faced by expert interfaces’ – want an interface that contextualises data.

Rurik says “if you’ve created a PDF, you’ve created RDF” – it’s baked in there by default. Rurik shows example from http://folk.ntnu.no/greenall/gunnerus/search/ – some is local data, but some dragged from other places across the web. Nice looking interface.

Rurik says Linked open data “Clawing back what remains of our professional dignity”

  • You have to learn about RDF – but it really isn’t that difficult (see the sketch after this list)
  • Tools of the trade – Google Spreadsheets; Google Refine (esp. with the DERI Galway RDF plugin)
  • Talis offer free hosting if your data is openly licensed
  • Tell all the people
  • Develop your application – you will need a programmer 🙂 but you’ve already modelled your data…
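
To back up the ‘it really isn’t that difficult’ point, a minimal sketch using rdflib (the URIs are invented, not the Gunnerus data):

```python
# Describe one special-collections item as RDF and print it as Turtle.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC

coll = Namespace("http://example.org/gunnerus/")  # hypothetical namespace
g = Graph()
item = coll["item/42"]
g.add((item, DC.title, Literal("Manuscript fragment")))
g.add((item, DC.creator, Literal("Unknown scribe")))
print(g.serialize(format="turtle"))
```
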
Q: What are the mature libraries for manipulating RDF?
A: Look at Sesame for Java; ARC2 for PHP

Surfacing the Academic Long Tail

Lisa Charnock and Andy Land, from Mimas and John Rylands University Library (JRUL) respectively, on the JISC-funded project ‘SALT’ (Surfacing the Academic Long Tail). JRUL had a lot of usage data. Hypothesis:

“Library circulation activity data can be used to support humanities research by surfacing the long tail …”

So essentially about developing ‘recommendation services’

Also wanted to look at the possibility of developing an API-based national shared service.

Looked at work by Dave Pattern at Huddersfield which built recommendations into their OPAC. Wanted to build on the JISC MOSAIC project.

Market research by Mimas shows:

  • Serendipity is still very important in terms of discovery
  • Increasing anxiety for researchers – worried that they are ‘missing out’ on material that is out there but they aren’t finding
  • Trust concerns – who is making this recommendation, where does the data come from, why is this being recommended?
  • Students tended to be sceptical of tagging and reviews, but saw the potential benefit of recommendations in the style of Amazon (although again trust issues came up)

JRUL was interested in different ways of surfacing content. The process for the data was:

  • Loan transaction data extracted
  • Data anonymised and given to Mimas
  • Mimas processes data
  • API implemented in Capita Prism sandbox using JUICE framework
  • Additional processing performed on demand by API

API also been implemented in COPAC prototype interface.

Wanted to look at how real researchers found the process. Did two rounds of testing – the first round found that they generally wouldn’t borrow the recommendations. However, when the thresholds for recommendation were tweaked and the research run again, there was a complete swing to the other extreme: most would borrow the things recommended. This shows that getting these thresholds right is key, and subtle.
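
As a sketch of the threshold idea (the mechanics here are my assumption – the talk didn’t give implementation details): recommend items co-borrowed with a seed item, but only when the co-loan count clears a minimum.

```python
# Count co-loans with a seed item across anonymised borrower histories
# and recommend only those above a threshold.
from collections import Counter

def recommend(loan_histories, seed, threshold=2):
    co_loans = Counter()
    for items in loan_histories:  # one borrower's set of items per entry
        if seed in items:
            co_loans.update(other for other in items if other != seed)
    return [item for item, n in co_loans.most_common() if n >= threshold]

histories = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}, {"B", "C"}]
print(recommend(histories, "A", threshold=2))  # -> ['B']
```

Raising or lowering `threshold` is exactly the kind of tuning described above: too high and nothing useful is recommended, too low and the recommendations look like noise.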

100% of those consulted would welcome a recommender function based on circulation records – even though they thought some of the recommendations were irrelevant…

What about a shared service? Some interest, but a question of ‘why should we prioritise this?’ from potential (library) partners – needs more work on the business case (I find this baffling – it speaks for itself for me, but there you go…)

JRUL is now going to test the recommender with subject librarians, and planning to go live either with a local service, or a national service if that gets off the ground. Will be making SALT recommendations alongside bX recommendations in the new discovery interface at JRUL (Primo).

Thinking about allowing users to adjust thresholds to their own satisfaction, rather than dictating them.

Mimas want to:

  • Aggregate more data
  • Evaluate the longer-term impact on borrowing patterns at JRUL
  • Gather requirements/costs for a shared service
  • Investigate how activity data aggregations could be used to support collection development

See blog for more http://salt11.wordpress.com and also more on SALT and other activity data projects at http://www.activitydata.org

Q & A:
Q: Is software/data made available?
A: Yes – the Juice extension is on the Juice site (I couldn’t find it); the data has been released for others to use; other software and the API will be released

Q: What about privacy issues?
A: Generally these projects have collected data at a high level – so individuals can’t be identified
A: Growing expectation that data will be made open – so need to consider this

Library Impact Data Project

Dave Pattern (@daveyp) and Bryony Ramsden (@libraryknitgirl) talking about the JISC-funded Library Impact Data project.

Wanted to look at how usage and non-usage of library resources affects degree outcome. Initially looked at University of Huddersfield data only. Examined visits to the library, and found them pretty equal no matter what the outcome (in terms of degree classification). However, when looking at book borrowing and e-resource usage, they saw higher levels of use linked to higher levels of achievement. Note: it is clear that there is correlation, but just looking at these stats doesn’t say anything about causation.
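
For illustration only (invented numbers, not the Huddersfield data), the correlation claim is the kind of thing a rank correlation test measures:

```python
# Rank-correlate items borrowed against final marks; a high coefficient
# shows correlation but, as noted above, says nothing about causation.
from scipy.stats import spearmanr

books_borrowed = [2, 15, 40, 8, 33, 5, 25]
final_mark     = [52, 58, 71, 55, 68, 50, 64]  # percentage marks

rho, p_value = spearmanr(books_borrowed, final_mark)
print(rho, p_value)
```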

JISC funding gave opportunity to expand study across 7 more universities, and to look at the Huddersfield data in more depth.

When doing the study, had to make sure data protection issues were considered, and made sure data was anonymised. Much of the data released online at http://tinyurl.com/lidp-opendata – encourage others to play with it.

Analysis of data showed that there is a relationship between use of library resources and academic achievement. To back up the statistics, did more qualitative investigation via focus groups. Found discipline differences – e.g. Arts students tend to do a lot of ‘in library’ use, and also do a lot of web browsing for images etc, but not logging into ‘library resources’.

What next? Want to:

  • do more analysis of the relationship at course level
  • look at how to target staff resources more effectively
  • assess the impact of reading lists and recommendations
  • look at the feasibility of a national-level service to which you could upload data and get analysis back

Would expect to find similar results across other universities – and independent research in New Zealand (and elsewhere?) backs up the findings.

Q & A
Q: Did you look at drop out from University in light of library use?
A: No, but could do that in the data

Q: Have you considered other causation possibilities?
A: Yes – did explore some aspects of this in the focus groups. Clearly the overall outcome is affected by many things, so library usage can only ever be one part of it

Princeton e-reader pilot

Jennifer Baxmeyer and Trevor A. Dawes now talking about e-reader circulation at Princeton University Library. (some more detail online at http://www.princeton.edu/ereaderpilot/)

Trevor kicks off. Princeton was offered the chance to participate in piloting the use of Kindles in libraries. The pilot showed that the Kindle DX was good for leisure reading, but not so good for study – especially the inability to use multiple texts simultaneously (note the Kindle has changed since 2009).

They then started to receive requests to download content to the devices. Seeing a huge increase in ebook sales and usage (possibly driven by Xmas presents, as they see spikes in January).

Amazon sell 105 e-books for each 100 printed books.

Jennifer now coming in to talk about the proposal the library made to start engaging with the increase in ebook usage.

Received an ILL request for an item that turned out to be available only electronically, and in fact only on the Amazon Kindle. Realised this was the tip of the iceberg. So started a working group to determine the best way of acquiring e-content when requested.

Already many libraries lending both e-books and e-book readers (not just Kindles)

Princeton realised they were going to have to purchase several types of e-book device to offer content available in different proprietary formats. However, some platforms – specifically the iPad – can support multiple different formats via different ebook reader apps – Kindle App, Borders app, iBooks etc.

Decided to pilot Kindle and iPad as this covered the main formats. Proposed purchase of 3 iPads and 4 Kindles. Already had a laptop circulation programme, so could use same approach for iPad. Kindles were circulated in specialist engineering/science libraries.

For the iPads, the same content would be available across devices, and patrons could request new items, which would be reviewed by purchasers like any stock request. The Kindles would have specialist (and non-duplicated) material, and each Kindle would have different content on it.

The next step was to figure out how to make items discoverable – they started to advertise via newsletter and email. Also decided to catalogue the devices and the content on the devices (other libraries such as MIT and Stanford do this as well). The catalogue record was for the device, with a ‘contents note’ detailing the items. Each item was also catalogued separately, but represented as linked and bound together with a single item record, so that if the device was checked out, all its items would show as unavailable.

Cataloguing model still not completely agreed – still working on it.

Trevor again, now talking about accessibility issues. This had come up in the Kindle DX pilot, when accessibility was challenged by the US National Federation of the Blind (they wrote to all libraries participating in the pilot). This resulted in an agreement with the Justice Department including the term:

“The University will not require, purchase, or incorporate in its curriculum the Kindle DX or any other dedicated electronic book reader for use by students in its classes or other coursework unless or until such electronic book reader is fully accessible to individuals with visual impairments.”

This agreement is binding until 30th June 2012. The letter is available online at http://www.ada.gov/princeton.htm

This means that there is currently a delay in launching the program. At the moment staff can check out an iPad or Kindle for three days – this gives an opportunity for feedback, and gets staff familiar with the devices. Some purchase of apps/content was allowed, but it had to be ‘work related’, and there were limits on how many could be purchased. Part of the requirement of checking out a device was to fill out a survey.

Now have the green light to go ahead with the iPad lending program – so that will be starting soon, aiming for June 2012. However, the issues with the Kindle are still unresolved…