INF11 – Activity Data incubation workshop 2

This blog post is written on behalf of JISC.

David Kay from Sero Consulting is kicking off the second part of this workshop, reporting on the JISC Activity Data Workshop which was held on 14th July 2010 (http://www.jisc.ac.uk/events/2010/07/businessintelligence.aspx). David says notes from the day are available at http://ie-repository.jisc.ac.uk/486/

David is skimming over what was presented on the day, including:

  • the differences between HE and the commercial sector use of Activity Data – and what HE can learn from the commercial sector
  • that sometimes things seem ‘too difficult’ – but look at what you can achieve
  • that even small amounts of activity data can show interesting results
  • that where we have concerns about ethical issues around activity data – we don’t have to do things we feel are unethical
  • cultural fear is endemic so we need to demystify the subject matter – there is a lack of case law in this area, and online activity data falls under ‘personal data’ in terms of the Data Protection Act (1998) – however, some uses of personal data for corporate benefit can be defended, and anonymised data may not count as personal data

The event had a series of 5 debates:

  • Why should WE do it?
  • Love Data, Hate Silos
  • Local, National or Global
  • Appropriate & Inappropriate Use
  • Attention, Activity, Rating, Review – Where to stop?

Full notes from these debates are available in the event write-up.

David notes that while bids under the Activity Data strand may focus on specific types of activity data, they may need to draw on data from several systems – for example, not all the data (such as the course of study a student is on) will be in the library system, so bidders need to think about all the systems that will contribute to the activity data picture.

Finally in this session, a panel made up of David Pattern (University of Huddersfield), Graham Stone (University of Huddersfield), Joy Palmer (MIMAS), Ross MacIntyre (MIMAS) and Mark Stubbs (MMU). I’ve recorded (hopefully) the spirit of the questions and answers – not verbatim, and answers may have come from more than one person on the panel:

Q: Interested in idea of involving a PhD student in the project (at MMU) – how much data do you need to do this kind of stats analysis?

A: PhD student able to do analysis – looking at showing that the VLE/MLE was actually worth using – to convince lecturers etc. Notes that they are going to start tagging things with module codes – reading lists, exam papers etc.

Q: What are the issues around exposing some of this data, to the potential embarrassment of some people – e.g. Huddersfield showing that for some courses the students don’t use the library?

A: Some data was removed because it might result in identification of individuals – e.g. for very small courses they don’t want to publish information about attainment and borrowing. But in other cases courses have welcomed the data as a way of getting students to engage with the library.
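As a rough illustration of the "remove data for very small courses" idea, the sketch below averages loans per student by course and suppresses any course with fewer students than a threshold. The record layout, the function name `loans_per_course` and the threshold of 10 are assumptions made for this example; the panel did not say how Huddersfield actually does this.

```python
from collections import defaultdict

MIN_COHORT = 10  # assumed threshold, not a figure given by the panel


def loans_per_course(records, min_cohort=MIN_COHORT):
    """Average loans per student by course, suppressing small courses.

    records: iterable of (course_code, student_id, loan_count) tuples
    (a hypothetical layout for this sketch).
    """
    loans = defaultdict(int)
    students = defaultdict(set)
    for course, student, count in records:
        loans[course] += count
        students[course].add(student)

    report = {}
    for course, total in loans.items():
        cohort = len(students[course])
        # None marks a suppressed value: too few students to publish safely.
        report[course] = total / cohort if cohort >= min_cohort else None
    return report
```

A course with only two or three students would appear in the report with `None`, so readers can see it exists without seeing figures that could identify individuals.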

Q: How much data is needed to do good recommendations?

A: David Pattern relates that, using data from an OpenURL resolver, even a couple of months’ worth of data was enough to start making reasonable recommendations – so it is not necessary to have data collected over a long period of time.
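To give a feel for why a small log can already produce recommendations, here is a minimal "people who looked at this also looked at" sketch based on co-occurrence counts. It assumes the resolver log has already been grouped into sessions, each a list of item identifiers; the function names and that data shape are hypothetical, and this is not a description of the Huddersfield implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often each pair of items appears in the same session."""
    co = defaultdict(Counter)
    for items in sessions:
        # De-duplicate within a session so repeat clicks are counted once.
        for a, b in combinations(sorted(set(items)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, top_n=3):
    """Return the items most often seen alongside `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(top_n)]


sessions = [["A", "B", "C"], ["A", "B"], ["A", "B"], ["A", "C"], ["B", "D"]]
co = build_cooccurrence(sessions)
print(recommend(co, "A"))
```

Even with five sessions the counts separate a frequent companion from an occasional one, which fits the point that a couple of months of data can be enough to start.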

Q: Is relating the type of data we can capture (e.g. loans, e-resource download) to actual usage a problem? Especially for e-journals?

A: All of this data is ‘access data’ rather than ‘usage data’. But still useful. A big issue is interpreting a ‘zero access’ statistic – all zeroes merit closer inspection! E.g. lack of use of the library could mean larger use of the bookshop.

A: Need to look for explanations for unusual activity data – use the activity data to find anomalies – and then investigate, run focus groups, etc.
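One simple way to act on this is to flag courses whose access counts are zero or far below the typical value, and treat that list as a prompt for follow-up rather than a conclusion. The sketch below does this against the median; the function name, the input shape and the 25% cut-off are assumptions chosen for illustration only.

```python
from statistics import median


def flag_for_follow_up(access_per_course, low_fraction=0.25):
    """Flag courses with zero or unusually low access counts.

    access_per_course: dict mapping course code to an access count
    (hypothetical input). low_fraction is an assumed cut-off.
    """
    if not access_per_course:
        return {}
    typical = median(access_per_course.values())
    flagged = {}
    for course, count in access_per_course.items():
        if count == 0:
            flagged[course] = "zero access"
        elif count < low_fraction * typical:
            flagged[course] = "unusually low"
    return flagged
```

The output only says where to look; finding out why (for example, students buying their books instead) still needs the investigation and focus groups the panel mentioned.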

Q: Can we go beyond collecting data from a single system – and start ‘scrobbling’ in the way last.fm does? Is information from a single system interesting in its own right?

A: Definitely lots of interesting data outside central systems – so have to be clear about this. However, data from a single system can still be interesting, but keep in mind it is only a partial picture. Can see, for example, bringing together information from primary sources such as might be in a national aggregation of library circulation data, with secondary sources in journals from Mendeley.

Q: Quote from Ken Chad ‘Search is dead … welcome to the age of recommendation’ – true?

A: Recommendation is one way in – but just one way of discovering things – brings serendipity into systems. False dichotomy between ‘search’ and ‘recommendation’ – both of these activities can be either passive or active – so just different pathways to finding content.

Q: What will this add to user experience?

A: Better degree possibly!

A: Business case includes better collection management; fostering academic excellence; helping people find stuff; no deadends; better use of existing (free or already paid for) resources.

Q: Who needs to have buy-in to this for the sector?

A: HESA stats would really aid benchmarking – identifying similar institutions for example.

Q: What is the nature of the investment that is needed? Is it money? Or just expertise/leadership/risk?

A: Not necessarily expensive to do at a small scale, but possibly at larger scale. Again need buy-in (prioritisation) in the institution to do anything at all though – this is perhaps one of the reasons we have not seen lots of institutions following the example of Huddersfield. But there is real tangible payback. Pressures on space are key in many institutions – being able to show an impact on space saving, e.g. by enabling effective disposal of library stock, is one such payback.

Q: Are there serious legal risks?

A: This question was posed to the room at large – generally the feeling was that there weren’t serious risks related to use within an institution. Noted that the use to which data is put is an important part of the legal picture – e.g. if it is used for a corporate purpose.

A: Noted that the University of Minnesota did work on the use of ‘affinity strings’ to avoid identification of users (there is high sensitivity around the possibility of being asked to give up user data under the Patriot Act, so there is pressure to anonymise data and not keep data unnecessarily) – this was written up in the Code4Lib Journal at http://journal.code4lib.org/articles/501
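The general idea can be sketched as replacing each user identifier in the activity log with a string describing the group the user belongs to, so that events can still be analysed by group without keeping individual identities. The example below is an illustration only: the attribute names (`campus`, `status`, `department`), the `other` bucket and the minimum group size are assumptions for this sketch and are not taken from the Minnesota write-up.

```python
from collections import Counter


def affinity_string(user):
    """Build a group label from (hypothetical) user attributes."""
    return ".".join([user["campus"], user["status"], user["department"]])


def anonymise_events(events, users, min_group=5):
    """Replace user ids in (user_id, item) events with affinity strings.

    Groups smaller than min_group (an assumed safeguard) are folded into
    'other' so a rare combination of attributes does not identify someone.
    """
    group_sizes = Counter(affinity_string(u) for u in users.values())
    anonymised = []
    for user_id, item in events:
        key = affinity_string(users[user_id])
        if group_sizes[key] < min_group:
            key = "other"
        anonymised.append((key, item))
    return anonymised
```

Once the events are rewritten this way the original user ids need not be stored with the log, which is in the spirit of not keeping data unnecessarily.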

David Kay recommends, as a starting point from a technical perspective, ‘Scaling and productising MOSAIC search and recommendation services’ at http://hedtek.com/?p=371

David emphasises that this is not just about library data, and that local institutional value needs to be shown in the bid – even if you are bidding as a consortium or partnership.
