Read to Learn: Updated

Last year I blogged about my entry into the JISC MOSAIC competition which I called ‘ReadtoLearn’. The basic idea of the application was that you could upload a list of ISBNs, and by using the JISC MOSAIC usage data the application would generate a list of course codes, search for those codes on the UCAS web catalogue, and return a list of institutions and courses that might be of interest to you, based on the ISBNs you had uploaded.

While I managed to get enough done to enter the competition, I had quite a long ‘to do’ list at the point I submitted the entry.

The key issues I had were:

  • You could only submit ISBNs by uploading a file (using HTTP POST)
  • The results were only available as an ugly HTML page
  • It was slow

Recently I’ve managed to find some time to go back to the application, and have now added some extra functionality, and also managed to speed up the application slightly (although it still takes a while to process larger sets of ISBNs).

Another issue I noted at the time was that because “the MOSAIC data set only has information from the University of Huddersfield, the likelihood of matching any particular ISBN is relatively low”. I’m happy to say that the usage data that the application uses (via an API provided by Dave Pattern) has been expanded by a contribution from the University of Lincoln.

One of the biggest questions for the application is where a potential user would get a relevant list of ISBNs from in the first place (if they even know what an ISBN is). I’m still looking at this, but I’ve updated the application so there are three ways of getting ISBNs into it. The previous file upload still works, but you can now also submit a comma-separated list of ISBNs (using HTTP GET), or submit the URL of a webpage (or RSS feed etc.) containing ISBNs, from which ISBNs will be extracted using regular expressions (slower, but a very generic way of getting ISBNs into the application). I would like to look at further mechanisms, such as harvesting ISBNs from an Amazon wishlist or order history, or a LibraryThing account, but for the moment you can submit a URL and the regular expression should do the rest.
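
For illustration, here is a minimal sketch (in Python) of the kind of regex-based extraction the URL option performs. The pattern below is my own rough guess at matching 10- and 13-digit ISBNs in page text – the application’s actual expression may well differ:

import re
import urllib.request

# Rough pattern for 10- or 13-digit ISBNs, optionally hyphenated/spaced;
# 10-digit ISBNs may end in an 'X' check digit. Illustrative only.
ISBN_PATTERN = re.compile(r'\b(?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx]\b')

def extract_isbns(url):
    """Fetch a page or feed and return the candidate ISBNs found in it."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode('utf-8', errors='ignore')
    # Strip hyphens and spaces so each candidate is a bare 10/13 character string
    candidates = (re.sub(r'[-\s]', '', match) for match in ISBN_PATTERN.findall(text))
    return sorted({c for c in candidates if len(c) in (10, 13)})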

Rather than the old HTML output, the results are now available as XML. Although this is not pretty (obviously), it does mean that others can use the application to generate lists of institutions/courses if they want. On my to-do list now is to use my own XML to generate a nice HTML page (eating your own dog food, I think they call it!).

I also restructured the application a little, splitting it into two scripts (which also allowed me to provide a UCAS code lookup script separately).

Finally, one issue with the general idea of the application was the question of how much overlap with the books borrowed by users on a specific course should lead to a recommendation. For example, if 4 ISBNs from your uploaded list turned out to all have been borrowed by users on courses with the code ‘W300’, should this constitute a recommendation to take a W300 course? My solution was to offer two ‘match’ options. The first was to find ‘all’ matches – this meant that even a single ISBN related to a course code would result in you getting a recommendation for that course code. The second option was to ‘find close matches only’ – this only recommended a course code to you if the number of ISBNs you matched was at least 1% of the total ISBNs related to that course code in the usage data. I decided to generalise this a bit, so you can now specify the percentage of overlap you are looking for (although experience suggests that this is going to be low with the current data – perhaps less than 1%).

So, the details are:

Application URL:

http://www.meanboyfriend.com/readtolearn/studysuggest

GET Parameters:

match

Values: ‘All’ or a number between 0 and 100 (must be >0)

Definition: the percentage overlap between the ISBNs in your submitted list related to a course code and the total ISBNs related to that course code which will constitute a ‘recommendation’. ‘All’ will retrieve all courses where at least one ISBN has been matched.

isbns

Values: a comma separated list of 10 or 13 digit ISBNs

url

Values: a URL-encoded URL (including the ‘http’ prefix) of a page/feed which includes ISBNs. ISBNs will be extracted using a regular expression. (See http://www.blooberry.com/indexdot/html/topics/urlencoding.htm for information on URL encoding)

If both the isbns and url parameters are submitted, all ISBNs from the list and from the specified webpage will be used.

Example:

An example request to the script could be:

http://www.meanboyfriend.com/readtolearn/studysuggest?match=0.5&isbns=0722177755,0722177763,0552770884,043999358,0070185662,0003271323,0003271331,0003272788
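
The scripted equivalent of the example above might look something like this (a sketch using Python’s standard library – the parameter names are as documented above, and urlencode takes care of the URL encoding the url parameter would need):

from urllib.parse import urlencode
from urllib.request import urlopen

BASE = 'http://www.meanboyfriend.com/readtolearn/studysuggest'

params = {
    'match': '0.5',                               # percentage overlap required
    'isbns': '0722177755,0722177763,0552770884',  # comma separated ISBN list
    # a 'url' parameter could be added here too; urlencode will URL-encode it
}
with urlopen(BASE + '?' + urlencode(params)) as response:
    print(response.read().decode('utf-8'))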

Response:

The response is XML with the following structure (this is an example with a single course code):

<study_recommendations>
<course type="ucas" code="V1X1" ignore="No" total_related="385" your_related="3">
<items>
<item isbn="0003271331"></item>
<item isbn="0003271323"></item>
<item isbn="0003272788"></item>
</items>
<catalog>
<provider>
<identifier>S84</identifier>
<title>University of Sunderland</title>
<url>http://www.ucas.com/students/choosingcourses/choosinguni/instguide/s/s84</url>
<course>
<identifier>997677</identifier>
<title>History with TESOL</title>
<url>http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/Dhh-QG8Bhe33Egpbb227I8OPTGQUw-VTyY/HAHTpage/search.HsDetails.run?n=997677</url>
</course>
</provider>
<provider>
<identifier>H36</identifier>
<title>University of Hertfordshire</title>
<url>http://www.ucas.com/students/choosingcourses/choosinguni/instguide/h/h36</url>
<course>
<identifier>971629</identifier>
<title>History with English Language Teaching (ELT)</title>
<url>http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/Dhh-QG8Bhe33Egpbb227I8OPTGQUw-VTyY/HAHTpage/search.HsDetails.run?n=971629</url>
</course>
</provider>
</catalog>
</course>
</study_recommendations>

The ‘catalog’ element essentially copies the data structure from XCRI-CAP, which I’ve documented in my previous post – I’m not using this namespace at the moment, but I may come back to it when I have time. The ‘course’ and ‘provider’ elements can both be repeated.
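
To show how this XML might be consumed, here is a short sketch using Python’s ElementTree, following the element and attribute names in the example above:

import xml.etree.ElementTree as ET

def summarise(xml_string):
    """Print each recommended course code with its providers and courses."""
    root = ET.fromstring(xml_string)              # <study_recommendations>
    for course in root.findall('course'):
        print('%s: matched %s of %s related ISBNs' % (
            course.get('code'), course.get('your_related'), course.get('total_related')))
        for provider in course.findall('catalog/provider'):
            institution = provider.findtext('title')
            for offering in provider.findall('course'):  # repeatable element
                print('  %s - %s' % (institution, offering.findtext('title')))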

If you are interested in using it, please do – and drop me a comment here if you have examples or suggestions for further improvements.

UCAS Course code lookup: Take two

Last year as part of the JISC MOSAIC competition, I put together a script which allowed you to search the online UCAS catalogue using a course code, and get an XML response. The XML it returned was just a basic format which suited my purposes at the time, and in the comments I gave the following response to Alan Paull who mentioned XCRI:

I’m aware of the XCRi model and the XCRi-CAP work, and did wonder if I could output my scraped results in this format, but in the end decided for something quicker and dirtier for my purposes.

XCRI (eXchanging Course Related Information) is a JISC funded initiative to “establish a specification to support the exchange of course-related information”. This has established an XML specification intended to enable courses to be advertised or listed in a consistent manner – this is called ‘XCRI-CAP’ (Course Advertising Profile). A number of projects and institutions have implemented XCRI-CAP (a list of projects is available from the CETIS website).

The key thing for me about this approach is the idea that if all institutions (let’s say UK HE institutions, but XCRI-CAP is not sector specific) published their course catalogue following this specification, it would be a relatively simple matter to use, aggregate, disaggregate and reuse this data.

I’ve wanted to get back to this for a while, and finally got round to it, so you can now get results from the script in XCRI-CAP. I have to admit to slight confusion about what makes valid XCRI-CAP – I’ve run the results through the validator blogged about by David Sherlock, and get a small number of warnings regarding the lack of ‘descriptions’ for each provider I list. However, the XCRI wiki entry for the provider element suggests that the description is ‘optional’ (although it then says it ‘should’ be provided).

The script is at:

http://www.meanboyfriend.com/readtolearn/ucas_search

The script accepts four parameters described here:

format

  • If left blank, results will be returned in the default XML format (not XCRI-CAP) – documented below
  • If set to the value ‘xcri-cap’, the results will be returned in XCRI-CAP XML – see notes below. If there is an error, this will use the default XML format documented below

course_code

  • Accepts a UCAS course code (4 alphanumeric characters), which is used to search the online UCAS catalogue

catalogue_year

  • Accepts a year in the format YYYY
  • If no year is given, this is left blank
  • UCAS supports searches against more than one catalogue at a time, to enable searching against the current and coming year. If left blank, as far as I can tell, this defaults to the catalogue for the current year (at time of writing, 2010)

stateID

  • The UCAS website uses a session identifier in all URLs called the ‘stateID’
  • If a stateID is supplied to the script, it will use it (unless it turns out to be invalid)
  • If no stateID is supplied, or the stateID supplied is invalid, the script will obtain a new stateID
  • If you are doing repeated requests against the script, it would be ‘polite’ to get a stateID from the first request and reuse it in subsequent requests, so the script isn’t constantly starting new sessions on the UCAS website (see the sketch below)

So a valid request to the script could be:

http://www.meanboyfriend.com/readtolearn/ucas_search?course_code=W300&format=xcri-cap&catalogue_year=2010
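
In code, the ‘polite’ stateID reuse described above might look something like this (a sketch in Python; it relies on the ucas_stateid attribute returned in the default XML, shown below):

import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = 'http://www.meanboyfriend.com/readtolearn/ucas_search'

def ucas_search(course_code, catalogue_year='2010', state_id=None):
    """Query the script, reusing an existing UCAS stateID where possible."""
    params = {'course_code': course_code, 'catalogue_year': catalogue_year}
    if state_id:
        params['stateID'] = state_id
    with urlopen(BASE + '?' + urlencode(params)) as response:
        root = ET.fromstring(response.read())
    # The default XML carries the stateID so it can be reused next time
    return root, root.get('ucas_stateid')

results, state_id = ucas_search('W300')
results2, state_id = ucas_search('V1X1', state_id=state_id)  # reuses the session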

In terms of output, there are two formats, the default XML, and the XCRI-CAP XML.

XCRI-CAP XML

I’m outputting a minimal amount of data, as I’ve limited myself to scraping only information from the UCAS catalogue search results page. This means I’m currently including only the following elements:

<catalog>
<provider>
<identifier />(I suspect I’ve got a problem here. I’m using the UCAS identifier, which I can’t really find any information about. From the XCRI wiki it looks like I need to be using a URI here)
<title />
<url />(I’m using the URL for the UCAS page for the institution. This includes the stateID, as to link to the UCAS page requires a valid session. It isn’t ideal, as this is only valid for a limited period of time [now amended to use a different URL to the UCAS web page which does not include stateID])
<course>
<identifier />(I’m using the UCAS identifier for the course – again, from the wiki it looks like I should be using a URI)
<title />
<url />(I’m using the URL for the UCAS page for the course. This includes the stateID, as to link to the UCAS page requires a valid session. It isn’t ideal, as this is only valid for a limited period of time)
</course>
</provider>

I am looking at whether I can get more information, but adding to the information I’m currently returning would mean making further requests to the UCAS website to scrape information from other pages to supplement the basic information available on the search results page.

Default XML

The default XML format is documented in my previous blog post, but just to recap:

<ucas_course_results course_code="R901" catalogue_year="2010" ucas_stateid="DtDdAozqXysV4GeQbRbhP3DxTGR2m-3eyl">
<institution code="P80" name="University of Portsmouth">
<inst_ucas_url>
http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/DtGJmwzptIwV4rADbR8xUfafCk6nG-Ur61/HAHTpage/search.HsInstDetails.run?i=P80
</inst_ucas_url>
<course ucas_catalogue_id="">
<course_ucas_url>
http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/DtGJmwzptIwV4rADbR8xUfafCk6nG-Ur61/HAHTpage/search.HsDetails.run?n=989628
</course_ucas_url>
<name>Combined Modern Languages</name>
</course>
</institution>
</ucas_course_results>

Note that you get the ucas_stateid returned, so it can be reused in future requests. Finally, if there are any errors, these will always be returned in the default XML format (even if you request xcri-cap format):

<ucas_course_results course_code="" catalogue_year="" ucas_stateid="">
<error />
</ucas_course_results>

Next steps

Over the last year or so I’ve thought, and occasionally mentioned to anyone who will listen, that I might like to try moving away from a traditional library job and becoming self-employed, probably doing some kind of consultancy work, although still very much working with, and in, libraries.

Since June 2009 I’ve been working on the TELSTAR project at the Open University. The project was originally scheduled to finish in February 2010 – basically, now – and my contract at the Open University ran up until that date. In the last month, the TELSTAR project has been extended, and my contract now runs to the end of July this year. However, during the extension period the project doesn’t need a full-time project manager, so I’ll be working on it half-time.

I’m not sure I could be offered a better opportunity than this to put my thoughts and day-dreams into action. Basically, if I don’t do this now, I don’t think I’ll ever do it. So, from Tuesday (I’m taking Monday off!) I’m going to be striking out on my own. Despite my good intentions I’ve not managed to do the preparation and planning that I’d initially hoped to get done in the months and weeks leading up to this point, so the first couple of weeks are likely to be spent on some pretty basic tasks, such as:

  • setting up a company
  • getting a business bank account
  • buying a domain and setting up a website
  • working out how this is all going to work!
  • making lists of all the other stuff I realise I need to do

As you might imagine, I’m pretty nervous about all this, but also very excited. I’ll post more over the next couple of weeks, but in the meantime wish me luck, and if you know anyone who needs someone to consult about ‘library stuff’ (especially digital stuff), point them in my direction 🙂

Mashing and Mapping

Middlemash, the third Mashed Library event, took place on the 30th November. Hosted by Damyanti Patel (my other half) and her team (Mark, Robin, Chris and John) at Birmingham City University, the day was once again split between talks and hands-on mashing. In some ways I think it may have been the most ‘twitter active’ event I’ve been at so far – there were around 560 tweets tagged with #middlemash on the day itself. Although possibly some of the bigger conferences I’ve been at had more volume, I don’t think any has had the density of ‘tweets per delegate’ 🙂 There was even an ‘official tweeter’ in the form of @joeyanne. There is an archive of all the tweets at http://twapperkeeper.com/middlemash.

The day started with Tamar Sadeh from Ex Libris (who also sponsored the day) talking about a variety of things, including the Ex Libris Code Share wiki – I was really pleased to see that this is accessible to everyone, although only Ex Libris customers can post code.

Following this Mark van Harmelen from HedTek Ltd introduced concepts of rapid prototyping and working with users – stressing the flexibility of paper, pens and post-it notes in the design process, and also the importance of making development a collaborative process.

Then we had three ‘case studies’ from Edith Speller, Paul Stainthorp and Chris Keene – it was great to see some examples of mashing in action from real situations, solving practical problems.

In the afternoon I’d already decided I wanted to pick up something I’d played with briefly at the first Mashed Library event (#mashlib08), which was using the Google Maps interface. I’d sort of volunteered to ‘lead’ a session – which I’m afraid I didn’t do a brilliant job of (not enough preparation) – so if you came along I’m sorry about that.

We started with (I think) a good discussion of how Google Maps (and similar systems like OpenMap) work (more on this in a minute), and what the practical issues of maintaining floorplans for the library were – especially where you wanted to be able to indicate where a specific book is. The truth is that locating an item on a specific piece of shelving has not been something that most libraries have bothered to do in the past (certainly on open shelving) – relying instead on a set of ‘rules’ you can follow to work out where a specific book will be – at least, relative to the other books in the library. In theory the item record on the catalogue will give you enough information to find the item – typically the information will include:

  • Library site (for multi-site libraries)
  • Collection (sometimes based on discrete sets of material, but sometimes general geographic locations like ‘First floor’)
  • Loan period (this is sometimes, but not always, linked to a physical location)
  • Classmark or Shelfmark

In well designed modern libraries, you can usually use this information to work out where a book is relatively easily. However, when you sometimes have shelfmarks like “Cupboard S” (real example) basically there is no way of working out where the book is – you just have to ask where “Cupboard S” is.

Of course, books are relatively easy – tracking down a journal volume or item almost always relies on simply knowing how the alphabetical sequence of titles winds its way around a set of shelving (and sometimes where older materials have been shelved in separate, less accessible, shelving).

What is perhaps slightly odd is that most libraries do keep some kind of signage up to date – usually in the form of ‘shelf ends’ which indicate which range of classmarks (or journal titles) is on a specific shelf. However, it seems that these are not usually linked into the library systems at all (although at least one library in the group did record these in an Access database). One of the issues with this kind of signage, and the general idea of linking an item to a specific shelf, is that for items close to the start or end of a shelf unit, there is a relatively high likelihood they will be slipped onto the previous or next unit as they are reshelved and the amount of stock on the unit changes.

We had some discussion of how libraries might keep track of what books were on which shelf unit more closely – either by scanning the first and last book on a shelf each time, or looking to RFID to help – and Dave Pattern reminded us (over Twitter) that he had blogged an idea of using RFID for this purpose a couple of years ago.

At this point I wanted to see if we could get something done with Google Maps and a library floorplan during the afternoon, and so I wanted to move on with this. While I settled down to this with Rob Styles from Talis, others started to look at what the various requirements were for a ‘library map’ application – which Graham Seaman gathered together and posted on the Mashed Library wiki – there are some great ideas, and it feels like there is a real application waiting to be specified there.

Back to the maps. Essentially the way the various mapping systems work is to have ‘tiles’ which each represent a section of the map. With Google Maps (and I think this is common to other platforms) the tiles are 256 x 256 pixels. This concept of tiling works in conjunction with the ability to zoom in and out of the map. The basic idea is that at maximum zoom out, you fit the entire map on a single 256 x 256 tile. As you zoom in, you double the number of tiles along both the width and height of the map (i.e. the x and y axes). For Google Maps, zoom starts at ‘0’ (zero) – a single 256 x 256 tile. This means a zoom of ‘1’ is 2 x 2 tiles (i.e. 4 tiles), zoom ‘2’ is 4 x 4 (16 tiles) etc. Much of the documentation I found suggested that Google currently supported zoom up to 17 – but on the day we actually found that it supported zoom up to a value of 21 – and I guess if they ever get more detailed maps or satellite imagery they will support higher levels of zoom. There is more on how tiles work at http://code.google.com/apis/maps/documentation/overlays.html#Google_Maps_Coordinates
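
The arithmetic is easy to express in code – at zoom z there are 2^z tiles along each axis, so 4^z tiles in total, and the whole map is 256 x 2^z pixels on each side. A quick Python sketch:

TILE_SIZE = 256  # Google Maps tiles are 256 x 256 pixels

def tiles_at_zoom(zoom):
    """Tiles along each axis, and in total, at a given zoom level."""
    per_axis = 2 ** zoom
    return per_axis, per_axis ** 2

def map_pixel_size(zoom):
    """Width/height in pixels of the whole map at a given zoom level."""
    return TILE_SIZE * (2 ** zoom)

# zoom 0 -> 1 tile, zoom 1 -> 4 tiles, zoom 2 -> 16 tiles, ...
for zoom in range(4):
    print(zoom, tiles_at_zoom(zoom), map_pixel_size(zoom))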

Lyn Parker from the University of Sheffield ‘volunteered’ their floorplans (http://library.shef.ac.uk/open/floorplan/plans.html) to be used in our project. So the first job was to create the tiles we needed. I have to admit that I’d thought of this stage as the ‘boring but necessary’ bit – however, looking back, it is in some ways the most complicated bit, as for each level of zoom you want, you need to resize the graphic and cut it into appropriate tiles. Luckily there are already some scripts available to do all this work for you. Even better, Rob had Photoshop on his Mac, and we got a Photoshop ‘tiling’ script from Mapki – a wiki about the Google Maps API.
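
The Photoshop script itself isn’t reproduced here, but the job it does can be sketched in a few lines of Python using the Pillow imaging library (my choice purely for illustration – any library that can resize and crop would do, and the tile file-naming scheme below is just an assumption):

from PIL import Image

TILE = 256

def cut_tiles(source_path, out_dir, max_zoom):
    """For each zoom level, resize the floorplan and slice it into 256px tiles."""
    source = Image.open(source_path)
    for zoom in range(max_zoom + 1):
        size = TILE * (2 ** zoom)           # the whole map is size x size pixels
        level = source.resize((size, size))
        for x in range(2 ** zoom):
            for y in range(2 ** zoom):
                box = (x * TILE, y * TILE, (x + 1) * TILE, (y + 1) * TILE)
                level.crop(box).save('%s/tile_%d_%d_%d.png' % (out_dir, zoom, x, y))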

Our original idea had been to create a ‘custom map’ to essentially present the floorplan within the Google Maps interface. However, the tools available generally seemed to be aimed at overlaying information on the ‘real world’ as represented in Google Maps. So, we got slightly diverted at this point, and decided to see if we could insert the Sheffield floorplan over the real building in Google Maps. With some help from Lyn, we found the building on Google Maps and Rob started to manipulate the floorplan image so we could align it with the building on the map.

Although this took us away from the initial idea, we were quite excited by the idea that if we got this right, we would be able to assign real world latitude and longitude to items marked on the floorplan – including shelf-units. There is definitely something satisfying about this idea, although whether it would turn out to be of practical benefit is less clear to me.

As well as re-orienting the floorplan image, we also had to work out where it should display on the Google Map. This, rather frustratingly, involves knowing the numerical identifiers of the actual Google Maps tiles – after some hunting around, the best tool for this turned out to be one provided by Google at http://code.google.com/apis/maps/documentation/examples/tile-detector.html – this allows you to identify both the tile identifiers and the latitude and longitude (which you also need), although annoyingly you can’t just type in a postcode or lat/long value to get to the location you want. This tool also gives you the ‘zoom’ level, which is needed too.
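
The tile identifiers follow directly from the standard Web Mercator projection, so if you know a latitude/longitude you can also compute them rather than hunting with the tool. A sketch of the usual conversion (the example coordinates are illustrative only):

import math

def lat_lng_to_tile(lat, lng, zoom):
    """Standard Web Mercator conversion from latitude/longitude to tile x/y."""
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# e.g. a point in Sheffield at zoom 17
print(lat_lng_to_tile(53.381, -1.486, 17))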

Once you’ve gathered all the relevant information, you can feed it into the tile cutter – including the number of ‘zoom’ levels you want to produce tiles for. Having done this, we finally needed to write a web page to display the Google Map, with our new tiles integrated into the display. This involves using the Google Maps API, and I cannibalised the example script at http://econym.org.uk/gmap/example_custommap3.htm by Mike Williams (whose tutorial at http://econym.org.uk/gmap/ I found reasonably useful throughout the exercise).

With various tweaks as we reached the end of the afternoon (moving from using jpg images to png for the tiles, so we could create a transparency effect), and some minor adjustments by me after the event, we got a map up and working at http://www.meanboyfriend.com/mashedlib/mapping/maps.html

It’s pretty obvious we didn’t quite manage to align the map properly 🙂 We did find some tools that are meant to help with this – but they didn’t always seem to work, and some links were just dead. I guess a little more investigation – or some trial and error – would get this solved. However, I’m pretty pleased with what we got done in limited time – thanks to Rob for working with me on this.

Actually I think we did one of the hardest things we could have picked to be honest. Looking at it again now, and perhaps understanding it a bit more in retrospect, I think we could have assigned arbitrary tile numbers if we had simply wanted to achieve a Google Maps interface to the floorplan – and it looks to me like then doing overlays on this would have been pretty straightforward as well – when I get a chance I’ll try and test this theory! I really like the idea of the ‘heatmap’ for library stock usage (first suggested by Amy Hadfield at Mash Oop North) and would like to get a demonstration of this running.

So – a great day’s mashing – thanks to all at Birmingham City University who organised and ran the day, and everyone who came along and made it such fun.

FAM09 – Closing session

This session was given by Nate Klingenstein.

Today’s Federated Identity Challenges:

  • Scaling – especially cross-sector and cross national boundaries
  • Getting the user experience right – not just in Higher Education – is going to be even harder than the challenges we face today.
  • Protocol wars – new, powerful players in the area
  • Levels of assurance and attribute support
  • Reconciliation between consumer and enterprise identity – possibly the biggest challenge

‘The Cardiff Giant’ – a statue discovered in Cardiff (New York), copied covertly by P.T. Barnum and toured. This all showed:

  • Even a fake can be very popular
  • Fake identities and identity theft are a widely recognized, growing problem

Identity is big business – e.g. Doubleclick (acquired by Google) – serving personalised advertising.

Universities house both applications and identities. They are the natural ‘home’ of much user data – e.g. courses, titles, grades. Universities also host applications – but increasingly these may not be hosted locally. The important players in academic identity are:

  • Government
  • Faculty
  • Applications (Commercial and other)
  • Users

What do Governments want?

  • Privacy laws and their enforcement vary wildly from country to country
    • China and the EU offer useful (and possibly polar opposite) examples
    • A situation that needs careful balancing if there is to be meaningful enforcement
  • We need recognition of the social importance of trust – some evidence that trust in financial markets drives economic prosperity?

What do Faculty want?

  • Good learning resources and tools
  • Students’ undivided attention (a possible issue with using external tools, e.g. social networks, to deliver teaching material)
  • Freely circulated intellectual property?
  • Stronger intellectual property rights?

What do Commercial Applications want?

  • A userbase to monetize
    • page views, successful completion of login, high retention rates, lots of juicy personal details (hence the reluctance to engage with federated access management)
    • licensing fees
    • Advertising is a nice plus

What do Other Applications want?

  • They’re often not sure, and would like you to help them
  • Happy to be out of the username/password trap
  • Varying degrees of control over the GUI and authentication process
  • “Security” and “usability”, vaguely
  • Identity services are critical for “cloud” computing

What do Users want?

  • Studies by JISC, Yahoo!, Google and others show that to get users to use the services you offer:
    • You need consistency, consistency, consistency
    • Bifurcation is confusing, particularly if there’s an email address box or user/pass option (i.e. more than one option)
    • Users have no idea what a domain is
    • Even with coaching, outcomes from typing URL-based identity do not improve
    • Buttons are best, but alternatives are okay

Users understand the difference between a professional account and a personal account, a work app and a personal app – and can generally select between them. Privacy and security are consistently rated as very important – especially in countries with weak privacy laws. However, an LSE study demonstrated that convenience often wins in practice anyway.

Consumer Identity Today

  • Facebook Connect by far the most successful
    • proprietary protocol, single identity provider
    • inducements for applications – lots of personal data for targeted ads
  • Twitter comes in second, followed by also-rans

Facebook Connect is in use on the Huffington Post and http://money.cnn.com (the latter only supports Facebook Connect for commenting). There are some interesting stats on various mechanisms for logging into the Typepad blogging platform at http://blog.leahculver.com/2009/11/log-in-or-sign-up-with-openid.html

Convergence between Educational Identity and Consumer Identity – It’s already happening! How soon will your students ask for a ‘Facebook Connect’ login to your VLE?

The level of assurance gravitates towards the lowest common denominator – often basically an email address that doesn’t ‘bounce’. Social networks include a large level of assurance, as you have lots of people ‘vouching’ for you (there are questions about how much this is worth, but it definitely isn’t worthless). Maybe ‘strongly vetted’ ID is not what universities should try to provide. Instead we may want to focus on the attributes:

  • Consumer identity world is rapidly realizing that attributes are key
  • Need to solve problems like attribute aggregation
    • Attribute plumbing from the campus to the consumer Identity Provider – Google is trying the business model

If consumers opt for Facebook, perhaps this is an opportunity for Universities to stop worrying about the ‘discovery’ problem – even if we worry about the implications of Facebook managing this instead.

Preparing for those futures:

  • Be protocol-agnostic
    • OpenID support in the Shibboleth IdP is a good start
  • Expectations and functionality are driven today by commerce and consumer identity
    • Users unlikely to exert change
    • Faculty will use the best tools available
    • Commercial applications like money
  • Discovery is the real control point – if you present a ‘Facebook Connect’ button at this point, users will click it
    • No single right answer
    • eduID or similarly branded login – this is a contentious issue
    • Some people want to stop buttons or dedicated discovery entirely
  • Proactively contemplate partnerships with the other identity sources

Our current course is excellent – we are doing most of the right things – even if for the attributes and policies alone, which are 9/10 of the effort and value.


Group Management

This session was given by Caleb Racey and Richard James of Newcastle University.

  • FAM requires attributes. For example, if you want to offer resources to members of the medical faculty, you need to know which users these are.
  • At Newcastle the systems Grouper and Talend provide this
  • Federated identity is a subset of campus identity

Data management is the key to access control:

  • User identity
  • Unit (granularity) of access control
    • Department membership
    • Module enrolment

Identity data is aggregated from several different sources/systems across the University.

What is ‘Grouper’?

  • Toolkit to manage institutional and personal groups
  • Collaborative project from internet2
  • API for managing groups
  • UI + web services + shell interfaces to access the API
  • http://www.internet2.edu/grouper/

Newcastle use Grouper to provide access control to different resources – wikis, the lecture capture system, the room booking system. They populate Grouper with institutional data.

Grouper has a user-facing interface – this gives control to the user, and enables local teams to manage memberships of groups etc. Grouper then releases its ‘Groups’ to Shibboleth as attributes.

Talend is used to structure the data before import into Grouper – there are more details at http://research.ncl.ac.uk/idmaps/videos.php


FAM09 – Day 2

Opening the second day is Mark Tysom talking about the UK federation.

There are now 765 members of the UK federation, which has been operating for three years. Membership includes:

  • 74% of UK FE institutions
  • 100% of UK HE institutions
  • 57% of schools in England
  • 100% of schools in Northern Ireland and Scotland

In this context ‘signup’ just means that they have agreed to the Federation rules – it doesn’t mean they are actively participating in the Federation.

Service Enhancements coming:

Details at http://www.ukfederation.org.uk/content/Documents/DevelopmentRoadMap. Today Mark is going to look at the next 6 months or so:

WAYF Review

  • Provide an independent review of the current WAYF login processes
  • Improve the usability and accessibility for all users and enhance the user experience
  • Conduct user tests with a series of sites to assess the usability of the WAYF interface
  • Identify any other direct enhancements to be made
  • Provide prioritised recommendations for next steps and future development by end July 2010

They have engaged an external company to assess usability of the WAYF, getting evidence from talking to users, and observing how they interact with WAYFs/login. Clearly some crossover with studies such as Publisher Interface study – so they are sharing the outcomes of the study with these other projects.

Portal Best Practice

WAYF is a ‘backstop’ solution – i.e. not the preference. The UK Federation encourage the development of ‘portals’ – I’m not quite clear who they think will develop these ‘portals’ and why users will actually come to resources via portals – this just seems like a backward looking idea to me? Perhaps I’ve misunderstood?

Some clarification on questioning – it seems that in this sense they mean the UK Federation WAYF, as opposed to WAYF as a process generally. I think it is key that we assume users will hit resources from the open web rather than via a system controlled by the library or institution.

Statistics Gathering

  • Provide mechanisms to all the operators of IdPs and the federation to visualise how the service is being used
  • Provide a mechanism to populate an anonymous central database that can store usage data for these services
  • Review existing mechanisms for gathering federation metrics
  • Incorporate solution into the JANET Netsight2 Service

Mark also mentioned they would be looking at metadata scaling and running a satisfaction survey.

Mark then mentioned a couple of policy areas they are going to be looking at – inter-federation agreements and eligibility for membership – the latter looking at interest from other sectors such as the NHS, government and museums.


Shibboleth Developments

Chad La Joie – from SWITCH

Shibboleth 1.3 reaches end of life on June 30th 2010 – there will be absolutely no support after this time – so you should be planning to have upgraded to Shib 2.0 by this date!

The next release of the Shibboleth IdP is 3.0 – this is not a major rewrite, so do not wait for it to upgrade! The main goal is to clean up APIs hindering new work. It also includes n-tier delegation support and non-browser-based authentication.

Discovery Service 2.0

  • incorporation of feedback from JANET funded usability study
  • support for centralised and page-embedded models
  • HTML/CSS/JavaScript that can be dropped into an SP to render a discovery interface

Chad claims that if you give SPs just a snippet of HTML or JavaScript, they are happy to embed it in their interface (I’m not sure about this – what if they get competing demands from different federations?)

N-tier delegation

What? – user logs into the portal, and the portal logs into back-end services as the user – this is delegation

Goals

  • allow service to log in to the back-end server as the user
  • control which services can impersonate the user
  • keep a complete audit trail of impersonation
  • and other stuff …(sorry, missed this)

Attribute Aggregation

What:

  • aggregate user attributes from the home organization and other sources, such as professional organizations

Goals

  • Allow the SP to pull in attributes from multiple attribute authorities (IdPs)
  • use existing attribute release/acceptance policy mechanisms

Status

  • latest SP has support out of the box
  • 2.x IdP has support out of the box
  • currently only identifiers shared by AAs and SPs are supported

Future work

  • determine if non-shared identifiers are usable/supportable
  • determine if IdP-aggregated attributes are useful and tenable

How does the SP know where to aggregate attributes from? At the moment this can either be hardcoded in the SP, or sent from the IdP.

OpenID Support

Goals:

  • support XRD 1.0, OpenID 2.0, PAPE, Simple Registration, Attribute Exchange
  • use existing trust layer to create trust between OpenID entities
  • use existing attribute release mechanism

Status

  • XRD 1.0 now out of community review
  • basic support for OpenID 2.0 and PAPE support via proof-of-concept IdP plug-in
  • trust equal to standard deployment of Shibboleth
    • the OpenID protocol does not support certain advanced trust models
  • No SP support planned

Future Work

  • develop real IdP plugin based on IdP v3

Buzzwords: User-centric identity

  • Two views of user-centric identity
    • 1. Purist – all data about a person is the property of, should be kept by, and should be released by the person – i.e. the OpenID model
    • 2. Identity 2.0 – the user picks which account and associated data should be used with which service – i.e. the CardSpace model
  • But users aren’t an authoritative, or trustable, source for most of their data
  • most users can’t run their own identity provider
  • most users have a hard time understanding relationships between attributes and the service provider

The goal should probably be a release consent model added to the Identity 2.0 view – e.g. Shibboleth + uApprove (http://www.switch.ch/aai/support/tools/uApprove.html)

Buzzwords: Cardspace

CardSpace generally refers to two things:

  • Microsoft’s evolution of Passport into a decentralized service – known by MS as the ‘identity metasystem’
  • Microsoft’s client for the service is the only thing that Microsoft calls CardSpace

Primary focus on avoiding phishing.

However, Microsoft is now doing a server-side implementation called ‘Geneva’ – the spiritual successor to ADFS. This does not currently interoperate with other products – including MS’s own CardSpace selector.

MS-hosted ‘cloud’ Exchange, SharePoint and storage services have Geneva support – and SharePoint 2010 will have support as well.

MS have asked Shibboleth team to add Geneva support – which they would do if MS would actually make the specification available!

Buzzwords: OAuth

OAuth is an access delegation protocol:

  • You log in to Service B. Service B wants your information from Service A. You log in to A, get a token, and give it to B. B uses the token to get information from A (see the sketch below).
  • OAuth is independent of the means by which a user is authenticated and of the format of the token
    • so OAuth is orthogonal to federated identity management (although you could use things like n-tier delegation to achieve this)
  • OAuth is currently under-specified
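
To make that flow concrete, here is a very rough Python sketch of the delegation dance – every endpoint URL and header here is a hypothetical placeholder for illustration, not any real service’s API:

import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoints, purely to show the shape of the flow
SERVICE_A_TOKEN_URL = 'https://service-a.example/token'
SERVICE_A_DATA_URL = 'https://service-a.example/userinfo'

def user_gets_token_from_a(credentials):
    """Steps 1-2: the user authenticates to Service A and receives a token."""
    request = urllib.request.Request(
        SERVICE_A_TOKEN_URL, data=urlencode(credentials).encode())
    with urllib.request.urlopen(request) as response:
        return response.read().decode()   # the token the user hands to Service B

def service_b_fetches_data(token):
    """Step 3: Service B presents the user's token back to Service A."""
    request = urllib.request.Request(
        SERVICE_A_DATA_URL, headers={'Authorization': 'Token ' + token})
    with urllib.request.urlopen(request) as response:
        return response.read().decode()   # the user's information from A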

Federated Access: The Library Experience

A three part presentation – first up Sarah Pearson from the University of Birmingham on their experience:

Authentication overview:

  • Mixture of Shibboleth, IP and username/password authentication
  • EZProxy used for off-campus (recently implemented)
  • SSO to Metalib (federated search), Shibboleth and EZProxy
  • Extra sign-on needed between Portal, WebCT and Metalib

Authentication – setup, maintenance and troubleshooting – needs involvement from:

  • Serials Team (Library services)
  • Digital Library team (IT Services)
  • Networks team (IT Services)

The Shibboleth implementation was relatively straightforward, as they already had good quality data in the directory

Implementation timescale at B’ham

  • Jan 08 – decided to implement Shibboleth for July 2008
  • Jan-Mar 08 – tested current authentication, set up IdP and shibbolized Metalib
  • Mar-Apr 08 – Prioritised ‘Athens only’ resources with Shibboleth
  • July 08 – changed all links in Metalib to Shibboleth
    • decided to retain Athens for 1 year as some resources not supporting Shib
    • Migration of remaining Athens resources to other methods
  • July 09 – ended Athens subscription but implemented EZProxy

Decisions made

  • Athens only and IP/Athens authenticated resources to be moved to Shibboleth
  • WAYFless URLs where possible
  • Shibboleth preferred over IP
  • Shibbolized metalib
  • Extended Athens subscription for 1 yr

Implementation process

  • Contacting service providers
  • Knowing which information to provide
  • Obtaining and testing WAYFless URLs was time consuming
  • Adding new URLs to Metalib (library portal/federated search)
  • Adding notes for specific resources

Issues and Challenges

  • SP discoverability / navigation issues – not everyone comes to the resource from the library website/portal
  • Dual authentication and personalisation
    • Although the University of B’ham prefer Shibboleth to IP authentication, some resources use IP as a preference
  • WAYFless URLs
    • different suppliers use different constructions
    • Some support
  • SFX (OpenURL resolver) integration – providers don’t necessarily support deep linking in a consistent or good way
  • IdP downtime – have introduced a single point of failure

Secondly Francis Lowry from Nottingham Trent University

NTU approx 25,000 FTEs across 3 campuses

  • NTU was an early adopter of Shibboleth – in 2005
  • Shibboleth ‘just worked’ – it has been very stable
  • Currently on Shib 1.3, going to upgrade to 2.0 in Summer 2010
  • Shibboleth is not a panacea – managing expectations was a big issue – e.g. Shib is not an SSO solution

Now Richard Cross takes up the story from the library side:

  • NTU Library do not talk about ‘Shibboleth’ – may describe the benefits of FAM, but talk about ‘NTU username and password’
  • Personalisation features – issue of migrating from personal settings on remote resources being linked to Athens PUIDs – and needed to migrate to linking to Shibboleth IDs
  • Some resources ended up losing personalisation features
  • Communication with colleagues etc. key
  • Switchover remarkably smooth
  • Customers appeared to find the process quite intuitive
  • No permanent loss of off-campus access to any significant resources

Richard mentions the JISC Publisher Interface Study – there is incredible inconsistency in how service providers implement and talk about authentication, and this needs to change. WAYFless URLs are over-engineered, with inconsistent syntax – a real problem. In particular, OpenURL resolvers need to work with WAYFless URLs.

  • Lack of a utilities toolkit – reduced usage data
  • No ‘admin interface’, no reporting functionality, no troubleshooting tools
  • Reduced statistics (even at a basic level) compared to previously (when using traditional Athens authentication)

Customer experience?

  • May well remain unimpressed by the delivery of ‘mostly single’ sign-on (but terms and conditions apply)
  • Potential remains for customer confusion about how libraries manage the authentication exceptions
  • WAYFless URLs only work when the user accesses resources via the library – which is not how many people approach resources – coming in from Google and other resources

Don’t expect to be thanked for successful Shibboleth implementation – it is just seen as ‘business as usual’

Closing thoughts (from Francis):

  • Shibboleth is not just a replacement for Athens authentication – it is an opportunity for closer, more collaborative working across institutions
  • Vision for Shibboleth is more shared resources and services
    • Shared learning environments and resources
    • NTU CV Builder
    • Single framework for access to all university and externally provided services

NTU essentially embraced Shibboleth as a framework for authentication and authorisation across the board – all products they now tender for need to support SAML or similar…


FAM09

For the next couple of days I’m at FAM09 – a JISC event about Federated Access Management.

First up were Peter Tison (UCISA) and Sarah Marsh (SCONUL) on “Identity and Access as UK Priority”. Peter summarised the move towards federated access management in the UK HE sector over the last few years. JISC outlined a road map and acknowledged the need for institutional effort/resource.

There is still very little implementation of federated access (says Peter) – why?

  • Lack of external resources
  • Lack of internal resources
  • Athens is still there …

JISC review, April 2009 – about half of institutions are using Shibboleth and half OpenAthens (with small numbers using other solutions).

Within the library Federated Access opens possibility of:

  • Shared services
  • Saving money by targeting subscriptions on specific user groups
  • Integration with OpenID?

Across the institution Federated Access could:

  • Give access to internal systems and external resources
  • Access to third-party software
  • Access to internal resources from off site
  • Seamless access to external resources

So – Peter says what we need now is:

  • Clear strategic message
  • A benefits/impact analysis
  • A longer road map:
    • solid identity management platform
    • first step as an Athens replacement – but it is more than this
    • identify the internal benefits of single sign-on
    • linking to external resources

Some questions around granularity of access to resources – not necessarily a good thing for library resources, but essential for other types of resources, e.g. finance systems

Second up, International developments by Josh Howlett (Janet).

There are now many different federations internationally. However, they can have different policies for different data elements – e.g. the fallow period for reuse of the eduPerson principal name. There are now quite a few projects/initiatives looking at how you can work across these different federations – e.g. the Kantara Initiative – cross-sector identity initiatives

Geant – a consortium of all the European national networks, with 37 participating countries and 200 million euros over 4 years – big initiatives. Geant is concerned with connecting national networks – not generally at an institution level. eduGAIN is one part of Geant.

eduGAIN goals

  • enable interoperability between national federations by undertaking the necessary technical and policy coordination
  • To build on this interoperability

eduGAIN pilot service use cases:

What will it provide me with?

  • Identity providers: obtain access to services registered in other federations
  • Service provider: provide access to identities issued by providers registered in other federations
  • Europe-scale reach at zero to modest expenditure of effort

What should I do?

  • ensure your national federation is aware of your interest
  • prepare for SAML 2.0
  • Be ready for October 2010

Finally before coffee, Mark Cross on commercial developments.

Mark is from OpenID UK.

The institution you are a member of today is only one part of your identity

Roadmap for OpenID:

  • OpenID v1
    • SSO & Delegation
  • OpenID v2
    • attribute exchange
    • PAPE – Provider Authentication Policy Extension
  • OpenID v3
    • Contract Exchange Extension Working Group
    • Increased Security

Delegation!

OpenID going forward. Recent meeting agreed to work on:

  • Integration of OAuth Hybrid into core specifications
  • Looking at supporting email addresses as well as web addresses (Mark Cross felt this was a divergence from the original vision of OpenID)

Big likely implementers of OpenID in the UK – the Telegraph and the BBC

Identity Management is important in its support of a Knowledge Society.
