Last year I blogged about my entry into the JISC MOSAIC competition which I called ‘ReadtoLearn’. The basic idea of the application was that you could upload a list of ISBNs, and by using the JISC MOSAIC usage data the application would generate a list of course codes, search for those codes on the UCAS web catalogue, and return a list of institutions and courses that might be of interest to you, based on the ISBNs you had uploaded.
While I managed to get enough done to enter the competition, I had quite a long ‘to do’ list at the point I submitted the entry.
The key issues I had were:
- You could only submit ISBNs by uploading a file (using HTTP POST)
- The results were only available as an ugly HTML page
- It was slow
Recently I’ve managed to find some time to go back to the application, and have now added some extra functionality, and also managed to speed up the application slightly (although it still takes a while to process larger sets of ISBNs).
Another issue I noted at the time was that because “the MOSAIC data set only has information from the University of Huddersfield, the likelihood of matching any particular ISBN is relatively low”. I’m happy to say that the usage data that the application uses (via an API provided by Dave Pattern) has been expanded by a contribution from the University of Lincoln.
One of the biggest questions for the application is where a potential user would get a relevant list of ISBNs from in the first place (if they even know what an ISBN is). I’m still looking at this, but I’ve updated the application so there are now three ways of getting ISBNs into it. The original file upload still works. A comma-separated list of ISBNs can also be submitted (using HTTP GET). Finally, you can submit the URL of a webpage (or RSS feed etc.) containing ISBNs, and the ISBNs will be extracted using regular expressions; this is slower, but gives a very generic way of getting ISBNs into the application. I would like to look at further mechanisms such as harvesting ISBNs from an Amazon wishlist or order history, or a LibraryThing account, but for the moment you could submit a URL and the regular expression should do the rest.
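To give a flavour of the regular expression approach, here is a minimal sketch in Python. The pattern and the function name `extract_isbns` are assumptions made for illustration, not the expression the application actually uses. The sketch only picks up unhyphenated 10 or 13 digit ISBNs and does not check the check digit.

```python
import re

# Hypothetical pattern (not the application's own): a 13-digit ISBN starting
# 978 or 979, or a 10-character ISBN whose final character may be X.
ISBN_PATTERN = re.compile(r"\b(?:97[89]\d{10}|\d{9}[\dXx])\b")


def extract_isbns(text):
    """Return the unique ISBN-like strings in text, in order of first appearance."""
    found = []
    for candidate in ISBN_PATTERN.findall(text):
        candidate = candidate.upper()
        if candidate not in found:
            found.append(candidate)
    return found


sample = "Reading list: 0003271323, 0003271331 and 9780552770880 (also 0003271323)."
print(extract_isbns(sample))
```

A pattern this loose will also match any other 10 or 13 digit number on the page, which is the price of a very generic approach.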
Rather than the old HTML output, I’ve now made the results available as XML instead. Although this is not pretty (obviously), it does mean that others can use the application to generate lists of institutions/courses if they want. On my to do list now is to use my own XML to generate a nice HTML page (eating your own dog food I think they call it!).
I also restructured the application a little, and split it into two scripts (which also allowed me to provide a UCAS code lookup script separately).
Finally, one issue with the general idea of the application was the question of how much overlap with the books borrowed by users on a specific course should lead to a recommendation. For example, if 4 ISBNs from your uploaded list turned out to all have been borrowed by users on courses with the code ‘W300’, should this constitute a recommendation to take a W300 course? My original solution was to offer two ‘match’ options. The first was to find ‘all’ matches, which meant that even a single ISBN related to a course code would result in a recommendation for that course code. The second was to ‘find close matches only’, which only recommended a course code if the number of ISBNs you matched was at least 1% of the total ISBNs related to that course code in the usage data. I decided to generalise this a bit, so you can now specify the percentage of overlap you are looking for (although experience suggests that this is going to be low based on the current data, perhaps less than 1%).
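The overlap test itself is simple arithmetic. The sketch below shows one way it could be written in Python. The function name `is_recommended` and its argument names are assumptions for illustration, and the application's own code may differ in detail.

```python
def is_recommended(your_related, total_related, match="All"):
    """Decide whether a course code counts as a recommendation.

    your_related:  ISBNs from the submitted list related to the course code
    total_related: all ISBNs related to the course code in the usage data
    match:         'All', or a percentage greater than 0 and up to 100
    """
    if your_related < 1:
        return False  # nothing matched, so nothing to recommend
    if isinstance(match, str) and match.lower() == "all":
        return True  # a single matched ISBN is enough
    threshold = float(match)
    if not 0 < threshold <= 100:
        raise ValueError("match must be 'All' or a number >0 and <=100")
    overlap = 100.0 * your_related / total_related
    return overlap >= threshold


# 3 matched ISBNs out of 385 related to a course code is roughly 0.78% overlap
print(is_recommended(3, 385, 0.5))
print(is_recommended(3, 385, 1))
print(is_recommended(3, 385, "All"))
```

With 3 matched ISBNs out of 385, a threshold of 0.5 produces a recommendation while a threshold of 1 does not, which shows why useful thresholds tend to sit below 1% with the current data.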
So, the details are:
Application URL:
http://www.meanboyfriend.com/readtolearn/studysuggest
GET Parameters:
match
Values: ‘All’ or a number greater than 0 and no more than 100
Definition: Percentage overlap between ISBNs in submitted list related to a course code, and total ISBNs related to the course code that will constitute a ‘recommendation’. ‘All’ will retrieve all courses where at least one ISBN has been matched.
isbns
Values: a comma-separated list of 10 or 13 digit ISBNs
url
Values: a URL-encoded URL (include the scheme, e.g. ‘http://’) of a page/feed which includes ISBNs. ISBNs will be extracted using a regular expression. (See http://www.blooberry.com/indexdot/html/topics/urlencoding.htm for information on URL encoding)
If both the isbns and url parameters are submitted, all ISBNs from the list and the specified webpage will be used.
Example:
An example request to the script could be:
http://www.meanboyfriend.com/readtolearn/studysuggest?match=0.5&isbns=0722177755,0722177763,0552770884,043999358,0070185662,0003271323,0003271331,0003272788
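If you would rather build the request in code than by hand, the following Python sketch assembles a URL from the parameters described above. The helper name `build_request_url` is an assumption for illustration. It only constructs the URL and does not call the service.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.meanboyfriend.com/readtolearn/studysuggest"


def build_request_url(match="All", isbns=None, url=None):
    """Build a GET request URL from the match, isbns and url parameters."""
    params = {"match": str(match)}
    if isbns:
        params["isbns"] = ",".join(isbns)
    if url:
        params["url"] = url  # urlencode takes care of the URL encoding
    # safe="," keeps the commas in the ISBN list readable
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_request_url(0.5, isbns=["0722177755", "0722177763"]))
print(build_request_url("All", url="http://example.org/books"))
```

Here `http://example.org/books` is a placeholder page address. Leaving commas unescaped keeps the result in the same form as the example request above.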
Response:
The response is XML with the following structure (this is an example with a single course code):
<study_recommendations>
  <course type="ucas" code="V1X1" ignore="No" total_related="385" your_related="3">
    <items>
      <item isbn="0003271331"></item>
      <item isbn="0003271323"></item>
      <item isbn="0003272788"></item>
    </items>
    <catalog>
      <provider>
        <identifier>S84</identifier>
        <title>University of Sunderland</title>
        <url>http://www.ucas.com/students/choosingcourses/choosinguni/instguide/s/s84</url>
        <course>
          <identifier>997677</identifier>
          <title>History with TESOL</title>
          <url>http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/Dhh-QG8Bhe33Egpbb227I8OPTGQUw-VTyY/HAHTpage/search.HsDetails.run?n=997677</url>
        </course>
      </provider>
      <provider>
        <identifier>H36</identifier>
        <title>University of Hertfordshire</title>
        <url>http://www.ucas.com/students/choosingcourses/choosinguni/instguide/h/h36</url>
        <course>
          <identifier>971629</identifier>
          <title>History with English Language Teaching (ELT)</title>
          <url>http://search.ucas.com/cgi-bin/hsrun/search/search/StateId/Dhh-QG8Bhe33Egpbb227I8OPTGQUw-VTyY/HAHTpage/search.HsDetails.run?n=971629</url>
        </course>
      </provider>
    </catalog>
  </course>
</study_recommendations>
The ‘catalog’ element essentially copies the data structure from XCRI-CAP, which I’ve documented in my previous post. I’m not using the XCRI-CAP namespace at the moment, but I may come back to this when I have time. The ‘course’ and ‘provider’ elements can both be repeated.
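As a rough illustration of how the XML could be consumed, here is a Python sketch using the standard library’s ElementTree. It works on a cut-down copy of the example response and assumes the attribute values are wrapped in plain straight quotes. The function name `summarise` is made up for the example.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the example response, with one provider only
SAMPLE = """<study_recommendations>
  <course type="ucas" code="V1X1" ignore="No" total_related="385" your_related="3">
    <items>
      <item isbn="0003271331"></item>
    </items>
    <catalog>
      <provider>
        <identifier>S84</identifier>
        <title>University of Sunderland</title>
        <course>
          <identifier>997677</identifier>
          <title>History with TESOL</title>
        </course>
      </provider>
    </catalog>
  </course>
</study_recommendations>"""


def summarise(xml_text):
    """Return (course code, institution, course title) tuples from a response."""
    root = ET.fromstring(xml_text)
    rows = []
    # Top-level 'course' elements carry the course code attributes
    for coded in root.findall("course"):
        code = coded.get("code")
        for provider in coded.findall("catalog/provider"):
            # 'course' elements nested in a provider are the actual offerings
            for offered in provider.findall("course"):
                rows.append((code, provider.findtext("title"), offered.findtext("title")))
    return rows


for row in summarise(SAMPLE):
    print(row)
```

Note that ‘course’ is used at two levels: once for the course code, and again inside each ‘provider’ for the individual courses on offer, so the two need to be handled separately.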
If you are interested in using it please do, and drop me a comment here if you have examples, or suggestions for further improvements.