EPrints v3 preview

I’m at a session today about the new version of the EPrints software – due for launch at the end of January 2007 (in San Antonio). We’ve been using EPrints at RHUL for two years as part of the SHERPA-LEAP project. We currently use EPrints v2, which is hosted for us by UCL.

We are currently looking for a piece of repository software which can be used for institutional research output (which is how we currently use EPrints v2), but also for materials related to teaching and learning – so we are interested in a product that can cope with the following:

  • Research papers
  • Theses
  • Digitised readings (course-based readings specifically)
  • Digital images
  • Digital video
  • Digital audio

The software needs to support workflow, and to handle copyright/IPR issues. It needs to act as a digital library, an open access research repository, and a repository for learning materials (underlying our Moodle VLE).

Since getting to the meeting, I’ve already been challenged as to why we want to implement a single repository in this way – why not plug in to web-based services such as YouTube or Flickr?

This challenge reminds me a bit of a recent posting on eFoundations that considered Flickr and pondered what a Web 2.0 repository would look like – surely YouTube and Flickr are examples of Web 2.0 repositories?

Les Carr is emphasising that EPrints v3 is not just an upgrade, but a complete redevelopment of the system – with more functionality, more flexibility, more interoperability, and a move to put current ‘admin’ functions into the hands of ‘users’.

Some immediately noticeable facilities in v3 (we are now into the demo) are the links to Atom and RSS (1.0 and 2.0) feeds. The front-end looks pretty much like v2 – which is a bit of a shame, as the basic interface has always felt a bit clunky to me (apologies for this, but it looks like it was designed by programmers, not designers). I guess this is something we can configure – and we would probably need to put some effort into making it look nicer. Interestingly, apparently they did get a UI specialist to look at the interface (although it’s not clear if this was just for the deposit interface).

The currently supported ‘browse by subject’ and ‘browse by year’ views are still there – and we could probably do with investing in adding some extra views (e.g. by course).

Anyway, a few features:

  • Atom and RSS feeds for the whole repository, and for every search run
  • A nifty preview for images and pdfs when rolling over thumbnails
  • A ‘request a copy’ function – which allows a searcher to request a copy of an item that isn’t available in full text. This triggers an email to the owner of the item, who can approve or deny the request. This could be used when you can’t make an item openly available, but can supply copies on request to individuals.
  • Re-ordering search results
  • Export results in various formats (ASCII, BibTeX, EndNote, etc.). Results in these output formats are URL addressable – so you could build an interface on this, by the sound of it. This is probably a bit more interesting than it sounds – it looks like this function gives the ability to view the results in various different interfaces. So this is possibly (or actually?) the place where you have an API into the search interface, because you can address searches as URLs and view the output in XML or other formats (e.g. Google Maps).
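Treating each search as a URL with an export format bolted on suggests some simple mashups. A rough sketch (in Python) of how results might be pulled programmatically – the base URL and parameter names here are my guesses for illustration, not EPrints’ documented scheme:

```python
# Hypothetical sketch: fetch repository search results in an export format.
# The base URL and the "q"/"output" parameter names are assumptions - the
# real URL scheme depends on how a given EPrints repository is configured.
import urllib.parse
import urllib.request

REPO = "http://eprints.example.ac.uk"  # hypothetical repository address

def fetch_search_export(query, fmt="RSS2"):
    """Run a repository search and return the results in an export format."""
    url = f"{REPO}/cgi/search?q={urllib.parse.quote(query)}&output={fmt}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

# The returned XML could feed an RSS widget, or be parsed to build a
# custom results interface on top of the repository search.
print(fetch_search_export("open access"))
```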

The Registered Users interface has changed quite a bit – with a ‘Manage Deposits’ function, to allow users to manage all their deposits, see which ones are under review, which ones are live, etc. The list of items shown can be filtered and configured by the user. Also new is a History of changes made to an item – which again can be filtered to changes made by a particular user etc.

There is a much wider range of default item types now supported (partly to demonstrate that EPrints is about more than textual content).

The deposit function seems much leaner than in v2 – a clear five-step workflow. Some nice things, like applying a licence to the file from a drop-down list. Also, you can adjust or add new workflows – this needs a bit more exploration to see how flexible it is.

A nice ‘auto-complete’ feature when filling in author names – taking information from the authors already entered in the repository. A really nice feature when filling in journal or publication titles – it uses RoMEO as the authority source, so you can see immediately what status the journal has (Green, Gold, Grey, etc.). The authority data can be held in a local file, so local sources can be used. The authority files are very nice, but need a bit more thought – libraries have been handling this type of question for ages, and I’d like to compare how EPrints authorities work against some decent library implementations.
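The general idea behind authority-backed auto-complete is simple enough – match what the user has typed against a controlled list. A minimal sketch, assuming a plain-text authority file with one value per line; this illustrates the concept, not EPrints’ actual implementation:

```python
# Minimal autocomplete sketch over a locally held authority file.
# "journal_titles.txt" is a hypothetical file with one title per line.

def load_authority(path):
    """Read the authority file into a sorted list of values."""
    with open(path, encoding="utf-8") as f:
        return sorted(line.strip() for line in f if line.strip())

def suggest(authority, fragment, limit=10):
    """Return up to `limit` authority values starting with the fragment."""
    frag = fragment.lower()
    return [v for v in authority if v.lower().startswith(frag)][:limit]

journals = load_authority("journal_titles.txt")
print(suggest(journals, "Journal of Doc"))
```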

Some nice import functions – if you import a CrossRef DOI, the metadata is completed automatically.
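For flavour, here is how this kind of DOI lookup can work, using CrossRef’s current public REST API (api.crossref.org) – EPrints’ own importer may well talk to a different CrossRef service, so treat this as an illustration of the concept rather than what v3 actually does:

```python
# Fetch basic bibliographic metadata for a DOI from the CrossRef REST API.
import json
import urllib.request

def crossref_metadata(doi):
    with urllib.request.urlopen(f"https://api.crossref.org/works/{doi}") as r:
        work = json.load(r)["message"]
    return {
        "title": (work.get("title") or [""])[0],
        "journal": (work.get("container-title") or [""])[0],
        "authors": [f"{a.get('given', '')} {a.get('family', '')}".strip()
                    for a in work.get("author", [])],
    }

# Watson & Crick's 1953 Nature paper, as a well-known test DOI.
print(crossref_metadata("10.1038/171737a0"))
```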

Things that I didn’t see, that I would have liked to see (not to say they aren’t in there, but this needs investigation):

  • Versioning
  • Technical metadata (and automatic extraction of this on file upload)
  • Ability to push items through workflow on specific triggers
  • More flexible and definable workflows – currently ‘workflow’ in EPrints essentially defines how screens are presented in the ‘deposit of an eprint’ workflow
  • Notifications based on specific triggers
  • Ability to limit access to objects based on attributes related to a user (must be authenticated, must be a member of x institution, must be enrolled on x course)
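To make that last point concrete, here is a sketch of the kind of attribute-based check I have in mind – the attribute and policy names are hypothetical, not anything EPrints currently provides:

```python
# Sketch of attribute-based access control for repository objects.
# The attribute names ("institution", "courses") are hypothetical.

def can_access(user, policy):
    """Return True if the user's attributes satisfy the object's policy."""
    if policy.get("require_authenticated") and not user.get("authenticated"):
        return False
    if "institution" in policy and user.get("institution") != policy["institution"]:
        return False
    if "course" in policy and policy["course"] not in user.get("courses", []):
        return False
    return True

reader = {"authenticated": True, "institution": "RHUL", "courses": ["HS1001"]}
print(can_access(reader, {"require_authenticated": True, "course": "HS1001"}))  # True
print(can_access(reader, {"institution": "UCL"}))  # False
```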

Overall, there are some really nice features in EPrints v3, and much to be impressed by. Unfortunately, I think it is still very much aimed at ‘research output’, and I’m looking for something that engages with ‘institutional repository’ in a broader sense. However, the development team are definitely interested in talking more about this, and I will do my best to involve them as we work towards selecting a product at RHUL.

SCHoMS

I’m at a session of the SCHoMS (Standing Council of Heads of Media Services) about recording lectures this morning. Aside from some technical problems delaying the start (a certain amount of schadenfreude in seeing AV salesmen struggling with the technology), there were some interesting presentations – just brief summaries here.

Mediasite

Mediasite is a product from Sonic Foundry (now the only product from Sonic Foundry).

It allows recording, storage, management and delivery of lectures/sessions. It integrates with Crestron panels, and also has an API for VLE or other integration.

It captures all VGA outputs – so video and data. You can put in ‘bookmarks’ to link the video to the data display at that time – so you can easily jump through the presentation to each slide and the linked video (but this requires a manual process during the presentation to sync them together).

They are now concentrating on the content management aspects, especially search and retrieval – it currently supports search of any text content, including OCRing any text shown under visualisers and document cameras. They are expecting to launch phonetic searching in the next few weeks, so that any mentions of words in video or audio files are also picked up.

Anycast

Anycast Station is an all-in-one live content producer. It takes feeds from cameras or data sources, and can control up to 16 cameras with presets. Feeds can be mixed etc. on the fly, or, in conjunction with hard disks attached to the back of the unit, re-mixed in post-production.

It looks like a nice piece of kit, but do we have the expertise and staff resource to use it? However, the presenter is talking about kitting out a studio with Anycast HD cameras, Anycast lighting and the Anycast station for 60k–70k – which sounds quite cheap.

Impact Marcom

They use Windows technology to deliver cost-effective solutions. In terms of product, they are offering something similar to Mediasite above, but arguing that it can be done more cost-effectively using Windows Media Server and Windows Media Player.

Essentially, Impact Marcom are not selling a product, but rather offering a consultancy package to achieve the same result. It doesn’t look like they have the same kind of content management to offer as Mediasite, but this could be a lower-budget way of achieving the recording and streaming – they estimate about 10k for equipment if you don’t already have appropriate servers and encoders, then something in the region of three days’ training.

Identity Management and Learning 2.0

Just reading Andy’s post of the same title. I think you could argue (if you wanted to play Devil’s Advocate, or are particularly partial to arguing) that Athens has actually been a bad thing, in that it has been too effective, and has held back investment in other (perhaps more institutionally based) authentication/authorisation solutions in the UK. I’ve always wondered why solutions like EZproxy have much higher take-up in the US than in the UK – and Athens is surely the answer?

On the Shib front, although it is clearly where we are going with JISC at the moment, I can’t help but feel that we really ought to be seeing demand driven from somewhere other than library resources. For access to library resources in the UK HE sector, Shib seems like overkill – it certainly goes way beyond anything we need to do in terms of controlling access to this type of resource at the moment.

Shibboleth was originally championed by the Grid computing contingent in JISC, but this seems to have faded a bit recently – or I’ve just stopped paying attention. For example, the ESP-GRID project (http://www.jisc.ac.uk/whatwedo/programmes/programme_middleware/project_espgridjuly04.aspx) was meant to report in March 2006, but the project website seems empty.

Anyway, based on some work I’m currently involved with looking at e-learning across three institutions, I can see some potential for Shib – at least in the next few years. Here, you can imagine Shib being used to allow access to relevant resources depending on your role in each organisation. I don’t think the ‘personal learning environment’ will be fully realised in the next five years, so there is some time yet for federated authentication/authorisation to be of use.

Also, there is a question – will ‘personalised’ mean not hosted? Perhaps HE institutions will be the providers of personalised learning portals (i.e. the environment is personalised, and perhaps transportable, but provided by a single institution) which will allow consumption of relevant material from learning objects etc. across a federation – then something like Shibboleth might make perfect sense.

Just to go back to an earlier comment of Andy’s – that he was worried blogs might stifle discussion. I started to leave this posting as a comment on the eFoundations blog, but ended up blogging it instead. The problem with this is that it’s a hell of a lot harder to follow a discussion when it stretches across several blogs than when it is focussed on a single blog.

Classifying the catalogue

Lorcan Dempsey has posted on What is the Catalog?, and also refers to his unhappiness with the word ‘Catalogue’ in his recent Ariadne piece.

This intersects interestingly with the recent presentations I attended at IGeLU on ‘Libraries, OPACs and a changing discovery landscape’. Both speakers talked about the fact that the traditional view of the library catalogue as the ‘centre’ of the library user’s information discovery behaviour is no longer valid in the modern environment. One of the questions in the discussion that followed these talks was ‘What do you mean by the library catalogue?’

The idea that started to emerge from these talks was that libraries will need to focus more on the ‘local’ or ‘unique’ collections they have stewardship of, rather than trying to catalogue the whole world (the problem is not building the Alexandrian library, but trying to do it thousands of times over?).

I remember having a discussion about what should and shouldn’t be in the catalogue about five years ago with a colleague, in the context of the growing number of electronic resources we were subscribing to. Currently, what we refer to as our ‘library catalogue’ (when talking to our users) contains:

  • a record of our physical stock (or at least aims to – there is a fair amount of error here)
  • our e-journal titles (paid for and free, aggregations and individual titles, actually imported on a monthly basis from our SFX installation)
  • some, but not all, e-books we pay for access to (e.g. we don’t load individual MARC records for books in EEBO or ECO, but we do for Oxford Reference; we don’t track books available in aggregated databases such as Business Source Premier; we don’t load Project Gutenberg details)
  • some digital objects (online exam papers where available)

This odd mixture has some logic behind it (I won’t go into it here, but we do actually discuss this stuff and make decisions about what goes in, in a very general way, if not for specific items), but it seems inevitable that there is no obvious consistency, from the library user’s point of view, in what they should or should not expect to find if they search the catalogue.

So, if the catalogue is not a list of what we have physically, or what we provide access to physically and virtually, what does it become? My guess is that we are heading towards realigning the ‘catalogue’ towards the physical collection – i.e. this is what we have in the building. This is essentially where we started. We can expect our users to start in a wider world of information, and only reach the ‘catalogue’ when they get close to the ‘delivery’ phase.

If this is the case, what will it mean for the development of the catalogue? Definitely integration of inventory information with the wider world – if the user starts with a ‘big picture’, they will want to narrow it down to stuff they can get their hands on pretty quickly (just today I was frustrated in my local library at not being able to narrow my search to ‘this branch, on the shelf, only’). Perhaps also a focus on finding the item on the shelf – on a recent visit to Seattle, I was impressed by how the layout of the non-fiction stock in the library (a continuous Dewey sequence covering several sloping floors, so you can walk from 001 to 999 without any stairs) made it easy to navigate the stock – I especially liked the floor tiles with the Dewey numbers on them for instant orientation.

This needs more thought, so hopefully I can come back to it in a future post…

Libraries, OPACs and a changing discovery landscape II

Another take on this from Hans Geleijnse (Tilburg University, Netherlands). He is pointing out that libraries are not only being driven to improve their services, but also to do this more efficiently and in a competitive environment.

Users expect fast, mobile, secure, personalised and open access to information – to use a cliché, anytime, anywhere.

A 2005 user survey at Tilburg of researchers and teaching staff found that 97% used the electronic library services, but 70% still used physical books from the library. The most valued services are e-journals, databases, current awareness services, document delivery and ILL.

However, it also showed that users are not familiar with various important electronic resources, and further, that users don’t want to be assisted but prefer self-service. There is a real paradox here – those surveyed said they believed they would search better with help from a librarian, but that they didn’t want this help.

In this environment, the role of the catalogue is declining and changing – and we have watched it happen, but haven’t changed the design of our catalogues. Alongside this, Tilburg have seen OPAC searches almost double from 2003 to 2005, but no similar increase in circulation – so what is happening? On the electronic side we see an increase in searching, but here we do see a similar increase in the use of ILL and online full text.

Just a reflection on digitisation efforts – Hans is reminding us that the development of electronic access to journals took 10 years, and we are just at the start of the digitisation of books – so even if e-books are currently poor in terms of functionality, we must not assume that this will continue to be the case, or that books are ‘special’ and different from journals.

In the world of Open WorldCat, collections can be searched via Yahoo or Google – so why have a local catalogue? Perhaps to integrate with circulation, but not many other reasons? Hans suggests that the importance of the traditional local library system will decrease rapidly in the next few years.

A quick skim over several areas now – e-learning (many students spend much of their time within their VLE, and at the moment we are not seeing true integration of library systems, just linking), institutional repositories, and e-science (libraries are not really involved in the latter at present, but there is a massive amount of data, currently not organised, accessible or re-usable).

So – what should libraries be doing? We need to create partnerships with departments, faculties and users, and also with vendors. On the other hand, library system vendors need to produce products that support the role of libraries in a changing world, and also…

Universities are unique in their research and teaching. Libraries should concentrate on supporting these unique selling points and on digitising their own unique collections. Libraries must cooperate – regionally, nationally and internationally – and outsource. Joint acquisition and outsourcing of library systems will become a realistic option. The choice of library system does not have a large impact on our users as long as it is of reasonable quality – so we should stop being so fussy.

Hans’ conclusions:

  • The role of the catalogue is declining, but do not immediately close it down
  • The time of ‘my library should have its own local library system and its own portal system’ is over
  • Need for more standardisation and integration across domains and application areas
  • More cooperation at local, national and international level
  • Outsourcing of library functions becomes a serious option
  • Added value is in providing user-driven, state-of-the-art and tailored service and support to teaching, learning and research

Libraries, OPACs and a changing discovery landscape I

A series of presentations, starting with Karen Calhoun from Cornell. She is currently referencing the ‘metadata switch’ idea from E-Scholarship: A LITA Guide by Lorcan Dempsey et al. – quite an interesting way of thinking about material, splitting it between high and low ‘uniqueness’ and low and high ‘stewardship’ (how much libraries ‘look after’ the resources). The bulk of our physical library collection is not unique, and requires a high degree of stewardship; free web resources are also not unique (well, more accurately, are widely available), but require a low degree of stewardship.

Some stats from the OCLC environmental scan, focussing on college students. Some of these stats are interesting, but I can’t help but think that we are guilty of focussing too much on user perceptions of libraries.

Why should we be surprised that college students show a higher degree of ‘familiarity’ with search engines than with online library resources? This is like saying they are more familiar with WHSmith (a UK high-street newsagent) than with Grant and Cutler (a specialist foreign-language bookshop).

I’m not dismissing the OCLC survey at all, but we need to make sure we aren’t unrealistic in our expectations.

Some interesting stats from Cornell showing how much their users use e-resources vs catalogue searching – e-resources make up 10% of the collection, use 25% of the budget, and get 50% of the library use. I almost think this provokes the opposite question to what Karen seems to be suggesting – she is comparing low e-resource searching to high search engine use (although she is using total Google searches – 441 million – which is a pointless comparison in my view). I’d say this suggests that we need to look at navigation of our physical collection – or get rid of it.

Some more interesting outcomes from the University of Minnesota, showing that humanities and social sciences faculty and grad students work from home, and that there is a ‘market’ for more help from libraries in maintaining ‘local’ (almost personal) collections belonging to faculty and similar.

Overall, we are seeing more use of the catalogue by graduate students and faculty, compared to undergraduates.

Karen is suggesting that the ‘traditional’ model for providing library services is just not meeting the needs of the users.

Karen recently wrote a report for the Library of Congress (http://www.loc.gov/catdir/calhoun-report-final.pdf). This says that the catalogue is no longer the centre of research, that it hasn’t kept up with changes in user expectations and modern retrieval systems, and that the economics of traditional cataloguing no longer make sense. Apparently there is a real division in the community about this – IT staff and managers welcoming the report, and others feeling very threatened by it. I guess I definitely fall into the former category, being both IT and management – but all this seems a given to me now, and not at all threatening. We need to stop obsessing about libraries as organisations, and think about them as a service – I don’t care how people get to the information they need, as long as they do get to it.

I also wonder if one reaction to information overload by users has been to take a pragmatic ‘good enough’ approach, rather than aiming for complete retrieval or ‘perfect’ searches.

Some stuff about outreach now – suggesting we need to get out from behind the desk (surely we know this by now?), but also push the resources that we manage into the environments where our users work – so the open web, course management systems, institutional portals…

Karen says we should be thinking of linking systems rather than building them, and of decoupling discovery and ‘inventory management’ systems.

The challenge for a library such as the one I work at (http://www.rhul.ac.uk/information-services/library) is that we may not be able to afford appropriate discovery systems, and so perhaps need to essentially outsource this effort – if Google or someone else can provide the discovery tools, that’s fine, as long as we can link into them (e.g. via OpenURL).
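To illustrate, an OpenURL is essentially a citation serialised as key/value pairs and pointed at the library’s link resolver, so any discovery service that knows the resolver’s address can deep-link into local delivery. A sketch using OpenURL 0.1-style keys (the resolver base URL is hypothetical):

```python
# Build an OpenURL that hands a discovered citation to the local resolver.
from urllib.parse import urlencode

RESOLVER = "http://sfx.example.ac.uk/sfx_local"  # hypothetical SFX instance

def openurl(citation):
    return f"{RESOLVER}?{urlencode(citation)}"

# An article-level citation using OpenURL 0.1-style keys.
print(openurl({
    "genre": "article",
    "issn": "0028-0836",
    "volume": "171",
    "spage": "737",
    "date": "1953",
}))
```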

The longer term vision Karen outlines is:

  • Switch users from where they find things to library-managed collections of all kinds
  • The local catalogue as one link in a chain of services, one repository managed by the library
  • More coherent and comprehensive scholarly information systems, perhaps by discipline
  • Infrastructure to permit global discovery and delivery of information among open, loosely coupled systems
  • A critical mass of digitised publications
  • [missed one point here]

So, I agree with Karen’s point – that ‘discovery’ will take place on the open web, and libraries should focus on delivery, linking into the discovery tools that are ‘out there’.

However, in the medium term, Karen sees the need for a better library interface for a better user experience, drawing on the local catalogue’s strongest suit – support for inventory control and delivery; shared online catalogues – beginning to aggregate discovery; and larger-scale collaboration on collection development/resource sharing and storage/preservation.

Also, we are starting to build bigger scholarly information environments, with libraries playing a role using their skills in metadata and organisation – but providing these skills to scholars, not doing it for them.

Karen sees the beginning of the era of ‘special collections’ – that is, libraries promoting their local ‘high uniqueness’ and ‘high stewardship’ collections, alongside the aggregation of discovery of digital collections.

A very interesting talk, and I agree with Karen’s overall vision. I’m slightly concerned that the ‘intermediary’ stage is here, now, and that not only are we (libraries) not keeping up, but this stage is extremely frustrating for the user – they start in the open web, and find material they end up not being able to access. Until a utopian vision of all materials being available online (freely? – at the point of use, anyway) is realised, this will continue to be an issue.

Enriching the OPAC

A presentation from Ian Pattenden from Bowker about their ‘Syndetic Solutions’ OPAC enrichment product.

Obviously really a sales pitch, but I find this an interesting area – do cover shots ‘enhance’ an academic library catalogue, or are they superfluous, and a distraction from the real business of looking for library material?

Syndetic Solutions provides 17 ‘data elements’ for OPAC enrichment – cover shots, tables of contents, sample chapters, author notes, reviews, etc. (see below) – and they currently cover 2.8 million books, and growing.

One issue I have with this service is that it is hosted, so the information doesn’t get integrated into your catalogue. Ian is presenting this as ‘simple’, but for me it is a wasted opportunity. If we are going to add tables of contents to our catalogue, I’d like them to be searchable.

The links are done by ISBN, which means the service doesn’t apply to any of our audio-visual material, of course, or to other items without ISBNs.
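Even where ISBNs exist, matching on them is slightly more fiddly than it sounds, since records may carry ISBN-10 or ISBN-13 forms of the same book, so presumably the service has to normalise them. A sketch of the standard ISBN-10 to ISBN-13 conversion:

```python
# Convert an ISBN-10 to its ISBN-13 form for matching purposes:
# prefix "978", drop the old check digit, and recompute the new one.

def isbn10_to_isbn13(isbn10):
    core = "978" + isbn10.replace("-", "")[:9]  # drop the ISBN-10 check digit
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

print(isbn10_to_isbn13("0-19-852663-6"))  # -> 9780198526636
```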

The data elements available are:

  • Cover images (1.7 million, increasing by 10k per week)
  • First chapters and excerpts (from books published from 2000 onwards)
  • Tables of contents (about 800,000, rising to 1 million by the end of 2007)
  • Summaries and annotations (taken from book jackets and publishers)
  • Book reviews and author notes
  • Awards (e.g. Pulitzer Prize)
  • Series (e.g. fiction series in reading order)
  • Profiles/search for similar titles (e.g. based on the fact that a book is about a single-mother detective, you can find other books about single-mother detectives)

Some examples:
http://aleph.aub.auc.dk
http://nuin.napier.ac.uk

More detail is available at http://www.syndetics.com, and Bowker can do you an ISBN match report before you buy the service (they find match rates vary between 30% and 60%). Pricing is based on the number of English-language books for academic libraries, or on annual circulation for other libraries.

Finally, free trial access is also possible (apparently there is an Ex Libris KB item – KB6850 – on how to set this up).

MetaLib/SFX – usability and organizational impact

A study from Jonkoping University.

From the library perspective, MetaLib/SFX offers a single point of entry to multiple library resources, and MetaLib/SFX are updated centrally, which is much easier than local libraries trying to keep track of all the available resources.

The study was based on a series of workshops with library staff, looking at how they expected it to affect their work, and with end-users of MetaLib.

The study was qualitative, but it seemed to indicate that MetaLib resulted in better exposure of all the available resources, although end-users find it difficult to interact with MetaLib.