Recently Chris Keene (University of Sussex) sent an email to the LIS-E-RESOURCES email list about the fact that in academic libraries we are now doing a lot more ‘import’ and ‘export’ of records in our library management systems – bringing in bibliographic records from a variety of sources such as book vendors/suppliers, e-resource systems and institutional repositories. He was looking for shared experience of how other sites have coped.
One of the responses mentioned the new ‘next generation’ search systems that some libraries have invested in, and Chris said:
“Next gen catalogues are – I think – certainly part of the solution, but only when you just want to make the records available via your local web interface.”
One of the points he made was that the University of Sussex provides records from their library management system to others to allow Union catalogues to be built – e.g. InforM25, COPAC, Suncat.
I sympathise with Chris, but I can’t help but think this is the point at which we have to start doing things a bit differently – so I wrote a response to the list, but thought that I’d blog a version of it as well:
I agree that library systems could usefully support much better bulk-processing tools (although there are some good external tools like MarcEdit of course – and scripting/programming tools, e.g. the MARC Perl module, if you have people who can program). However, I'd suggest that we need to change the way we think about recording and distributing information about our resources, especially in the light of investment in separate 'search' products such as Aquabrowser, Primo, Encore, Endeca, &c. &c.
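To make the bulk-processing point concrete, here is the sort of thing I mean, sketched in Python with the pymarc library (a rough equivalent of the MARC Perl module). The file names and the choice of a 590 local note are purely illustrative, and the subfield syntax here is the classic pymarc style – newer releases of the library express subfields differently.

```python
from pymarc import MARCReader, Field

# Read a file of vendor-supplied MARC records, stamp each one with a
# local note, and write the results out as a new file.
with open('vendor_records.mrc', 'rb') as in_file, \
        open('processed_records.mrc', 'wb') as out_file:
    for record in MARCReader(in_file):
        # 590 is a common local-note tag; adjust to local practice
        record.add_field(Field(tag='590', indicators=[' ', ' '],
                               subfields=['a', 'Loaded from vendor feed']))
        out_file.write(record.as_marc())
```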
If we consider the whole workflow here, it seems to me that as soon as you have a separate search interface the role of the 'library system' needs to be questioned – what are you using it for, and why? I'm not sure funnelling resources into it so they can then be exported to another system is really very sensible (although I absolutely understand why you end up doing it).
I think that once you are pushing records into Aquabrowser (taking Sussex as an example) there is little point in also pushing them into the catalogue – what extra value does this add? For books (print or electronic) you may continue to order them via the library system – but you only need an order record in there, not anything more substantial – you can put the 'substantial' record into Aquabrowser. The library system web interface will still handle item-level information and actions (reservations/holds etc.) – but again, you don't need a substantial bib record for these to work – the user has done the 'searching' in the search system.
For the ejournals you could push directly from SFX into Aquabrowser – why push via the library system? Similarly for repositories – it really is just creating work to convert these into MARC (probably from DC) to get them into your library system, only to export them again for Aquabrowser (which seems to speak OAI anyway).
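To illustrate, pushing 'directly' from a repository could be as simple as harvesting Dublin Core over OAI-PMH and handing the values straight to the search layer – no MARC in sight. A rough sketch; the endpoint URL is a placeholder, and a real harvester would also need to follow resumption tokens:

```python
import urllib.request
import xml.etree.ElementTree as ET

OAI = '{http://www.openarchives.org/OAI/2.0/}'
DC = '{http://purl.org/dc/elements/1.1/}'

# Ask the repository's OAI-PMH interface for its records as simple DC
url = ('http://repository.example.ac.uk/oai'
       '?verb=ListRecords&metadataPrefix=oai_dc')
tree = ET.parse(urllib.request.urlopen(url))

for record in tree.iter(OAI + 'record'):
    title = record.findtext('.//' + DC + 'title')
    identifier = record.findtext('.//' + DC + 'identifier')
    # These values can go straight to the search layer's index
    print(identifier, title)
```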
One of your issues is that you still need to put records into your library system because it feeds other places – for example, at Imperial we send our records to CURL/COPAC among others – but this is a weak argument going forward: how long before we see COPAC change the way it works to take advantage of different search technology (MIMAS have just licensed the Autonomy search product …)? Anyway – we need to work with those consuming our records to work out more sensible solutions in the current environment.
I'd suggest what we really need to think about is a common 'publication' platform – a way of all of our systems outputting records in a way that can then be easily accessed by a variety of search products – whether our own local ones, remote union ones, or even ones run by individual users. I'd go further and argue that platform already exists – it is the web! If each of your systems published each record as a 'web page' (either containing structured data, or even serving an alternative version of the record depending on whether a human or machine is asking for the resource – as described in Cool URIs), then other systems could consume this to build search indexes – and you've always got Google of course… I note that Aquabrowser supports web crawling – could it cope with some extra structured data in the web pages (e.g. RDFa)?
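To sketch the Cool URIs idea in code: one URI per record, with the server returning a human-readable page or structured data depending on who is asking. Everything here – the record, the URL scheme, the RDF/XML stub – is invented for the example:

```python
from wsgiref.simple_server import make_server

RECORDS = {'b1234': {'title': 'An Example Book', 'creator': 'A. Author'}}

def app(environ, start_response):
    record = RECORDS.get(environ['PATH_INFO'].strip('/'))
    if record is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'No such record']
    if 'application/rdf+xml' in environ.get('HTTP_ACCEPT', ''):
        # A machine asked: serve structured data (a stub RDF/XML document)
        body = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
                ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
                '<rdf:Description><dc:title>%s</dc:title></rdf:Description>'
                '</rdf:RDF>' % record['title'])
        content_type = 'application/rdf+xml'
    else:
        # A person asked: serve an ordinary web page
        body = '<html><body><h1>%s</h1><p>by %s</p></body></html>' % (
            record['title'], record['creator'])
        content_type = 'text/html'
    start_response('200 OK', [('Content-Type', content_type)])
    return [body.encode('utf-8')]

make_server('', 8000, app).serve_forever()
```

The same record URI then serves a browser, a crawler building a search index, or a union catalogue harvester alike.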
I have to admit that I may be overestimating how simple this would be – but it definitely seems to me that this is the way to go – we need to adapt our systems to work with the web, and we need to start now.
Hi Owen
I’ve now put my original email online for reference and added some additional thoughts as well:
http://www.nostuff.org/words/2009/library_catalogues_changing_model/
Short URL:
http://is.gd/jhqo
For us, we started trying to import e-journal records before we had Aquabrowser. We first tried the quick approach: take a huge file of e-journal MARC records and import them into our LMS (i.e. SFX -> Talis). However, problems appeared (duplicates, links not working, odd display, missing journals) which came down to a combination of the two systems and our inexperience with the process. So we started again, much slower, and most of the problems went away.
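With hindsight, a pre-load sanity check would have caught some of this – for example, scanning the export for repeated ISSNs before it goes anywhere near the LMS. A rough sketch using pymarc (a Python library for MARC processing); the file name and the choice of 022 $a as the match point are just assumptions:

```python
from collections import Counter
from pymarc import MARCReader

# Count how often each ISSN (022 $a) appears in the export file
issns = Counter()
with open('sfx_ejournals.mrc', 'rb') as fh:
    for record in MARCReader(fh):
        for field in record.get_fields('022'):
            for issn in field.get_subfields('a'):
                issns[issn] += 1

for issn, count in issns.items():
    if count > 1:
        print('ISSN %s appears %d times' % (issn, count))
```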
We were so focussed on this that we didn’t really stop to think about different models when we got Aquabrowser. But like you say, bypassing the catalogue has advantages and is in many ways preferable. It’s a cleaner solution and removes the issues of keeping the data in sync.
When describing the new issues arising around the pros and cons of where the records for particular types of items should live, I used COPAC/WorldCat as an example of one issue, i.e. third-party systems that expect the LMS catalogue to represent all our holdings. These were just examples, and I want to stress the point that there are many places where we and others refer to ‘the catalogue’ (by which we mean the source of all our items’ bib data) where in future we might have to consider what exactly we mean. Other examples include Endnote (which can search library ‘catalogues’) and link resolvers, which can use the catalogue as a ‘source’ (i.e. a final destination where you can find the item you are after). Plus, in this new open world, there may well be services using our LMS catalogue (perhaps via Z39.50) which we don’t even know about.
If we change what we do and do not put in the catalogue, how will it affect these services?
To be clear, I’m not saying this is a bad thing – in many cases it will probably be a good thing (many third-party systems probably don’t want to include online content that only your users have access to) – but it is something we need to consider.
Anyway, some good points here, and I’ll certainly be giving them some thought (once I’ve had lots of coffee!). Thanks for the ideas. I especially like the concept of a common ‘publication’ platform.
Cheers
Chris
Owen, I think this is spot on! Library catalogues need to be part of the Web – as do repositories, which seem to get more of the discussion space, even though library catalogues have much more in them and could be exploited on the Web right now.
I think one needs to consider Web-friendly models for creating catalogue records, as well as for storing them, aggregating holdings, etc.
This brings to mind recent discussion about changes to WorldCat policies on sharing records derived from WorldCat, as well as discussions on aligning repositories with the Web.
Owen,
For me, I’d read XML rather than ‘the web’ as the common publication platform – a commonly agreed data structure defined in XML would avoid all those problems you have when your web pages get full of tags for different scripting languages, and the problems arising from the different descriptors that libraries use for data elements. Then you combine this with web services for data interchange (avoiding the latency issues of daily data exports to Aquabrowser) and suddenly it no longer matters where the data is held.
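To make that concrete, here is a rough sketch of what I mean: records staying wherever they live, but retrievable on demand as an agreed XML structure over a simple web service. The endpoint and element names are entirely invented – no such agreed schema exists yet.

```python
import urllib.request
import xml.etree.ElementTree as ET

def fetch_record(record_id):
    # Pull one record, as the (hypothetical) agreed XML, from wherever it lives
    url = 'http://lms.example.ac.uk/records/%s.xml' % record_id
    doc = ET.parse(urllib.request.urlopen(url))
    return {
        'title': doc.findtext('title'),
        'creator': doc.findtext('creator'),
        'holdings': [h.text for h in doc.iter('holding')],
    }

# A local search layer, a union catalogue or a user's own tool could all
# call the same service: no nightly export, no copies drifting out of sync
print(fetch_record('b1234'))
```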
However, all this demands real advocacy aimed at the system suppliers, who are currently deriving benefits from having us caught up in technical islands. There’s only so much development that the few of us can do, and many libraries are now operating without anyone on the staff who can see the need for change. Along with a substantial change in business perspective, there’s a real market opportunity for the first suppliers to realise this. Only then will our users get that single search box they are looking for……
Hi Ian,
You raise so many things here that I need another few posts to respond! Here are some quick responses that need more exploration. I think this all makes sense – but I’m also prepared to admit that I sometimes have slightly odd views 🙂
XML isn’t a platform in the sense I mean here. Libraries have more than one data structure available to them that can be expressed in XML, but by itself this is not sufficient – although I’m a fan of moving from MARC or indeed MARCXML to something that is a bit more useful and consumable. (Note that there are approaches to allowing both human and machine-readable representations of data at the same URI – the Cool URIs document I reference in the post describes this, and I also mention the idea of RDFa to embed structured data in an html page giving the user a human-readable display, while including a lot of structured data that can be exploited by software)
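To make the RDFa idea concrete: the page a person reads can carry machine-readable statements in its markup. A hand-rolled sketch (RDFa 1.0 syntax, Dublin Core vocabulary – the URI and values are placeholders):

```python
def record_as_rdfa(uri, title, creator):
    # Same page for humans and machines: visible text plus embedded triples
    return '''<html xmlns:dc="http://purl.org/dc/elements/1.1/">
<body>
  <div about="%s">
    <h1 property="dc:title">%s</h1>
    <p>by <span property="dc:creator">%s</span></p>
  </div>
</body>
</html>''' % (uri, title, creator)

print(record_as_rdfa('http://catalogue.example.ac.uk/record/b1234',
                     'An Example Book', 'A. Author'))
```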
The notion of what I mean by the ‘web’ as a common publication platform could do with some expansion here, but there are some fundamental concepts – the ability to link between documents, the use of http, expressing information as human-readable html (as well as structured data) – that I think are a minimum requirement for libraries to exploit the web properly.
I’m very nervous about a ‘commonly agreed data structure’ – I guess this is controversial for libraries, but I simply don’t believe it is achievable, because I don’t believe the generation of useful (meta)data is restricted to libraries. We can look at how successful (for example) the uptake of even simple DC has been – how many web pages have decent structured metadata embedded in them? I’m not against the idea of some level of commonality – and in some scenarios we may even achieve a useful degree of consistency (for example within libraries – although if you look at data consistency across libraries it is relatively poor, even with a lot of commonly agreed rules).
I’m not in any way against communities having common standards – I just think we need to assume a lack of consistency as a starting point. This is where I come back to the strength of the web as a platform – if we look at the web, it is an incredibly inconsistent hodge-podge of information, and yet because of the links between documents it is possible to make some kind of sense of it – which is why Google and others can return relevant hits (I’m not saying this approach is perfect, but it is very successful and definitely serves a need).
I have a feeling I need to argue this out in another post – but see my previous post ‘The Future is Analogue’ to see how I think links are key to thinking about data, metadata and information discovery. I also believe that networks of information naturally minimise (without eradicating) duplication. At the level of data structures, I believe we need to embrace the complexity in the system, rather than try to design it out – and in the end a successful approach to this would allow people (users, librarians, search engines) to bring together information that serves them or their community best.
I think (hope?) that I’m arguing for something more radical than you suggest – and yes, we need people to develop this with us, although it may not be the traditional library system suppliers. That said, of the ones I have worked with, both Talis and Ex Libris show some understanding of the need for change here.