Google Scholar – SWOT analysis

A really interesting conclusion to the talk from Marco Streefkerk – his SWOT analysis of Google Scholar, and some questions for the SFX/MetaLib and wider library communities:

Finally Marco is summing up his feelings about Google Scholar:

Strengths

  • Google’s reputation and familiarity
  • Google’s speed and user-friendliness
  • Relevance ranking based on citations
  • Extra services
  • Multi-disciplinarity

Weaknesses

  • Heterogeneity of the material
  • Content is arbitrary
  • Risk of dead-ends (users find the citation, but can’t access the full text)
  • No expert search
  • Anglo-Saxon (English language) focus

Opportunities

  • Offer an easy starting point
  • Reach new user groups
  • Reach new content
  • Easy and expert search fully integrated using OpenURL
  • Higher usage of valuable (expensive) content

Threats

  • Loss of sight (and control) of indexing policy
  • Possibility of censorship
  • Users get lost/confused
  • User ends up at wrong copy (i.e. doesn’t get to the institutional subscription)
  • Print collection becomes less visible
  • Information skills will disappear
  • Library services become less visible

Questions

  • Do we want to give full-text prominence?
  • Can we consider GS as a normal resource?
  • Is GS a better service than Google when looking for public domain copies?
  • Is GS better than the citation linker or the catalogue for tracking known items?
  • Is GS better than MetaLib’s QuickSearch?
  • What is the precision/recall of GS compared to MetaSearch and native searches?
  • What is the commercial drive for GS?

Google Scholar and Digital Library

The last day of the conference, and Marco Streefkerk from the Netherlands is introducing a discussion about the relationship between Google Scholar and Digital Libraries. The question has been particularly discussed among SFX users following the introduction of a tool by Ex Libris that allows libraries to register with Google Scholar and indicate which items they have available in full text; this in turn allows end users to choose to see links to their institutional OpenURL resolver in the Google Scholar results.
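To make the mechanics concrete: the links Google Scholar displays for an institution are just OpenURLs pointing at that institution’s resolver. Here is a minimal sketch of building one such link, assuming a made-up resolver address and citation; the parameter names come from the OpenURL 1.0 (Z39.88) key–value format.

```python
from urllib.parse import urlencode

# Hypothetical institutional SFX resolver base URL -- each library has its own.
RESOLVER_BASE = "https://sfx.example.ac.uk/sfx_local"

def openurl(citation):
    """Build an OpenURL 1.0 (KEV) link for a journal article citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": citation["title"],
        "rft.issn": citation["issn"],
        "rft.volume": citation["volume"],
        "rft.spage": citation["spage"],
        "rft.date": citation["year"],
    }
    return RESOLVER_BASE + "?" + urlencode(params)

# Made-up citation purely for illustration
print(openurl({"title": "An example article", "issn": "1234-5678",
               "volume": "12", "spage": "34", "year": "2005"}))
```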

This sharing of data with Google has raised a wide variety of concerns – giving information to Google for ‘free’, the need to keep Google up to date with holdings, the question of full-text linking versus general linking. These concerns run alongside the more general concerns about Google and its impact on the library world.

At the US SFX/MetaLib User Group earlier this year, Anurag Acharya, the principal engineer for Google Scholar, presented on Google Scholar and Libraries. Unfortunately we don’t have a representative from Google here today, but I hope the discussion is going to be interesting nonetheless.

One of the main issues seems to be the amount of ‘unknown’ stuff about Google Scholar – I’m not going to go into all this as it has been well covered elsewhere. However, one interesting point made by Anurag in his presentation in the US was that by harvesting library holdings and including OpenURLs, Google is including ‘uncrawled’ URLs for the first time ever, and by offering links to institutional resolvers in its results it is also breaking from Google’s usual practice.

I think our users worry much less about ‘unknowns’ than us. One theme of this week (for me) has been our (professional) over-eagerness to expose all information to our users – sometimes ignorance is bliss. We can clearly see ourselves, as a profession, protecting users (and perhaps more grandly, society in general) from knowledge being limited by specific providers or interest groups, but this may not be what our users want us to spend (all) our time doing.

Library system interfaces – designing for users

David Walker from Cal State San Marcos is giving a talk exploring many concepts around user-centered design for library systems. He is actually talking about how he has used the MetaLib ‘X-Server’ – a set of XML APIs – to develop a new interface to the MetaLib application, but many of the issues are about design, not the API.
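For anyone wondering what ‘a set of XML APIs’ means in practice, the pattern is an HTTP request that returns XML which you can parse and re-present however you like. A rough sketch, using a hypothetical endpoint, operation name and element names (the real ones are defined in the X-Server documentation, not here):

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Hypothetical endpoint, operation and element names, for illustration only --
# the real ones are defined by the MetaLib X-Server documentation.
XSERVER = "https://metalib.example.ac.uk/X"

def xserver_call(op, **params):
    """Send one request to the XML API and return the parsed XML response."""
    with urlopen(XSERVER + "?" + urlencode({"op": op, **params})) as resp:
        return ET.fromstring(resp.read())

# e.g. run a search and list how many hits each resource reported
tree = xserver_call("find", query="global warming", resource_set="quick_1")
for resource in tree.iter("resource"):
    print(resource.findtext("name"), resource.findtext("hits"))
```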

It’s an interesting area as there has been a lot of debate about the appropriate interface to MetaLib as a federated search engine and library gateway. One of the problems we face is presenting the complexity of the MetaLib application in a way that the users will relate to. Perhaps part of the challenge is that we (as a group) are unsure how much to expose the real complexity. We perhaps have a natural inclination to tell our users how complex it all really is – while our users are clearly not very keen on understanding this.

Some nice aspects to Xerxes (the interface David has built on top of the MetaLib application) – splitting your search targets by labels such as ‘Books and Media’, then, rather than pointing at specific collections, breaking down by speed of availability – ‘in our library’, ‘1–3 day delivery’, and so on. These are some excellent ideas – we really need to learn about how our users approach their work, and start delivering interfaces which relate to this.

Some clear indications that ‘availability’ is a key issue. I have conflicting feelings about this. I can see why users want to know if something is available full-text or not – and I would myself – but I also feel that this is a severely limiting approach for serious search. It worries me (professionally – it doesn’t keep me awake at night!) that people will simply not use any references where we don’t have the electronic full-text.

Some limitations to what can be achieved by Xerxes, as not all the functionality of MetaLib is yet in the XML API – although coming this year.

Interestingly, many of the things that David is talking about would, I think, be possible in the native MetaLib interface. This is as much about the terminology and the translation of data from the metadata record as it is about the use of the XML API. There are some display issues where you need real flexibility to achieve them – however, we can go a long way towards it without the extra resource needed to develop a whole new interface. It is also interesting how David has harnessed the power of other applications (e.g. Google Maps) to enhance the user experience. If you know which library a book is in, why not show it on a Google Map – then you have links to directions etc. as well.
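The Google Maps part needs surprisingly little plumbing – at its simplest it is just a link built from the holding branch’s coordinates. A small sketch, with made-up branch names and coordinates:

```python
from urllib.parse import urlencode

# Made-up branch coordinates -- in practice these would sit in a small
# lookup table keyed on the location code in the holdings record.
BRANCH_LOCATIONS = {
    "Main Library": (51.5246, -0.1340),
    "Science Library": (51.5230, -0.1312),
}

def map_link(branch):
    """Return a Google Maps URL centred on the branch holding the item."""
    lat, lng = BRANCH_LOCATIONS[branch]
    return "https://maps.google.com/?" + urlencode({"q": f"{lat},{lng}"})

print(map_link("Main Library"))
```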

Just finishing up, David is mentioning his work on ‘rss creator’. This uses the power of a federated search engine and an OpenURL resolver (in this case MetaLib and SFX) to create a table-of-contents alerting service.
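I haven’t seen David’s code, but the general shape of such a service is simple enough to sketch: take the latest records retrieved for a journal (e.g. via a federated search API) and wrap each one in an RSS item whose link is an OpenURL. A rough, assumption-laden sketch – the journal, records and resolver address are all made up:

```python
from xml.sax.saxutils import escape

def toc_feed(journal_title, articles, resolver_base):
    """Build a minimal RSS 2.0 feed of new articles, linking via OpenURL.

    `articles` is assumed to be a list of dicts already retrieved from the
    federated search engine (e.g. via the XML API sketched earlier).
    """
    items = ""
    for a in articles:
        link = (f"{resolver_base}?rft.issn={a['issn']}"
                f"&rft.volume={a['volume']}&rft.spage={a['spage']}")
        items += ("    <item>\n"
                  f"      <title>{escape(a['title'])}</title>\n"
                  f"      <link>{escape(link)}</link>\n"
                  "    </item>\n")
    return ('<?xml version="1.0"?>\n<rss version="2.0">\n  <channel>\n'
            f"    <title>New articles: {escape(journal_title)}</title>\n"
            f"{items}  </channel>\n</rss>")

print(toc_feed("Journal of Documentation",
               [{"title": "An example article", "issn": "0022-0418",
                 "volume": "61", "spage": "1"}],
               "https://sfx.example.ac.uk/sfx_local"))
```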

The presentation is meant to be at http://public.csusm.edu/dwalker – but not there yet. However, there are some details of the rss creator at this address.

Ex Libris – SFX – ScholarSFX®

[Updated 06/11/2012. In light of a comment left below, I should clarify that this post was written to record an announcement by Ex Libris made at the ICAU/SMUG (International Consortium of Aleph Users/SFX and Metalib Usergroup – now merged as IGELU – the International Group of Ex Libris Users) meeting at the British Library in London in 2005. The text below was from the Ex Libris website at the time, and may no longer be valid]

Ex Libris – SFX – ScholarSFX®

Google Scholar and SFX®: New Opportunities for Libraries and Researchers

Ex Libris has worked with Google and a number of SFX customers to ensure that Google Scholar search results display OpenURL links to SFX; and thereby to the scholarly peer-reviewed papers, theses, books, preprints and technical reports from your library collection.

If you are not yet an SFX customer, Ex Libris is pleased to offer a free service, ScholarSFX. This groundbreaking service enables libraries to create customized links based on your institution’s electronic journal holdings and display these links in Google Scholar search results. Your users will then be able to link from the Google Scholar results to articles that are available through local institutional subscriptions or for free on the Web. ScholarSFX includes links to thousands of such free journals.

ScholarSFX offers a simple web-based Wizard to guide you through the setup procedure whereby you configure ScholarSFX to represent your institution’s holdings.

To sign up to receive the free ScholarSFX service from Ex Libris, please fill out the form located on this page.

RFID tags

Some mundane but interesting points from the final speaker in this section.

When you are buying tags you need to think about lots of little things – the printing, the adhesive being used (its water content and acidity/alkalinity), and the method by which the chip is connected to the antenna. If glue is used, the chip can become loose when it gets hot – the alternative methods are ‘flip’ and ‘cross-over’; the latter is the ‘best’, with higher temperature resistance, faster reading and more reliable connections.

The antenna can be copper or aluminium.

You need to decide which standard you use – ISO 15693 or ISO 18000-3 (he recommends the former as better for library use). You need to get the correct version of the chip (he suggests the Philips SLI), and you need to think about the ‘capacity’ of the chip – he recommends at least 32 bit, and ideally 64 bit.
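As a rough back-of-the-envelope check (my own figures, not the speaker’s) of what that capacity means, you can work out how many bits a typical numeric item barcode needs:

```python
def bits_needed(barcode: str) -> int:
    """Bits required to store a numeric barcode as a plain binary integer."""
    return int(barcode).bit_length()

# A 14-digit item barcode needs up to 47 bits, so it overflows a 32-bit
# field but fits comfortably in 64 bits (real tag data models store more
# than just the barcode, so capacity disappears quickly).
print(bits_needed("30001234567890"))  # -> 45
```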

This is bringing home to me the fact that we are still in very early days – I knew that there were competing standards, but there are still a lot of problems, and it seems likely that any choice at the moment will mean being tied to a single vendor. This may not be a problem, but it is something to bear in mind.

I’m finding it hard to get very excited about all these details, but there is no doubt that the speaker thinks they are very important – and I’m sure he is absolutely right. You don’t want to make the wrong decision on a large scale in this area. Generally, though, the speaker is saying – don’t skimp on the quality of the tag to save on price; you get what you pay for.

RFID – little problems

Interestingly, the guy from Denmark is saying that they have found that for CDs and DVDs they cannot read the tags if the discs are directly on top of each other (if the space between the tags is too small). To tag individual discs in a single case (a lot of DVDs come like this), they have to put them in cases that don’t ‘stack’ the discs on top of each other – what a pain!

There are also some other interesting little problems. For some materials they have found that the tags cannot be read through the outside covers – presumably because of metal content or something similar in the covers. However, these problem items are in the minority.

Again, one of the problems they have had is that the library software (not Aleph in this instance) doesn’t interact with the RFID equipment – and they have had to write some special software to make this link.

Originally, next to their self-issue machines they had notices advising users to put 5 items on the platform at one time. However, they found that people would keep checking and rechecking that they had 5 items. So they changed this advice to 3 items, and people were happier, as they can easily see that they have 3 items on the platform. Funny how these small things make such a difference.

They also found originally that their self-issue readers were too powerful, and would read information from books on nearby trolleys etc. They have now reduced the power, and put some metal shielding around the reader to prevent this.

RFID

Today is the last day of the Aleph (Library Management System) part of the conference. Tomorrow and Friday is dedicated to the SFX and MetaLib products.

Anyway, the first session today is about RFID. Firstly, Jo Rademakers from K.U.Leuven is speaking about their experience using RFID. The driver seems to have been coping with longer opening hours with limited staff.

Currently they are using RFID for issue and security for items (although not all journal issues are tagged). They haven’t yet put in return machines, but they are doing that this week.

I’m really excited about the possibilities of RFID, although it is more impressive in the flesh (as it were) than in talks about it. However, it has the ability to deliver a great self-service user experience in the good old physical library.

One issue that they came across at K.U.Leuven was that there are no links between the RFID activation equipment and the library system. This means that careful workflows need to be constructed. I think that LMS vendors need to catch up here. Although the user side works well (making use of the SIP2 protocol), the library staff side seems to be less well thought through.
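For reference, SIP2 (the 3M Standard Interchange Protocol) is a simple line-oriented protocol, which is part of why the self-service side works so smoothly. A rough sketch of what a checkout request looks like – the field values are illustrative, and a production client would also send sequence numbers and checksums and parse the response that comes back:

```python
from datetime import datetime

def sip2_checkout(patron_id: str, item_id: str, institution: str) -> str:
    """Rough sketch of a SIP2 'Checkout' (message 11) request.

    Field values are illustrative; a production client would also send a
    sequence number (AY) and checksum (AZ), and would parse the '12'
    Checkout Response returned by the library system.
    """
    now = datetime.now().strftime("%Y%m%d    %H%M%S")  # 18-character SIP2 date
    return (
        "11"                         # command identifier: Checkout
        + "N"                        # SC renewal policy
        + "N"                        # no-block flag
        + now + now                  # transaction date and no-block due date
        + "AO" + institution + "|"   # institution id
        + "AA" + patron_id + "|"     # patron identifier
        + "AB" + item_id + "|"       # item identifier (e.g. from the RFID tag)
        + "AC|"                      # terminal password (blank here)
        + "\r"
    )

print(repr(sip2_checkout("P0012345", "30001234567890", "MYLIB")))
```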

Now someone from Denmark is talking about their aims with RFID – the things they are thinking of for the future include:

  • Replacement of materials by robot
  • Internet addresses encoded in the tag, which can take you to more information when scanned
  • Intelligent shelves – self-sorting, shelf lists, warnings when an item is mis-shelved, automated shelf indicators at the end of shelving bays
  • Gate-based lending and returning of materials – so that items are automatically issued when you leave the library with them
  • A ‘fairytale cave’ for children – an area of the library where, when a child takes a book into it, the tag is read and illustrations of the story are shown, or an audio version of the book is played!

However – these possibilities are not necessarily cheap. Also, the key use for now is self-issue. Within a couple of months of implementing, they had gone to 90% of loans being done by self-issue – that is pretty impressive.

They are taking advantage of the reduction in routine work to start staff development and qualifications.

Some questions from me – is anyone using RFID to do batch processing of books? I’m not talking about batch issue and return, but things like batch status changes (change all these items to short loan), batch arrivals (box of books arrives from supplier and all ‘arrived’ on system automatically)? Also, is there any use of RFID for non-circulation/stock management issues – e.g. acquisitions, serials check-in?

Conference sessions today

I’ve not posted a lot today because all of the sessions this morning and the first one this afternoon cover a lot of information that might be regarded as commercially sensitive. However, I’ve been very excited by the direction the company are talking about, with a move to tackle many of the issues we’ve been talking about over the last few years – RSS, rethinking the OPAC and the meaning of search/retrieval for library collections, and simplification of system management.

I can finally see us being able to deliver a user experience that we can be proud of!

FAST driven OPAC, and digitisation as a method of metadata enrichment

The Wi-Fi seems to have recovered from yesterday’s problems, so I’m back online for the second day of the conference.

Yesterday afternoon I saw a couple of very interesting poster sessions:

From Germany, using the FAST Data Search engine to create an alternative search and retrieval interface for their 11 million bib records (to be expanded by adding other similarly sized collections). I was struck by the user experience – much more like Google of course – but dispensing with the idea of the ‘OPAC’ – perhaps a concept that is well past its sell-by date? The other thing that really impressed me was the ability to ‘drill down’ into the retrieved data set by various facets, including authors and subject headings, and the speed at which it was able to create this faceted browsing, on the fly, for large datasets (e.g. 16,000 records retrieved in 0.42 seconds, with facets ready to use). The other thing to note here is that FAST is a well-established tool which is out there and ready to use. Let’s stop trying to develop library-based search, and use these excellent tools that already exist. An ideal candidate for this approach would be COPAC – the interface is not brilliant and the data set is large enough to benefit.
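Conceptually the drill-down is just counting field values across the retrieved set – the hard part FAST solves is doing this instantly over millions of records in the index rather than by looping over results. A toy sketch of the idea, with made-up records:

```python
from collections import Counter

# Hypothetical bibliographic records, standing in for a retrieved result set.
records = [
    {"author": "Smith, J.", "subjects": ["Climate change", "Oceanography"]},
    {"author": "Jones, A.", "subjects": ["Climate change"]},
    {"author": "Smith, J.", "subjects": ["Meteorology"]},
]

def facets(result_set, field):
    """Count the values of one field across the result set for drill-down."""
    counts = Counter()
    for rec in result_set:
        values = rec.get(field, [])
        counts.update(values if isinstance(values, list) else [values])
    return counts.most_common()

print(facets(records, "author"))    # [('Smith, J.', 2), ('Jones, A.', 1)]
print(facets(records, "subjects"))  # [('Climate change', 2), ...]
```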

The second session was from Austria. The session itself was showing how they were integrating digitised material with their catalogue using open source software (notably swish-e, an open source search and retrieval tool, not unlike FAST above). However, what I was more interested in was that this demonstration suddenly brought home to me the possibility of scanning in tables of contents (ToCs) from books to enhance the metadata available to the searcher. So far we have thought about digitising items to make them available electronically; scanning ToCs to make items more retrievable wasn’t something I’d considered. If we were to scan and OCR all the contents pages from our teaching collection, I think we would see a real benefit for the users (we regularly have examples of people failing to find material because they are searching for the chapter title or author rather than the monograph details). Of course, using ToCs is not a new idea – I just hadn’t thought of digitisation as a way of obtaining this data.
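The enrichment step itself is straightforward to sketch: OCR the scanned contents page and append the text to the record that the indexer (swish-e, FAST, or whatever) sees. This sketch uses Pillow and pytesseract purely as stand-ins for whatever scanning/OCR tools are actually available; the record and file path are made up:

```python
from PIL import Image   # Pillow
import pytesseract      # Python wrapper around the Tesseract OCR engine

def enrich_record(record: dict, toc_scan_path: str) -> dict:
    """Append OCR'd table-of-contents text to a bib record before indexing."""
    toc_text = pytesseract.image_to_string(Image.open(toc_scan_path))
    enriched = dict(record)
    enriched["toc"] = toc_text   # indexed alongside the title/author fields
    return enriched

# Made-up record and scan path, purely for illustration.
book = {"title": "Readings in Information Retrieval", "author": "Various"}
print(enrich_record(book, "scans/readings_ir_contents.png")["toc"][:200])
```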

Web Services – part 2

Omri Gerson from Ex Libris is now talking about web services and the Ex Libris approach. To start with he is defining Web services, going over some of the ground that Mark already covered.

Omri’s approach is more that we need to see standards developed for Ex Libris to adhere to. So – once you have ZING (SRU/SRW), you can support the relevant web services.
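ZING is the umbrella for SRU/SRW, and an SRU searchRetrieve request really is just an HTTP GET carrying a CQL query – which is what makes ‘adhere to the standard and the web services follow’ plausible. A minimal example against a hypothetical endpoint:

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint; the parameter names below are the standard ones.
SRU_BASE = "https://catalogue.example.ac.uk/sru"

params = {
    "version": "1.1",
    "operation": "searchRetrieve",
    "query": 'dc.title = "information retrieval"',  # CQL query
    "maximumRecords": "10",
    "recordSchema": "dc",
}
print(SRU_BASE + "?" + urlencode(params))
```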

Currently, where these standards don’t exist, the only option is to do ad hoc services (although perhaps the other option is to define standards?)

For Aleph 18.01, Ex Libris will implement a SOAP wrapper on the X-Server and describe all services using WSDL – hooray! They are also committing to not changing the structure or scope of the existing services – also hooray (we had problems with this in the last upgrade!).

There is also the intention to establish an X-Server focus group to help develop X-services and determine their scope.