Mashed library – it’s happening

OK – following the overwhelming response (ok, a few people, but all very keen) to my recent post on a kind of unconference/hackfest/barcamp thing around library technologies, I’m going ahead with it.

It’s early days yet – no venue, no date etc. – although it looks like I’ve already had an offer of some possible funding, and a couple of tips on venues – so capitalising on my own enthusiasm, and that of others, I’m determined this is going to happen. Although the event will be about using technologies in libraries, we need a mix of non-techy (but interested) people, as well as any programmers or part-time coders/messers (where I’d count myself).

I’ve set up a website at http://mashedlibrary.ning.com/ with some ideas from me on how the day might work, and a request for input from you (dear reader) on how you think it should work, plus anything else you want to contribute.

Technorati Tags:

Wordle for ALA Annual Conference 2008

Lots of “Wordles” springing up around the place at the moment, so I thought I’d throw something together based on tweets mentioning ala2008 or ala08 (both used as hashtags) during the conference week.

I set up a search on summize, and then pasted the results a page at a time into a text editor. I did clean up the file a bit – the pasted text had some supporting data with it that wasn’t directly part of the tweet – for example the date, resulting in ‘Jun’ being the most common word. Some of this might have been interesting to leave in (e.g. leaving in AM and PM, which I removed, would have shown when people were more active) – but my Wordle, so I decide. Here it is:
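The clean-up step above can be sketched in a few lines of Python. This is only a reconstruction of the workflow, not the original: I’m assuming the pasted text has date tokens (‘Jun’, day numbers) and ‘AM’/‘PM’ markers mixed in with the tweets.

```python
import re
from collections import Counter

# Tokens that came from the pasted page furniture rather than the
# tweets themselves (dates, AM/PM markers) -- an assumed list.
NOISE = {"jun", "am", "pm"}

def clean_words(text):
    """Lower-case, keep word-like tokens, and drop non-tweet noise."""
    words = re.findall(r"[a-z0-9#@']+", text.lower())
    return [w for w in words if w not in NOISE and not w.isdigit()]

def top_words(text, n=5):
    """Word frequencies, roughly as Wordle would weight them."""
    return Counter(clean_words(text)).most_common(n)
```

Leaving ‘am’ and ‘pm’ out of `NOISE` would give the activity-by-time-of-day effect mentioned above.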

Ala-wordle

Mashed Libraries? Would you be interested?

I’ve been at ALA for the last few days, and as usual, time out of the office has triggered some ideas. The one I’m most excited about is that I think we could do with having an event in the UK that is about bringing together interested people and doing interesting stuff with libraries and technology.

The kind of thing I’m thinking of is along the lines of the recent ‘Mashed Museums’ event which Mike Ellis blogged about at http://electronicmuseum.org.uk/2008/06/27/mashed-museum-2008/

I’m also inspired by the annual ‘Hackfest’ that happens in conjunction with the ‘Access’ conference in Canada – described by Roy Tennant, and the various Barcamp and Unconference events that spring up quite regularly now.

At this stage, I’m looking for some interest and answers – please leave comments below about the idea and if you want, answer the following questions:

  • Is this a good idea?
  • Would you come?
  • Are weekends or workdays better?
  • Would you be worried about not being ‘techie’ enough to participate?

Here’s hoping it’s not just me sitting in a room with my laptop…


ALA 2008: Institutional Repositories: New Roles for Acquisitions – Acquiring Content: Adding ETDs to your Digital Repository

This session is by Terry Owen, ‘DRUM Coordinator’ from the University of Maryland Libraries.

Going to show workflows they developed for adding electronic theses to their repository (called DRUM).

Another DSpace implementation – launched in 2004 (with 1100 docs – all theses), 7900+ documents as of June 2008.

They have 20 DSpace ‘Communities’ (I need to look at the difference between ‘community’ and ‘collection’ in DSpace)

Sorry – drifted off there…

Generally it is the Grad School that initiates the ETD – the stakeholders for ETD are:

  • Students
  • Faculty Advisors
  • Graduate School
  • Library
  • IT Dept

The s/w options for ETD submission (i.e. the bit the student interacts with):

  • Proquest/BEPRESS
  • ETD-db (Virginia Tech and NDLTD – Networked Digital Library of Theses and Dissertations – recommended for advice)

Running through some benefits of ETD:

  • Can be found, read, used by global audience
  • Increases chances of citation
  • Lower costs (printing and copying)
  • Less hassle for students
  • Educates students on electronic publishing
  • Showcases an institution’s research

Some workflow stuff – need the slides really though. Noting that when students enter data they make lots of mistakes – in titles, even in their own names.

However, only the library catalogue record is checked – then the cataloguers pass the information to the DRUM team, who make corrections ‘as time allows’ – this is absolute madness!

They provide links from the library catalogue to the DRUM record – either via URL recorded in MARC record, or via OpenURL link resolver (which leads to the question in my mind – why bother having any metadata in DRUM at all – just have it in the library catalogue!)

Some ETD concerns:

  • Will journal publishers still accept my article if it is available electronically?
  • What if I want to submit a patent based on my research?
  • What if I want to write a book related to my thesis?
  • etc.

So, decided to provide Embargo options:

  • Restrict access for 1 year
  • Restrict for 6 years
  • Restrict indefinitely
    • Requires written approval by the Dean of the Graduate School

However, the print copy is not embargoed – and will be supplied on Inter-library Loan! So just making work for ourselves here!

Why embargo?

  • 1 year – for patent protection on materials, to publish in a journal with restrictive publication policies
  • 6 years – to write a book

DSpace embargo options are very limited. They could have created an ‘Open’ and a ‘Closed’ collection – but that would double the number of collections, so they didn’t.

Can control access to items (I think this is exactly what we need for our MSc theses – need to investigate, since I was told it couldn’t be done) – however, it doesn’t work very well from a user experience perspective – asks you to login, then tells you that you can’t access it.

Instead they decided to create a ‘Restricted Access’ option, which explains the situation to the end user. They have automated the process – the grad school passes the embargo information across with the metadata (I think this is right) and it is applied automatically.

There is a form that the students use – all students fill it out, offers options of ‘immediate access’, ‘1 year embargo’, ‘6 year embargo’, or ‘indefinite embargo’ – has to be signed by faculty advisor, and comes with handout about the embargo, and why you would embargo etc.

So far 474 requests for embargoes (since 2006) – representing 31% of submissions (note that the first 1 year embargoes have now passed, so there are fewer than 474 embargoed theses in the system).

Most commonly embargoed (by percentage) are Chemistry and Life Sciences, and Business. More 6 year embargoes from Arts and Humanities – because of book writing.

They see the rate of embargo as high – are planning to do more education about this.

The Grad School committee did not want electronic copies ‘floating around’ – so the library is doing all kinds of jumping through hoops to print out and mail theses that are requested on ILL. Looking at the possibility of having a non-printable PDF. Also hoping to allow on-campus access to embargoed ETDs.

I think I would have lost patience at this point – lucky I don’t do advocacy 😉

They have some special cases – copyrighted works have a ‘redacted’ version in DRUM, and a note is added – a full version is kept in the library (either in print or on CD/DVD etc.) – again, what nonsense.

Sorry – it isn’t the DRUM manager’s fault, I just can’t quite believe the contortions here (although note that the number of theses falling into this last category is small).

In summary:

  • ETDs require regular attention
  • Build a good relationship with the Grad School
  • Important to educate faculty advisors and students about open access issues
  • Be prepared to implement embargoes
  • Link ETDs to library catalog
  • Have plans in place for special cases (copyrighted works)
  • Have an efficient and capable IT department

ALA 2008: Institutional Repositories: New Roles for Acquisitions – Ohio State University’s repository

Called the ‘Knowledge Bank’ – is DSpace. Content is defined by ‘institution’ – but different types of content:

  • Journals
  • Monographs – e.g. Ohio State University Press
  • Undergraduate Theses (not ETDs which are done in a consortial system)
  • Conference materials/Technical reports/Images/etc.
  • Currently 30k records (started since 2004, but with some significant batch deposits)

Knowledge Bank pushes out to other sources – e.g. ScientificCommons.org, OAISter

They created a Metadata Application Profile for the KnowledgeBank, using a core set of metadata elements and the DC Metadata Element Set – available from the KnowledgeBank website

Question – does the metadata make sense when it is outside the institutional context? Example of a library newsletter – it makes sense in the KnowledgeBank because it exists in a hierarchy (not sure, but I guess by collections?), so they didn’t bother in the first place replicating this information in the record. However, when taken out of that hierarchy and put into OAIster (for example) – without that hierarchy information, it was impossible to tell what it was.

They decided to add the relevant information back into the record (this seems really wasteful – surely it should have been done at a system level – it should have been possible to automate the integration of the hierarchy information into the record without having to rekey)

Mentioning problems of authority control – lots of people contributing etc – so many variations in author names, and in subjects etc. They are doing a project to clean this up for a single collection at the moment

Saying that people often add keywords that are already in the title, so they don’t add information (although I’d argue that it does add information – this kind of thing could be used to help relevancy surely?)

They have set up a ‘Community Metadata Application Profile’ – which is shared with all who submit material into the repository. She is showing some details on the slides, but I can’t read them.

They have Customized Item Metadata display at a collection level. Also customize collection display – e.g. for journal collections, they have a ‘Table of Contents’ display which can be browsed from an ‘issue’ record.

They have License Agreement in place, with an optional Creative Commons license – done each time someone submits. When submission is done by a proxy, the individual signs a permission to allow this – which is then attached to the item, though suppressed from public view.

There are customized input forms for submission – again at Collection level. Can also do customized input templates with prepopulated metadata fields for repeated information.

There are Item Submission Workflows – example of the speaker’s workflow area – can approve/reject/or push back into the pool of work.

Talking about batch loading of items – using (for example) a spreadsheet to create the data (columns of DC data) – this creates an XML file, which is then loaded in batch. Using a spreadsheet means no new interface to learn for people not working with the KnowledgeBank every day. (I’d personally prefer to see a repository that was easy to use, so this wasn’t a problem)
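The spreadsheet-to-XML step might look something like this – a sketch assuming DSpace’s Simple Archive Format, where each item gets a dublin_core.xml file of dcvalue elements (the column names and sample data are mine, not from the talk):

```python
import csv
import io
import xml.etree.ElementTree as ET

def row_to_dublin_core(row):
    """Turn one spreadsheet row (a dict of DC columns) into the
    dublin_core.xml used by DSpace's Simple Archive Format."""
    root = ET.Element("dublin_core")
    for col, value in row.items():
        if not value:
            continue
        element, _, qualifier = col.partition(".")  # e.g. "date.issued"
        dcv = ET.SubElement(root, "dcvalue",
                            element=element,
                            qualifier=qualifier or "none")
        dcv.text = value
    return ET.tostring(root, encoding="unicode")

# Usage: one row per item, columns named after DC elements.
sheet = io.StringIO("title,contributor,date.issued\nMy thesis,A. Student,2008\n")
items = [row_to_dublin_core(r) for r in csv.DictReader(sheet)]
```

The point made in the session holds: the contributors only ever touch the spreadsheet, and the XML plumbing stays invisible to them.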

They also repurpose MARC metadata for things that may have already been catalogued in a library catalogue system – transforming it into DC and loading it into the KnowledgeBank.
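That repurposing is essentially a crosswalk. A sketch of the idea (using a toy tuple-based record structure of my own invention, not a real MARC parser – though the tag mappings themselves, 245$a title, 100$a author, are the standard ones):

```python
# Map MARC (tag, subfield) pairs to DC elements. A real crosswalk
# would be much fuller; these four are illustrative.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
    ("650", "a"): "subject",
}

def marc_to_dc(fields):
    """fields: list of (tag, subfield, value) tuples from a MARC
    record; returns a dict of repeatable DC elements."""
    dc = {}
    for tag, sub, value in fields:
        element = CROSSWALK.get((tag, sub))
        if element:
            dc[element] = dc.get(element, []) + [value]
    return dc
```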


ALA 2008: Institutional Repositories: New Roles for Acquisitions

The last session that I’m going to – but really relevant. Unfortunately I’ve missed the first 10 minutes or so. Someone (think it must be Peter Gorman from University of Wisconsin-Madison?) is speaking about their experience of having an institutional repository.

Just mentioned the SWORD API to help deposit workflow. Also mentioning bibapp, and using the SWORD API to push stuff from bibapp to the institutional repository. Also EM-Loader doing something similar.
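For context, a SWORD deposit at this point in time was essentially an HTTP POST of a packaged item to a collection’s deposit URL. A minimal sketch of building such a request (the endpoint URL, credentials and package bytes are placeholders; the packaging URI is the standard METS/DSpace SIP one from the SWORD 1.x profile):

```python
import base64
import urllib.request

def sword_deposit_request(deposit_url, package_bytes, username, password):
    """Build (but don't send) a SWORD 1.x style deposit request:
    a POST of a packaged item to a collection's deposit URL."""
    req = urllib.request.Request(deposit_url, data=package_bytes,
                                 method="POST")
    req.add_header("Content-Type", "application/zip")
    req.add_header("X-Packaging",
                   "http://purl.org/net/sword-types/METSDSpaceSIP")
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {auth}")
    return req  # urllib.request.urlopen(req) would perform the deposit

req = sword_deposit_request("https://repository.example.edu/sword/deposit/123",
                            b"...zip bytes...", "depositor", "secret")
```

This is what makes the bibapp-style push possible: any tool that can make an HTTP POST can deposit.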

So, what is the difference between Institutional Repository content and Digital Library content? Users don’t (necessarily) care where stuff comes from, or how it gets there, and most of the objects, although very varied, have the same fundamental management, preservation and access needs.

This has challenged the assumption underlying their IR infrastructure.

Now showing a ‘scary diagram’ – showing how one central ‘repository’ could take in content, and what services it would need to support.

Some interesting questions remain:

  • What is a collection?
    • Does the material determine it?
    • Does our s/w determine it?
    • Does our workflow determine it?
    • What aggregations are meaningful to our users – and in what contexts?
    • Single repository gives possibility of more flexible aggregations that serve specific contexts (I’d say I’m not sure this depends on the backend storage, but on the access systems, but I think the overall point is a good one)
  • When do we select?
  • What do we catalog? – and why?
  • What’s the role of Archives? Overlap with traditional archives roles – in physical world, well established, need to establish them for the virtual world

No answers to these at the moment…

Moving to a different topic, Copyright:

We (librarians) may have multiple roles:

  • Deciding what to digitize
  • Determining access rights
  • Negotiating digitization/access rights
  • Advising contributors on copyright and Fair Use
    • Faculty submitters
    • Students (Electronic Theses)
  • And sharing knowledge with others
    • Orphan works

Mentioning an OCLC idea of joint work on this, creating a central database. This week Google released copyright information they have collected on works. Hoping that the Google and OCLC efforts can be brought together.

Copyright determination: theses and dissertations

  • Is it published? (according to Copyright law) – speaker thinks ‘yes’ but they are looking into it at the moment and getting legal advice
    • What is the publication date?
    • Is there a copyright notice?
    • Does Fair Use apply?

Mentioning a resource from the Library of Congress ‘circular 22’ – how to investigate the copyright status of a work – noting the first half is scary and seems designed to put you off even starting the process – but skip that and go to the second half which is full of really good advice.

Also, there are flowcharts from various places – e.g. from the law firm Bromberg & Sunstein, whose chart was used by the speaker’s institution.


ALA 2008: Top Technology Trends

I’ve decided to take a break from cataloging this afternoon and opted for the easier-on-my-brain ‘LITA Top Tech Trends’ session. Ironically this is the first session where I haven’t been able to get online 🙁 … hooray – managed to get online after all.

The panelists are:

  • Karen Coyle
  • Eric Lease Morgan
  • John Blyberg
  • Meredith Farkas
  • Roy Tennant
  • Clifford Lynch
  • Karen Schneider
  • Marshall Breeding

Two online participants as well – Karen Coombs and Sarah Houghton-Jan (whose name I didn’t catch at first).

Interesting how some of the same names keep cropping up – would be nice to spread the speaking goodness around a bit, folks!

There is a chat room at http://www.meebo.com/room/toptech/ – might be testing my multi-tasking to the limit!

MB: Open source – already in public, will come into academic. Make sure that systems are actually open, not just called open etc. Also look for open data

KS: Broadband – quite political stuff – no proper telecommunication strategy at federal level I think is the basic message.

Open Access – small literary journals are doing this, because they can in the online world – it gets rid of costs. Not so visible to librarians, as we tend not to ‘own’ these things

Lots of tech problems with online participants – sound patchy, video not brilliant etc. Good that it’s being tried though!

CL: After enthusiasm for Open Source we will see a backlash as people try to come to a realistic view – is he talking about the ‘hype cycle’?

Collaboration across the network – people need to be able to work together in an agile fashion – synchronous and asynchronous. Report – ‘Beyond Being There’ – looks at the issues around virtual organisations.

Travel will get much more expensive and less common. Virtual presence in meetings needs to get much better much more quickly.

Need to look at how we regard privacy of information

Letting go of ‘holdings’ so they can be reused and put into contexts where they add value outside the normal venues.

Overload?

RT: Surprises are the norm! Google digitisation was a surprise. Hope that OCLC can surprise

We need to retool for constant change.

Need to get over the system upgrade cycle – need to be on the latest platform, and upgrade in a timely manner

MF: Role of social s/w in collecting local knowledge.

Library as social hub, providing technology for the community – e.g. a slide scanner to digitise people’s slides

Combine h/w, s/w and education that people need to do digital projects

Blogs as historical artifacts. If someone is taking a class in 50 years time about library history – will they be able to see the blogs that started and developed the library 2.0 movement?

JB: Green technology – cost and environment concerns

Adding semantic markup to docs – extracting meaning from text

Mobile, always on devices

Personal relationship with information – connections etc.

KC: APIs!

ELM: Got to get your ‘stuff’ out there – a Web API is the way to do this.

KC: Handheld devices

Give up control on data

Didn’t get so much of this – too busy taking part in the discussion in the chat room – oh well…


ALA 2008: Future of cataloging, etc. discussion session

RT: Pointing out LibraryThing doesn’t give away all its data any more than OCLC does

TS: Only tags are protected

Bit of a ‘librarything’ vs ‘OCLC’ thing going on here – find this a bit petty and dull

MY: Television went free because of advertising – but this changed the nature of what was on TV – just stuff that attracted the advertisers target audience

JB: First thought on seeing librarything user generated links between books – oh my god – what if it is done wrong? But then, realised – you have to view the data in the way it has been added – don’t mistake it for cataloging, but it is really valuable.

TS: LibraryThing still only has binary representation of FRBR relationships – needs to be more sophisticated

DH: Need to capture ‘point of view’ – make the point we have a ‘point of view’ – maybe not as objective as we think – but it is important. We need to allow different points of view – then we can form communities of practice around those people who share our point of view

This is so important – I really think this is key. What we need is ways of being able to express a ‘point of view’ and filter the world, so we can use the things done by specific people or communities to give us ‘our’ view. I wonder if there is an approach – or whether we need one – which would allow a ‘distributed’ wiki model, where you could overlay changes made only by specific people etc.? This is how collaborative cataloging would work in my head – I need to write something on this and explain it more clearly.
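One way to picture that ‘distributed wiki’ idea – purely my own sketch of a hypothetical data model, not anything that exists – is a shared base record plus per-contributor overlays, where each reader chooses whose changes they trust:

```python
def view_record(base, overlays, trusted):
    """Apply only the field changes made by contributors you trust,
    in the order they were made, on top of the shared base record."""
    record = dict(base)
    for contributor, field, value in overlays:
        if contributor in trusted:
            record[field] = value
    return record

base = {"title": "On Libraries", "subject": "Music, Pop"}
overlays = [
    ("roy", "subject", "Rock Music"),   # a change I might trust
    ("spammer", "title", "BUY NOW"),    # one I wouldn't
]
```

The base record never changes; two communities with different `trusted` sets simply see two different catalogues over the same data.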

RW: Not all going to happen one way – how do we deal with this?

TS: Open data

TS: RDF – just another example of an over-engineered solution – worried many web people don’t believe in it

DH: Important to know about RDF – agree not necessarily ‘the answer’ – no right and wrong, but mix of approaches

I think that we need to at least embrace the concepts of RDF – this is about linked data folks – I don’t care to some extent about mechanics – RDF, Microformats, structured HTML etc.

DH: We shouldn’t spend all our time on secondary products (books and serials) – need to look at primary stuff – the ‘long tail’ of library resources

TS: LibraryThing has better series information than you buy from Bowker. Publishers want their covers out there. There is going to be a lot of information

MY: But if there is value in cataloguing, shouldn’t it be paid for?

DH: Don’t count value by each record created. You can pay for things in different ways. Need to stop thinking about charging for metadata even though it costs to create it. You have to make value further down the chain

This is the same as the idea of the ‘free our data’ campaign by the Guardian – we increase value by giving away information, because it aids the information economy, which grows, and pushes value back into the system. This is counter-intuitive, but the report from Cambridge on this showed the vast amount of value in publicly funded information like Ordnance Survey.

MY: It is difficult to carry out cost/benefit studies in libraries – they usually end up just being ‘cost’ because benefit is so difficult to measure. The problem is that we serve an ‘elite’ and it is difficult for society to see that value

I disagree with some of this – it is used by ‘an elite’ because this is who we make it available to – comes back to open data again. I agree it is important to fund universities – and would agree ‘benefit’ is difficult to measure

Now open to the floor for comments and questions:

Q: Question around issues of ‘turning loose data’ – concerns – perhaps several overlapping concerns

A: It is scary, but we need to do it

TS: Need a fielded ‘forkable’ wiki as catalog – not wikipedia model where there is ‘one right answer’

Comment: What are the five most important things we ought to be teaching LS students in knowledge organisation/cataloguing right now? Answers online please!

Q: Libraries are not something ‘isolated’ – how do we fit into an integrated world?

RT: Very much agree need to break down barriers between different information silos – archives, libraries, museums

Q: The unique/unusual stuff isn’t going to be tagged on librarything

DH: We need to understand that it is not just one cataloguers responsibility to provide metadata on a resource. There is a community around every object – and you have to harness that

TS: There is a lower limit – communities need to be a particular size to be useful

Q: How much of the success of librarything is on planning and how much ‘on the fly’. Also what about the economics – how do you get paid?

TS: Just throw stuff up and see how it flies

TS: On economics – do good ideas, then work out how you get paid – if it is good enough, money will flow towards it

Missed a load of discussion there, because I got up to ask a question. However, worth noting that a lady from (I think) the National Library of Singapore talked about how, by creating ‘microsites’ of some of their documents, they increased hits from 400 a month (when the docs were in a ‘digital library system’) to 150,000 a month (and rising at 10% a month). This just hammers home the point that we need to put our data ‘on the web’ in a web-native way. Microsites may not be the only way – but (for example) if our online systems supported simple URLs to a record (like, say, Flickr does) then we would have this working – but because they all use (or have traditionally used) session IDs in their URLs, this just does not happen.

Q: Why does tagging in LibraryThing work but not in other environments?

TS: Whether user contribution is useful or not is highly situational. Don’t believe that tagging will be successful in a library catalog – the user just isn’t in that ‘frame of mind’ – when they are using the catalog, they may not even have read it. If we want to use tagging data in catalogs, libraries will need to bring it in from other sources.



ALA 2008: A Has-been cataloger looks at what cataloging will be – Diane Hillmann

Diane Hillmann is Director of Metadata Initiatives at the Information Institute of Syracuse (formerly of Cornell)

There are several converging trends:

  • More catalogers work at a support staff level than as professional librarians
  • More cataloging records are selected by machines
  • More catalog records are being captured from publisher data or other sources
  • More updating of catalog records is done via batch processes
  • Libraries continue to de-emphasize processing of secondary research products (books and serials) in favour of unique, primary materials

Options:

  • Extinction
  • Retool

Extinction:

  • Keep cranking about how nobody appreciates us
  • Assert over and over that we’re already doing everything right – why should we change?
  • Adopt a ‘Chicken Little’ approach to envisioning the future: “the sky is falling”

Retool:

  • Consider what catalogers do, and what they will do, and map training accordingly
  • Look for support for retraining at many levels
  • Find a new job title – catalogers do a lot of other things

What do ‘metadata librarians’ (the retooled catalogers) do, as opposed to traditional catalogers:

  • Think about descriptive data without pre-conceptions around descriptive level, granularity or descriptive vocabs
  • Consider the entirety of the discovery and access issues around a set or collection of materials
  • Consider users and uses beyond an individual service when making data design decisions

The metadata librarian:

  • is aware of changing user needs
  • understands the evolving information environment
  • works collaboratively with technical staff
  • is familiar with all metadata formats and encoding standards

The metadata librarian skill set is:

  • Views data as collections, sets or streams
    • Familiar with a variety of metadata formats (DC, VRA Core, MODS etc.)
    • Understands basics of data encoding (XML, RDF etc.) but is generally not a programmer
    • Understands the various ways that data can be created (by humans or machines) and manipulated (crosswalked etc.)

Characteristics of the New World:

  • No more Integrated Library Systems
  • Bibliographic utilities are unlikely to be the ‘central node’ for all data
  • Creation of metadata will become far more decentralized – not all library data
  • Nobody knows how this will all shake out
  • But: Metadata Librarians will be critical in forging solutions

Disintegrated Library Systems:

  • All metadata will not be managed in and delivered from one central store
    • Discovery is the first function that is being disaggregated from the ILS – there will be others
    • Metadata may be managed in a variety of databases, structures and systems

Role of bibliographic utilities:

  • Optimized to be the middleman of the traditional data sharing system
  • Currently limited to handling MARC data – not sure whether or when that will change (RDA will be the first challenge here)
  • New services are contemplated

(as an aside OCLC getting a hard time here today – feel a bit sorry for Roy!)

New models of creation and distribution

  • All data will not be created by librarians
    • some will originate from machine processes
  • We need to exchange data based on a more open model – on the web
  • Broader use of OAI-PMH is a good start towards opening data beyond applications and bespoke portals
  • Need to avoid commoditizing DATA; instead, base the business model on building the necessary SERVICES

Not sure about OAI-PMH – why not just publish the stuff on a webpage with semantic markup to give structure?
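To be fair to OAI-PMH, it is barely heavier than a webpage: a harvest is an HTTP GET plus some XML parsing. A sketch (the base URL is a placeholder, and the sample response is hand-written for illustration; the verb, metadataPrefix and namespaces are the real ones from the protocol):

```python
import urllib.parse
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url):
    """The request side: a plain GET with two query parameters."""
    return base_url + "?" + urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc"})

def harvest_titles(xml_text):
    """Pull dc:title values out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.iter(DC + "title")]

# A minimal, hand-written response for illustration:
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><record><metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Library Newsletter, June 2008</dc:title>
    </oai_dc:dc>
  </metadata></record></ListRecords>
</OAI-PMH>"""
```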

Open data:

  • Nobody knows how rich our data is unless we make it fully available – we can’t compete as data providers unless we do this



ALA 2008: Catalogs and the Network level – Roy Tennant

Roy using a quote/concept that I’m going to use in my presentation (grrr):

  • Then: Users built workflow around libraries
  • Now: Library must build services around user workflow

Discovery happens elsewhere…

Roy mentioning some prominent web services:

  • Google
  • Amazon
  • Digg
  • etc.

Noting that:

  • Scale matters
  • Spread matters

a.k.a. Concentration and Diffusion

Roy looking back to the time when cataloguers created bib metadata on cards, which could be distributed around libraries.

Roy telling an anecdote how he decided to put ‘Rolling Stones’ under ‘Rock Music’ rather than ‘Music, Pop’ – but that when he did this, only his local library benefited.

We now have the ability to share records – but still we create local records, so changes we make still only deliver local benefit.

If we pushed data back into a global system (Worldcat of course in this context, but the point stands), then we can share that benefit.

The benefits of ‘concentration’

  • Search results ranking
    • Holdings data – the number of libraries holding a book says something about ‘importance’ (I think this is true, but Roy’s example of ‘the more libraries, the more important’ I’m not convinced about – there is an issue with ‘ant trails’ here – that is, if we all ‘follow the leader’ or the strongest path, there is a risk we don’t explore other potentially useful/better avenues)
    • Usage data
  • Recommendations of related books (a la LibraryThing)
  • User contributed content

WorldCat Identities is an example of the data mining possibilities of a large aggregation

Steve Museum a good example of user contributed data enhancing a collection

‘Switching service’ – allows you to move from one piece of information to another – Roy uses an example of moving from a blog post (via http link) to a book in Worldcat, to a local library record. Noting the ‘last link’ is missing – no option on his local library homepage to ‘send this to me’. If libraries did this – they would ‘so rock’ 😉
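The ‘switching’ Roy describes is roughly what OpenURL does: the citation travels as query parameters, and the reader’s own resolver decides where to switch them to. A sketch (the resolver base URL is a placeholder; the genre/isbn key-value style follows the OpenURL 0.1 convention):

```python
import urllib.parse

def openurl_link(resolver_base, **metadata):
    """Build an OpenURL 0.1-style link: citation metadata as query
    parameters, so any resolver can route the user to a local copy,
    a catalogue record, or an ILL/'send this to me' request."""
    return resolver_base + "?" + urllib.parse.urlencode(metadata)

link = openurl_link("https://resolver.example.edu/openurl",
                    genre="book", isbn="0596000278",
                    title="Programming Perl")
```

Roy’s ‘missing last link’ is exactly the step a library could add here: a resolver target that offers “send this to me” rather than stopping at the record display.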

Benefits of ‘diffusion’

  • Library holdings syndicated into places where people are found (e.g. Google)
  • Small libraries can play in big spaces
  • The more paths to your content the better

Examples of integration of links to library resources – in web pages, in wikipedia etc.

  • Concentration
    • Webscale presence
    • Mobilize data
  • Diffusion
    • Disclosure of links, data and services