LiFE^2 Case Studies – British Library Newspapers

This afternoon we are starting with three case studies. The first (presented by Richard Davies from the BL) is for material that is not ‘born digital’ – in this case Newspapers at the British Library.

The BL wanted to use the “Burney Collection” – 1,100 volumes of the earliest known newspapers, about 1 million pages, digitised from microfilm.

They originally wanted to compare the digital collection with the analogue collection. However, because of access restrictions to the printed Burney Collection, they decided it wasn’t a particularly good comparison. So instead they compared a snapshot of their analogue legal deposit newspaper collection with the digital Burney Collection.

The point was not to say which was more expensive or better value, but to see if the LiFE model was workable for both types of collection.

The BL tried to cost each part of the process that they went through, including staff costs. They used linked spreadsheets to allow manipulation of the underlying cost assumptions without having to change all the formulae – so they could change (e.g.) salary costs and this would filter through.
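
To make the linked-spreadsheet idea concrete, here is a minimal sketch of the same pattern in Python (my own illustration – all names and figures are invented, not the BL’s actual model): the formulae read from a single table of assumptions, so changing one assumption propagates to every derived figure.

```python
# Cost formulae read from one assumptions table, so changing an
# assumption (e.g. a salary) filters through to every derived figure.
# All names and numbers are invented for illustration.

assumptions = {
    "salary": 25_000,                # annual staff salary, GBP (hypothetical)
    "overhead_rate": 0.35,           # overheads as a fraction of salary
    "pages": 1_000_000,              # collection size
    "pages_per_staff_year": 150_000, # throughput assumption
}

def staff_cost_per_year(a):
    """Fully loaded annual cost of one member of staff."""
    return a["salary"] * (1 + a["overhead_rate"])

def ingest_cost(a):
    """Staff cost of ingesting the whole collection."""
    staff_years = a["pages"] / a["pages_per_staff_year"]
    return staff_years * staff_cost_per_year(a)

print(f"Ingest cost: £{ingest_cost(assumptions):,.0f}")
assumptions["salary"] = 27_000   # one change, no formula edits needed
print(f"After salary change: £{ingest_cost(assumptions):,.0f}")
```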

The BL found that the overall LiFE model worked very well for the analogue collection. Even though not all terminology applied (for example the model talks about ‘bitstream’ – which is a digital only idea), the concepts underlying the terminology would apply to both – so perhaps the terminology needs refining to show this.

However, they found that at the element and sub-element level, the detail required for digital was different from that required for analogue – which you might reasonably expect.

There were some significant differences:

There were large ‘creation’ costs for the digital collection, but not for the analogue collection (although it occurs to me that there are actually large costs for the creation of the analogue collection – they are just not borne by the BL – does the model need to take commercial input into account?)

The overall conclusions were:

  • Comparison (of the analogue to digital) is complex but workable
  • Retrospective costing adds complications
  • Similar costs across a number of LiFE Stages
  • Analogue lifecycles are well established compared to digital

LiFE^2 – Implementation of the LiFE work

This session is describing some practical implementations of the LiFE costing model (we have more detailed case studies coming this afternoon).

The first is from Denmark (Anders Bo Nielsen and Ulla Bogvad Kejser):

The aim was to estimate and compare lifecycle costs of preservation of digital material held by Danish cultural heritage institutions, covering the National Archives, the Royal Library and the State and University Library.

The Danish project chose the LiFE model as it was already developed, seemed to have reasonable traction in the sector, and had been tested on real data sets – albeit small ones. However, they have some improvements they would like to see to the model, including:

  • Use of OAIS terminology to ease understanding etc.
  • Breakdown into more generic functional entities to avoid bias towards library material (since they are interested in other cultural heritage areas like museums etc.)
  • Needs to cover all costs – e.g. general admin, facilities, cost of systems to manage lifecycle etc.

They also removed the ‘metadata’ stage, and spread the metadata elements across the other stages (this was referred to in Paul Wheatley’s talk, in terms of disagreement over the best way to handle the metadata aspects of the model). This latter approach makes more sense to me: rather than regarding ‘metadata’ as a specific activity, it becomes a function of other parts of the model. In fact, the more I think about it, the more it strikes me that regarding ‘metadata’ as an activity in itself is a serious problem, and suggestive of a ‘cataloguing’-centric view of the world – we should always see the use of metadata as a means to an end, not an end in itself.
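
A toy illustration of the structural difference (my own sketch, with made-up figures): the same total cost, first with metadata as a stage in its own right, then with the metadata work attributed to the stages it actually serves.

```python
# Two structurings of the same costs (figures invented for illustration).

# LiFE v1 style: metadata is a lifecycle stage in its own right.
metadata_as_stage = {
    "acquisition": 100, "ingest": 80, "metadata": 60,
    "access": 40, "storage": 30, "preservation": 50,
}

# Danish revision: metadata work is costed inside the stage where it
# is performed - a means to that stage's end, not an end in itself.
metadata_spread = {
    "acquisition":  {"base": 100, "metadata": 10},
    "ingest":       {"base": 80,  "metadata": 30},
    "access":       {"base": 40,  "metadata": 15},
    "storage":      {"base": 30,  "metadata": 0},
    "preservation": {"base": 50,  "metadata": 5},
}

total_v1 = sum(metadata_as_stage.values())
total_v2 = sum(s["base"] + s["metadata"] for s in metadata_spread.values())
assert total_v1 == total_v2 == 360   # same money, different attribution
```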

Ulla Kejser is now talking about a specific instance: preserving pictures from celluloid in digital format. They used the LiFE model to estimate the costs of digital preservation versus film preservation, and found that the ongoing costs of film preservation are much lower than those of digital preservation; however, over 5 years digital preservation turns out to be cheaper, so they have decided to use TIFF digital copies as their ‘safety’ copy. She also noted that they were dealing with very high resolution images, which increased the cost of digital preservation.
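
As a back-of-envelope sketch of that kind of comparison (all figures invented – the study’s actual numbers weren’t shown): one option with a high one-off cost and low ongoing costs against one with a lower one-off cost and higher ongoing costs, compared cumulatively.

```python
# Cumulative cost of two preservation options over a planning horizon.
# Figures are invented to show the shape of the comparison only.

def cumulative(one_off, per_year, years):
    """Total spend by the end of each year, starting from year 0."""
    return [one_off + per_year * y for y in range(years + 1)]

film = cumulative(one_off=500_000, per_year=10_000, years=10)
digital = cumulative(one_off=200_000, per_year=40_000, years=10)

for year, (f, d) in enumerate(zip(film, digital)):
    cheaper = "digital" if d < f else "film"
    print(f"year {year:2d}: film £{f:>9,}  digital £{d:>9,}  -> {cheaper}")

# With these figures digital is cheaper over any horizon shorter than
# ten years; the lines cross at year 10, after which film would win.
```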

Finally in this session (and the last talk before lunch), Paul Ayris (Director of Library Services, UCL) is speaking about the JISC-LC Blue Ribbon Task Force on the economic sustainability of digital preservation (of which both he and Paul Courant are members).

Paul is going to cover:

  • Why is digital preservation important?
  • Implications – focusing on UK Exemplars
  • The work of the Blue Ribbon Task Force

UCL has a 5-year library strategy running up to 2010, with 10 over-arching goals; e-strategy is a priority in many of them (Teaching and Learning, Research, Student Experience, Partnership working).

UCL has a model of the user experience – focusing on ‘value’ and user demand rather than on the cost of providing the service. They have defined a ‘generic’ user called ‘Charlie’ (the phrase ‘Charlie says’ springs irresistibly to mind).

They have a number of scenarios for Charlie (although I feel that they have missed the point of having a ‘user’ scenario here, as essentially they say Charlie might be a student, or a researcher, or something else – surely there should be different exemplars for each type?)

Anyway, this is a hook on which to hang an analysis of what users want from the library, and what other resources they use. UCL is aiming to bring together a number of different things through the ‘UCL Portal’ (the dreaded words ‘one-stop shop’ have been uttered – I feel like I’ve stepped back in time by 5 years – does anyone believe in the one-stop shop anymore?). Oddly, Paul goes on to describe how the library is only one content provider in a networked environment – this seems a recognition that the one-stop shop is not possible?

Interestingly, UCL assumes that STM researchers do not come to the physical library (unless they absolutely have to) – from an Imperial point of view, this is ALL (well, almost all) our researchers!

Anyway, in this new information landscape, long-term digital preservation of assets is essential. Paul says it is irresponsible to steer users towards these digital resources and not think about their long-term viability.

Paul now going to talk about two aspects of digital preservation close to my interest – ‘Big Science’ and ‘Small Science’.

Firstly, ‘Big Science’. Looking at the UK Research Data Service (UKRDS) project – RLUK and RUGIT have issued an invitation to tender, with £200k from HEFCE for a feasibility study into the development of a shared digital research data service for UK HE.

There are other options to the UKRDS:

  • National services which work for the academic community – e.g. the E-Depot in The Hague is a national Dutch exemplar
  • Commercial services such as Portico
  • Local digital curation services based at the institution (institutional repositories are perhaps examples of this, but so far they have concentrated on published output rather than primary datasets)

What is the ‘Blue Ribbon Task Force’?

It has been set up by the NSF in the US, with funding from the Mellon Foundation; partners include the Library of Congress and JISC.

The key questions being addressed are:

  • How will we ensure the long-term preservation and access to our digital information?
  • How will we successfully migrate data from one preservation format to another?
  • Should we preserve everything, or be selective?
  • If we are selective, what criteria do we use?

Also considering economic sustainability:

What is the cost to preserve valuable data and who will pay?

Economically sustainable digital preservation will require:

  • new models for channeling resources to preservation activities
  • efficient organization that will make these efforts affordable
  • recognition by key decision makers of the need to preserve with appropriate incentives to spur action

The Blue Ribbon Task Force is not just about HE – it is looking at the wider environment.

The task force says that we need a recognition of the benefits of preservation – and this needs to happen at the level of key decision makers. I wonder if we have ever taken this approach to preservation before? It comes back to something that Paul Courant said – if we cost in preservation before doing anything, the startup costs will be too high. This seems to be the crux of the issue for me – which approach we take here is key.

LiFE2 – LiFE Model Economic Validation

This talk from Bo-Christer Bjork – Professor at the Swedish School of Economics and Business Administration.

Bo-Christer was asked to validate the economic modelling and methodology of the models developed in LiFE.

Bo-Christer is introducing the idea of lifecycle costing – which is theoretically attractive, but not much applied in practice, probably because it takes a long view while many investors/owners of capital goods have much shorter time horizons.

However, national libraries and universities have longer time horizons, so the lifecycle costing method is more attractive to them.

Bo-Christer now talking about facility management as an example of where lifecycle costing can give valuable information – because buildings etc. are owned and operated for decades. He guesses that the cost of the BL building was higher than estimated, but that over its lifecycle you can see it was worth it (some wry amusement from the audience at this!)

Now covering ‘Total Cost of Ownership’ – lifecycle costing as applied to IT hardware and software.

Bo-Christer applied IDEF0 modelling to validate the LiFE model – a graphical process modelling technique originally developed for the US Air Force. It models processes with their inputs and outputs.

Now some diagrams – unfortunately unreadable from where I’m sitting – but demonstrating the graphical model for inputs, outputs and processes associated with digital object management in libraries.

Bo-Christer was specifically asked to look at how the model should handle inflation. It is standard practice in lifecycle costing to do costings in real monetary terms, which is fine for future costs, but historic costs should be adjusted to take inflation into account. However, for extremely long periods other methods should be used. In terms of LiFE, this was an issue when they looked at the Newspapers case study (something that will be covered later today).
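
For the record, the standard adjustment (my gloss, not a formula shown in the talk) rescales a historic nominal cost by a price index:

```latex
% Restating a historic (nominal) cost in base-year money using a
% price index such as CPI:
\[
  C_{\text{real}} = C_{\text{nominal}} \times
    \frac{I_{\text{base year}}}{I_{\text{year of spend}}}
\]
```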

Bo-Christer now covering the idea of ‘discounting’ – a technique used where costs and incomes occur in different years. For example, with a discount rate of 5%, a £100 cost or income in 10 years’ time is worth £32 today.
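
For reference, the present value formula behind this is (my addition, not a slide):

```latex
% Present value of an amount FV received or paid n years from now,
% at discount rate r:
\[
  PV = \frac{FV}{(1 + r)^n}
\]
```

By that arithmetic, £100 in 10 years at 5% comes out at £100/1.05^10 ≈ £61; £32 would correspond to a rate of about 12% over 10 years (or to 5% over roughly 23 years), so I may have misheard one of the figures.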

Although discounting applies well to large investments (e.g. building a factory), it isn’t well suited where there is a steady stream of costs over the years and no income to compare it with, so Bo-Christer recommended that it shouldn’t be used for LiFE.

Overall, I’m not sure I’m much the wiser at the end of this talk – I’m sure Bo-Christer knows what he is talking about, and I think it is great that they have been working at validating the economics.

A question from Chris Rusbridge – how does the lifecycle model apply to an open-ended ‘lifecycle’? Bo-Christer acknowledges that it is an important issue, but is not sure what the answer is.

A question/comment from Paul Courant suggesting that not using discounting is a problem, because even very small costs become large (or even ‘infinite’) if you have an open-ended lifecycle (i.e. if you commit to preserving something forever).
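
To spell out the arithmetic behind Paul’s point (my addition): without discounting, a perpetual cost stream diverges, while with discounting it converges to a finite present value.

```latex
% An annual cost c committed to forever: undiscounted the total is
% unbounded; discounted at rate r it sums to a finite present value.
\[
  \sum_{t=1}^{\infty} c = \infty,
  \qquad
  \sum_{t=1}^{\infty} \frac{c}{(1+r)^t} = \frac{c}{r}
\]
```

So at a 5% discount rate, preserving something at £100 a year forever has a present value of £2,000; without discounting, no finite budget covers it.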

LiFE Model

This talk by Paul Wheatley, the Digital Preservation Manager at the British Library.

Paul starting by describing the LiFE model, and the shortcomings of the LiFE Model v1.0. Some of these were addressed in v1.1, and v2.0 of the model is due out in August 2008.

Version 1.1 of the model makes some changes – especially differentiating between bitstream preservation and content preservation, and also separating out creation/acquisition costs slightly, as they don’t always apply.

For Version 2.0, they are looking at bringing in elements for ‘Disposal’. How metadata is handled has divided the LiFE team, and there are some changes in v2.0.

Quite a lot of detail being covered by this report, but unfortunately it isn’t terribly gripping – I would guess reading the reports out of the LiFE projects would cover all this.

At the end, some questions about the model. One interesting point about the rising cost of electricity.

LiFE^2 – Some Economics of Digital Preservation

The keynote by Paul Courant.

Since libraries are concerned with ‘the past’ (with an eye on the future), and the past grows in scope literally by the second, we’ve got a real challenge on our hands.

Paul starting by asking ‘What is Preservation?’ – saying that he will leave talk of digital until the end of his talk, as he believes that if we understand preservation, we generally understand digital preservation (with some caveats).

You have to have ‘something’ to preserve – information or artifacts or both – an “object”. Preservation activity affects the flow of current and future services available from the “object”. The potential usefulness of the object in the future is dependent on the preservation activity that we have undertaken.

According to LiFE, the lifecycle cost over time equates to the cost of acquisition plus the time-dependent costs associated with Ingest, Metadata, Access, Storage and Preservation.
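
In symbols – this is the formulation as I recall it from the LiFE v1 report, with the subscript T marking the time-dependent terms:

```latex
% Lifecycle cost over time T: acquisition plus the time-dependent
% costs of ingest, metadata, access, storage and preservation.
\[
  L_T = Aq + I_T + M_T + Ac_T + S_T + P_T
\]
```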

Paul saying that the benefits are:

  • Findability (we need to be able to find it)
  • Usefulness (we need to be able to use it)
  • Reliability (we need to do both of the above reliably)

Paul says: Finding a needle in a haystack is relatively straightforward if you know it is there – much better than trying to find a needle in any haystack when you aren’t even sure if the needle is there in the first place.

Paul now quoting the economist Robert Solow:

“The duty imposed by sustainability is to bequeath to posterity not any particular thing – with rare exceptions such as Yosemite, for example – but rather to endow them with whatever it takes to achieve a standard of living at least as good as our own and to look after the next generation similarly”

This draws an interesting distinction between the general level of preservation – that we just need a ‘body’ of resource that is sustained – and the need to preserve specific things because of their particular impact. I think this is a good concept – and that the thing that is difficult is to define the specific things that are the ‘rare exceptions’ – because most stuff isn’t important in itself, but as it represents a body of resource.

Paul now arguing that ‘markets’ in general won’t do preservation. Quote from Anand and Sen, 2000:

“sustainability cannot be left entirely to the market. The future is not adequately represented by the market – at least not the distant future”

Paul relating the problem of trying to study iPod adverts – the ‘market’ isn’t interested in preserving these.

Paul saying that the cost of adding extra ‘users’ to resources approaches zero (perhaps especially in the context of digital information). I’m not entirely convinced by this: although clearly the cost is low, dealing with a million regular users is a different level of resource from dealing with 1,000 regular users.

Paul arguing that there are a number of values related to Natural Resources:

  • Public Good
  • Use Value (you can do something with the resource)
  • Existence Value (knowing something is there is important in a general sense, even if you don’t use it)
  • Option Value (it is important to have the option to use a resource)

Paul now distinguishing between two types of sustainability:

  • Specific sustainability – preserving a specific object (e.g. Magna Carta original manuscript)
  • Value sustainability – preserving the value encoded in an object (e.g. the text of the Magna Carta)

Paul now showing some points from the NSF BRP on Economically Sustainable Digital Preservation and Access:

  • Recognition of benefits of preservation by people who can move resources (Demand)
  • Incentives to people who have the stuff
  • Mechanisms to move resources to the stuff as routine or default, including handoffs
  • Efficient use (don’t save everything perfectly, make choices)
  • Organization and governance of the many relevant players (Paul saying that for this, UK is relatively well positioned, having clear national government, a national library and JISC funding national work – compared to the US)

Paul saying you can’t expect library materials to come with full costs of preservation – we would never have bought any books if we had started like this.

Now Paul saying, all the above is true about preservation in general, so what is different about digital?

  • Fragile – in a different way to paper-based stuff
  • Too much stuff
  • Rights Environment
  • Use doesn’t wear it out (and may even make it more usable in the future)
  • Functionality and Links (very fragile)
  • Public Goods Implications – once something is available digitally on a server, there are very low distribution costs – this changes the business model – having unique aspects to a physical collection concentrates people around the resource – not true with digital collections

Some points about Digital Scholarship:

  • Easy (sort of) cases
    • Digitized print (Google and the SDR)
    • Journals (Portico, LOCKSS, Some National Libraries)
    • Astronomical Data (because the astronomy community wants to and likes to share data, not because the data is particularly easy)
  • Harder cases
    • Multimedia projects
    • Things with links and embedded functionality (from excel spreadsheets on up)
    • Data from Chemistry experiments (chemists are the opposite of astronomers!)
  • Hardest
    • The cultural record itself
    • Business records, etc.

Paul finishing by saying that only collecting what you know you can sustainably (indefinitely) keep is a “Really Bad Idea”.

Q: Michigan was one of the early adopters as regards Google digitization – what economic factors did you look at?

A: Did some calculations about holding 7 million books on servers. The University committed to finding the money when the time came, and stood by this commitment – the academic value was clear. They did not make an argument about savings to be made by digitization.

Q: Can you comment on how preserving websites differs to what you have outlined in your talk?

A: Need a strategy to do a small sample to very high quality, and then do a very large sample at low quality, and recognise that you cannot preserve everything (and we have never done this, or strived to do it). “It is as much museum like as library like – but a lot of things are becoming more museum like, than library like”

Q: One of the things you said is different about digital is loss of local control – can you comment on the impact on the economics and business models?

A: The economics and business models change. The BL exists not just for love, but for profit – it is a differential asset for the UK. Once you look at digital, this is harder – will require high level agreements between governments, Universities etc. That the payoff for having a great local collection might no longer exist is a problem – but what if you can say you have a high level of local skill (in the library) to exploit and integrate digital and physical resources you might get local investment there – but who will pay for making the material available? Not clear.

LiFE^2

http://www.life.ac.uk

http://www.life.ac.uk/blog

Today I’m at the LiFE2 conference at the British Library. LiFE2 is a follow-up to the original LiFE project, which looked at the lifecycle of digital resources and applied the findings to three real collections.

LiFE2 has looked at validating the economics of the LiFE Model, and we’ll have a presentation on this, followed by several case studies.

The introduction is by Helen Shenton (Head of Collection Care at the British Library) – she is covering both the background of the LiFE and LiFE2 projects and the fact that we live in a hybrid world – perhaps over-egging the latter a bit for me – anyway, we know that we have a huge print legacy as well as needing to engage with the digital world, and Helen is stressing the credentials of the BL in both areas.

Helen now introducing the keynote speaker – Paul Courant, Dean of Libraries at the University of Michigan.