Even if we’re right, we’re wrong

I can't quite resist joining in the series of comments on the use of Web 2.0 in HE, started by Brian Kelly with his presentation to the JISC-Emerge community: "What if we're wrong?"

Martin Weller responded to Brian's presentation with a blog post, "Web 2.0 – even if we're wrong, we're right", in which Martin notes that "it will never go back to how it was" – essentially, we need to engage even if we are sceptical about the future of specific services or concepts: some of it will survive, and the next generation of technology will be built on this.

Brian has now posted "What if we're right", asking "are the Web 2.0 sceptics assessing the risks that they may be wrong?"

So, now I'm jumping in. I've just been reading the JISC-sponsored report "Great expectations of ICT: How Higher Education institutions are measuring up". This is a follow-on from the Student expectations study from last year, and looks at first-year university students' experiences of ICT use and provision in HE.

Both of these reports make interesting reading, but in this context I want to draw out some particular points.

The latest report notes:

"Students are somewhat ‘forced’ into becoming familiar with these applications since they are needed to access very basic things such as timetables, as well as lecture notes or PowerPoint slides from lecturers, with some students even taking exams via this portal. Despite them ‘having’ to use these systems students appear to feel comfortable with them, can see the benefits, and feel well supported on the technical front."

Some quotes from a couple of participants in the study:

‘The system WebCT seems a lot more suited for university work and lectures’

‘I didn't expect to be using computers as much as we do but I'm glad that things are accessible on WebCT.’

The report identifies a number of activities that students weren't comfortable using, and that were generally unfamiliar – these included:

  • Submitting assignments online
  • Using podcasts
  • Making podcasts
  • Making wikis

In both reports "the least popular form of ICT is participation in an online community such as Second Life".

Some quotes from a number of participants about the idea of using Facebook or other social networking sites for teaching:

‘I only use it for peers and friends. You wouldn’t want lecturers and tutors to see Facebook’

‘I’d probably get distracted by other stuff on Facebook and not end up doing anything’

‘I don't know, it would seem kind of weird getting lecture notes or speaking to your lecturers through Facebook!’

‘I would be a bit angry to be honest – tuition fees aren't cheap!’

Reading one of the comments on Brian's latest post, from Frankie Roberto – "When I was at University, we barely ever used our Uni e-mail addresses, apart from checking them occasionally to read all-student e-mails." – I thought that although I use a personal email account, I also use my work one, and I don't really want to mix them up: I like the separation that two addresses give me.

So, can an 'institution' such as a University do Web 2.0 stuff right – even if it is the future? Is this a bit like the government trying to persuade me to txt a mnstr? It's not that I don't use SMS – I just don't want to hold particular conversations in that way.

I'd note that none of this excuses us from engaging with Web 2.0 – I agree with Martin that whatever comes next will build on what proves persistent in the current generation of technology. I'm just wondering if students will accept services from the institution without the 'official' feel?

Anyway, this is a long way of recommending you read both reports. For those of you of a library bent, the section in the latest report on attitudes towards research and plagiarism has some interesting comments – and I'll leave you with this quote from one participant:

‘I usually Google…then go to the library’

Don't we all?

LiFE^2 – Panel Session

Final session of today’s conference. Chris Rusbridge from the DCC is introducing it, saying that quite a lot of what we thought we knew about Digital Preservation is wrong – and implying that quite a lot of what we think we know now is also wrong.

Some discussion about how the case studies might inform real costings or cost estimates in the future. Suggestion that LiFE will look at this in the write-up. Desire for a tool to assess this.

Always difficult to write up these discussion sessions – not least because they are more interactive from my point of view (i.e. I take part in the discussion).

Some stuff coming up:

  • Need to have better links between value and economic costs – if we can put a figure on ‘value’ we will stand a better chance of getting funding
  • Need tools to help us make decisions regarding digital preservation
  • Why is metadata handled as separate part in the LiFE model?

In closing Paul Ayris summing up:

  • Key to sustainable preservation is demand – which is driven by ‘perception of value’, and we should not be driven by cost of preservation
  • The new LiFE model was used in the Case Studies described today, and there have been comments from an economist on this, suggesting some ways of handling inflation and depreciation
  • If we are looking at developing a generic model, we need to look at the Danish examples, and see how it might apply in different scenarios
  • We are still in the process of learning what ‘digital preservation’ means, and what the costs truly are

Paul summarised the following from the panel discussion:

  • LiFE (if it can continue into a new phase) would like to develop a predictive tool to determine costs to help decision making
  • Interest in more case studies
  • Roles and Responsibilities are crucial in digital preservation, and in the UK we certainly still need to debate this

Paul says he can’t understand why the UK is so far behind some of the best European examples.

LiFE^2 Case Studies – Q and A

Q: To what extent did the Newspaper case study consider the difference between the very well established analogue workflows/processes compared to new concepts in digital?

A: Definitely something that is focused on in the write-up

Q: What are the ideal and realistic timeframes in which the costings for activities in the LiFE model should be reassessed in an institution (to reassess the overall costs)?

A: Neil Beagrie says it is important to revisit very regularly. Neil stressing importance of regular audits of institutional digital data. Stephen Grace suggesting this should be an annual thing to revisit costings.

Q: Where do you draw the line between the ‘creation costs’ and digital preservation costs to be costed by LiFE?

A: No clear answer – but clarification that Royal Holloway costs related to advocacy around acquisitions only included that of staff directly attached to the repository

Q: Note that all the case studies essentially took as a given that they would preserve the material in the format as delivered. Should the model be used to predict costs to inform decisions about what to preserve? (I think I got this right – I missed some of the question)

A: A qualified yes basically

Q: Neil mentioned issue of logical format migration. Does anyone have a view on the cost of this?

A: Neil says there is very little in the way of long-term studies of data to give information on this. However, he also notes that the more you dig, the more examples you find. So far much of the costing around this is based on assumptions about how often we will need to do this and how much it would cost. In reality there are likely to be large variations between ‘trivial’ transformations – e.g. from one version of software to another – and more major ones.

LiFE^2 – Research Data Costs

This session is not quite a case study, but rather a description of the application of the LiFE model to research data preservation, by Neil Beagrie – the work used to produce the recently published “Keeping Research Data Safe” report.

They found that a number of factors had an impact on the costings from the model:

  • Costs of ‘published research’ repositories vs ‘research data’ repositories
  • Timing – it costs c.333 euros to create a batch of 1,000 records, but 10 years after creation it may cost c.10,000 euros to ‘repair’ a batch of 1,000 records with badly created metadata
  • Efficiency curve effects – we should see drop in costs as we move from start-up to operational activity
  • Economy of scale effects – a 600% increase in acquisitions gives only a 325% increase in costs
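The scale and timing figures above can be turned into rough rules of thumb. The scaling exponent and the 'deferred repair' penalty below are my own inferences from the numbers quoted, not calculations from the report:

```python
import math

# Figures quoted in the talk: a 600% increase in acquisitions (7x the
# volume) gave only a 325% increase in costs (4.25x the cost).
volume_ratio = 7.0
cost_ratio = 4.25

# My own inference, not from the report: if cost scales as volume**a,
# the implied exponent is well below 1, i.e. strongly sub-linear costs.
a = math.log(cost_ratio) / math.log(volume_ratio)
print(f"implied scaling exponent: {a:.2f}")  # roughly 0.74

# The timing effect: c.333 euros to create a batch of 1,000 records,
# vs c.10,000 euros to 'repair' badly created metadata 10 years on –
# roughly a 30x penalty for deferring metadata quality work.
penalty = 10_000 / 333
print(f"deferred-repair penalty: ~{penalty:.0f}x")  # ~30x
```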

Noting that a key finding is that the costs of Acquisition and Ingest are high compared to archival storage and preservation costs. This seems to be because existing data services have decided to ‘take a hit’ upfront, making sure ingest and preservation issues are dealt with at the start of the process. I think this is a key outcome from the report, but based on the discussion today I don’t know what it tells us. I guess it is a capital vs ongoing cost question. If you’d asked me at the start of the day I’d have said that the model described was a reasonable one. However, after Paul Courant’s talk I wonder if this could result in dangerous inaction – if we can’t afford preservation, we won’t start collecting. The issue is that we can spread ongoing costs over a long period of time, so does dealing with a heavy upfront cost make sense?
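The capital-vs-ongoing question can be framed as a present-value comparison. A minimal sketch, with an entirely made-up upfront cost, annual cost and discount rate (none of these figures come from the LiFE study):

```python
# Illustrative figures only – not from the LiFE study or any case study.
def present_value(annual_cost: float, years: int, rate: float) -> float:
    """Discounted present value of a constant annual cost stream."""
    return sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))

upfront = 100_000                         # 'take a hit' at ingest time
spread = present_value(7_000, 20, 0.04)   # or ~7k/year for 20 years at 4%

print(f"upfront cost:     {upfront:,.0f}")
print(f"spread cost (PV): {spread:,.0f}")  # ~95,132 – slightly cheaper
# On these assumptions spreading the cost is slightly cheaper in
# present-value terms, but it also leaves open the risk that the ongoing
# money never materialises – the 'dangerous inaction' worry cuts both ways.
```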

Neil making a number of observations, but stressing that he does not regard the study as the final word on costs.

LiFE^2 Case Studies – SHERPA-DP

SHERPA-DP – presented by Stephen Grace (Preservation Manager, CeRch)

SHERPA-DP was a project to set up a shared preservation environment for the SHERPA project (http://www.sherpa.ac.uk).

Stephen running through different aspects of costs. Stephen is one of several presenters to say that Metadata creation isn’t really a separate step – I’m left wondering who actually argued in favour of treating it separately?

They found some aspects hard to predict – e.g. preservation action, where they assumed major action (10 days effort) would be needed every 3 years. This may need to be refined as we learn more about digital preservation.
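That assumption is easy to turn into an annualised figure. A sketch, where the staff day rate is my own placeholder (the presentation didn't give one):

```python
DAY_RATE = 250.0  # hypothetical staff cost per day – not a SHERPA-DP figure

def annualised_action_cost(days_effort: float, interval_years: float,
                           day_rate: float = DAY_RATE) -> float:
    """Spread the cost of a periodic preservation action over its interval."""
    return days_effort * day_rate / interval_years

# SHERPA-DP's stated assumption: a major action (10 days) every 3 years.
print(annualised_action_cost(10, 3))  # ~833 per year on these assumptions
```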

Costing exercises are difficult – they take time, and the evidence is not readily to hand. However, LiFE offers a consistent methodology. They also felt that it showed the value of third-party preservation – though he admits to being biased!

They found that the storage costs had a large impact – so reducing storage costs would have a significant effect.

LiFE^2 Case Studies – SHERPA-LEAP

This being presented by Jacqueline Cook from Goldsmiths. SHERPA-LEAP was a project to set up institutional repositories to hold published research output at a number of University of London colleges. The case study covers Royal Holloway, Goldsmiths and UCL.

Because of the relative ‘youth’ of the repositories, the major costs were staffing, and the main processes were Acquisition, Ingest and Metadata Creation.

The costs were calculated based on the amount of time spent on each item. Interestingly there are some institution-specific variations – Goldsmiths has high ingest costs because of the variety of material submitted, while Royal Holloway has high acquisition costs because it included the costs of holding outreach events (it's not clear that they costed in the time of the academics attending these – it sounds like just the cost of the repository staff running them).
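A minimal sketch of this kind of time-based per-object costing. The staff rate and per-stage timings below are made up, not the study's figures:

```python
HOURLY_RATE = 20.0  # hypothetical repository staff cost per hour

def per_object_cost(minutes_by_stage, hourly_rate=HOURLY_RATE):
    """Cost of one deposited object: staff minutes per stage x hourly rate."""
    total_minutes = sum(minutes_by_stage.values())
    return total_minutes / 60 * hourly_rate

# Hypothetical timings for the three main processes mentioned above;
# more complex material (as at Goldsmiths) pushes the ingest minutes up.
cost = per_object_cost({"acquisition": 20, "ingest": 40, "metadata": 15})
print(f"{cost:.2f}")  # 25.00 on these made-up timings
```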

The overall costs for each institution varied considerably:

  • Goldsmiths
    • Year 1 – 31.48
    • Year 5 – 31.95
    • Year 10 – 32.22

And UCL coming in at approximately half these figures, with Royal Holloway in the middle. Clearly these are estimates. Jacqueline is suggesting that the more complex nature of the objects accepted at Goldsmiths had a large impact on the variation in costs across the institutions. Alongside this there were also:

  • Different use cases
  • Phases in development of repositories
  • What was considered as part or outside the lifecycle
  • Method of deposit
  • Staffing levels

Overall the case study observed that:

  • We are working in a fast-changing environment
  • There are limitations to a simple, per-object average
  • Metadata Quality Assurance might be needed as an element (it was noted that metadata creation is actually part of Ingest, although the model treats it as a separate element)
  • Object-related advocacy – there may need to be an advisory role for repository administration
  • We are at an early stage for preservation planning

LiFE^2 Case Studies – British Library Newspapers

This afternoon we are starting with three case studies. The first (presented by Richard Davies from the BL) is for material that is not ‘born digital’ – in this case Newspapers at the British Library.

The BL wanted to use the “Burney Collection” – 1,100 volumes of the earliest known newspapers, with about 1 million pages, digitised from the microfilm.

They originally wanted to compare the digital collection with the analogue collection. However, because of access restrictions to the printed Burney Collection, they decided it wasn’t a particularly good comparison. So instead they compared a snapshot of their analogue legal deposit newspaper collection with the digital Burney Collection.

The point was not to say which was more expensive or better value, but to see if the LiFE model was workable for both types of collection.

The BL tried to cost each part of the process that they went through, including staff costs. They used linked spreadsheets to allow manipulation of the underlying cost assumptions without having to change all the formulae – so they could change (e.g.) salary costs and this would filter through.
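The linked-spreadsheet approach is essentially a parameterised cost model: keep the assumptions in one place and let every derived figure update when one changes. A sketch, where all the numbers are placeholders except the 1,100-volume collection size quoted earlier:

```python
# Placeholder assumptions – not the BL's actual figures.
assumptions = {
    "salary_per_day": 180.0,   # hypothetical staff day rate
    "days_per_volume": 0.5,    # hypothetical handling effort per volume
    "volumes": 1100,           # Burney Collection size, from the talk
}

def staff_cost(a):
    """Total staff cost derived from the underlying assumptions."""
    return a["salary_per_day"] * a["days_per_volume"] * a["volumes"]

print(staff_cost(assumptions))         # 99000.0 – baseline figure
assumptions["salary_per_day"] = 200.0  # change one underlying assumption...
print(staff_cost(assumptions))         # 110000.0 – ...and it filters through
```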

The BL found that the overall LiFE model worked very well for the analogue collection. Even though not all the terminology applied (for example, the model talks about ‘bitstream’ – a digital-only idea), the concepts underlying the terminology would apply to both – so perhaps the terminology needs refining to reflect this.

However, they found that at the element and subelement level, the detail for the digital was different to that required for analogue – which you might reasonably expect.

There were some significant differences:

There were large ‘creation’ costs for the digital collection, but not for the analogue collection (although it occurs to me, that actually there are large costs for the creation of the analogue collection – just not borne by the BL – does the model need to take into account commercial input?)

The overall conclusions were:

  • Comparison (of the analogue to digital) is complex but workable
  • Retrospective costing adds complications
  • Similar costs across a number of LiFE Stages
  • Analogue lifecycles are well established compared to digital