eScience, Scholarly Communication and the Transformation of Research Libraries

This talk is by Tony Hey – Corporate VP for External Research, Microsoft Research.

So, Tony is saying that we are seeing an ‘emergence of a new Data-Centric paradigm for research’, and that Web 2.0 students won’t use the library in the traditional way – so there is a need to redefine the role of the research library.

We have seen (and continue to see) an explosion in the amount of data being produced in scientific research – huge amounts of data being produced by instruments, simulations, sensor networks – we are able to ‘measure’ stuff to an overwhelming degree. Tony sees management and ‘curation’ of this data as a huge challenge for the research community – he says the scale of the challenge is one of the reasons he joined MS.

The ‘Scientific Data Deluge’ – data collection, data processing, digital preservation.

An example – ‘Fighting HIV with Computer Science’:
Research from a ‘Spam Blocking’ machine learning project, which then moved on to using machine learning in tools that scientists can use. The original project aimed to analyse huge amounts of data to decide whether it was spam or not – this led to drawing out correlations in huge data sets on HIV.

Cyberinfrastructure – this is the real problem, the ‘calculation’ bit is easy, it is the infrastructure needed (both technical and organisational) that is the problem. Tony references the NSF report on this (http://www.nsf.gov/pubs/2007/nsf0728/index.jsp).

Tony makes the point that it isn’t just about e-Science, but e-Research – the same issue applies to arts and humanities.

Tony says research today is:

  • Data intensive
  • Compute intensive
  • Collaborative
  • Multi-disciplinary

Today – web users are using tools that could really help here, but typically researchers are using custom standalone tools, and the ‘sharing’ process is still via a long publication process, physical meetings etc.

In eResearch, data is easily accessible and shareable (e.g. http://cas.sdss.org/dr5/en), services expose functionality (e.g. BLAST from the NLM, http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&PAGE_TYPE=BlastHome), and services are in the cloud rather than installed locally (e.g. Amazon Web Services – S3, EC2 – this is also used for home storage solutions – JungleDisk).

Researchers can be seen as ‘extreme information workers’ – looking for subtle signs in the information available.

Publications as live documents – starting to see examples of figures in electronic publications that are based on ‘live’ data – so the reader can change aspects of a graph, plot different scales, overlay other data etc.

Just discovered that quite a few of the slides that Tony is using are available at http://research.microsoft.com/workshops/CEfS2007/presentations/TonyHey.pdf (although this is from a different talk, many of the slides seem to be the same).

Microsoft are building a Virtual Research Environment (VRE) with the British Library – looks like a web portal with stuff like RSS feeds, funding opportunity alerts, saved searches, integration with MS tools (e.g. OneNote) for bibliography, Word and Excel 2007 – could add external tools to the ‘ribbon’ (e.g. library research tools).

Tony is going through his slides quite quickly, so it is hard to capture everything. Now onto scholarly publishing – the rules are changing – comparing to the music industry and music downloads – the scholarly publishing industry (publishers and libraries/universities/academics) needs to adjust.

Funding bodies now starting to make deposit of research results (publications, data and primary materials) mandatory as part of funding agreement (e.g. ERC)

Referencing an article by Paul Ginsparg, ‘As we may read’, published in the Journal of Neuroscience, Sept 20, 2006. Ginsparg was the driving force behind arXiv – he sees this model being adopted across all research areas. He also sees a role for libraries and societies – perhaps reclaiming roles they fulfilled in the 19th century. Tony suggests that libraries are not necessarily fulfilling this function – I would argue that universities are not clear they want this…

If you look at ranking of universities on Google Scholar – University of Southampton is the top ranking UK University in this measure – which isn’t a ‘quality’ judge, but think about how available this information is – this means that papers from UoS get more visibility, more citations, more influence.

All the tools to support this need to be completely straightforward for the researcher – no extra effort.

The EU PLANETS Project – Digital Preservation – use of XML – specifically Office Open XML (OOXML) – now an ECMA Standard – but also an open source ODF to OOXML converter – ODF is the ‘Open Document Format’.

Tony Hey leaves us with a challenge – once eResearch is ‘in the Cloud’, where is the Research Library?

Question: Will commercial publishers be destroyed by OA?
Answer: No – MS working with publishers. Tony thinks the ‘big’ ones will be fine – Science, Nature etc. But smaller publications may be more challenged – however Tony is keen to work with smaller publications to see how this can work – he doesn’t want them to go out of business but he believes the business model has to change.

Question: Where does payment come in?
Answer: Tony seems not particularly in favour of Author pays – sees problems with the model

Question: Who curates data in ‘mashups’?
Answer: It’s a problem – if data is coming from different sources, are they all conforming to the same curation standards? Seems unlikely – perhaps there is a commercial opportunity here.

Question (from me): Do researchers want to share their data – data is valuable?
Answer: Tony’s personal opinion is that they should have to share their data, but perhaps after a certain amount of time – keen to stress this is his personal view.

2 thoughts on “eScience, Scholarly Communication and the Transformation of Research Libraries”

  1. Why OOXML? ODF is an ISO standard and is not controlled by a proprietary company which is forcing a lock-in and making a mockery of the standards process.

