Second session of the second day – Liz Lyon from UKOLN
Open Science at Web-scale report – a consultative document that Liz says is now available on writetoreply (I couldn’t find it at first, but thanks to Kevin Ashley I now have a link: http://writetoreply.org/openscience/)
OK – Liz is talking about the amount of data being generated by genome sequencing machines. We are now into the second generation of genome sequencing, and the next generation, currently being worked on, will produce volumes of data that are orders of magnitude larger.
This type of huge data production brings challenges. Need large-scale data storage that is:
- Cost effective
- Secure
- Robust and resilient
- Low entry barrier
- Has data-handling/transfer/analysis capability
Looking at ‘cloud services’ that could offer this – e.g. Nature Biotechnology 10.1038/nbt0110-13 details use of cloud services in biotechnology.
Starting to see data sets as new instruments for science.
Cost of genome sequencing dropping, while number of sequenced genomes rises.
Leroy Hood says “medicine is going to become an information science”. P4 medicine:
- Predictive
- Personalised
- Preventive
- Participatory
Stephen Friend – chief exec of Sage Bionetworks – wants to develop an open data repository (Sage Commons) to start to develop predictive models of disease – liver/breast/colon cancer, diabetes, obesity.
Paraphrasing a quote Liz read out: cultural forces encourage sharing – the way people handle personal data will affect how researchers deal with data, and will mean they have no choice but to share.
Need to think about ways to incentivise researchers to share data – through mechanisms that allow credit and attribution which will then mean researchers benefit from sharing data.
Need to think about:
- Scalable data infrastructure
- Personal genomics – share your data?
- Transform 21st Century medicine/bioscience
- Credit and attribution for data and models
Liz’s report is at http://writetoreply.org/openscience/