The keynote this morning is from Kevin Anderson (@kevglobal) and Suw Charman-Anderson (@suw), journalists and technologists (http://charman-anderson.com/).
Kevin kicks off: journalists and librarians are dealing with many of the same issues – helping people navigate, interpret and understand information. He's going to talk about some of the challenges in this area, and starts by playing a Xerox video on ‘information overload’ – http://www.youtube.com/watch?v=CXFEBbPIEOI
Eric Schmidt noted that we are now creating huge amounts of information (5 exabytes every 2 days is the quote, but see disagreement with this figure at http://www.readwriteweb.com/cloud/2011/02/are-we-really-creating-as-much.php)
The amount of time people spend on Facebook is massively more than the time they spend on newspaper websites. There is evidence that people have problems reaching conclusions on complex stories – they move to simple narratives instead, which Kevin says equals “car crashes and celebrities”.
Social media offers opportunity to re-engage people and help them navigate information.
We are moving from “mass” to “relevance” – e.g. it’s not about how many followers you have on Twitter, but about the relevance of what you post. The aim is to move from information overload (a ‘mass’ problem) to filtered, relevant information (a ‘relevance’ solution).
Social media provides a way of filtering information. But social media has to be ‘social’ – you need people at the heart of this.
Examples of crowdsourcing – Guardian analysis of MP expenses (http://mps-expenses.guardian.co.uk/), Ushahidi crowdsourcing crisis information (http://www.ushahidi.com/).
Kevin also mentions ‘entity extraction’, using Calais as an example.
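As a quick illustration of what entity extraction produces, here is a minimal sketch – it uses the open-source spaCy library rather than Calais itself (a hosted service), and the sample text is invented:

```python
# Illustration only: Kevin mentioned Calais, a hosted service; this sketch
# uses the open-source spaCy library instead to show what entity extraction
# does. Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = ("The Guardian crowdsourced its analysis of MP expenses, "
        "while Eric Schmidt talked about the growth of information at Google.")

for ent in nlp(text).ents:
    # Each entity is returned with its text span and a type label
    # (PERSON, ORG, GPE, ...) – the structured hooks that let tools like
    # Poligraft attach background data to names in a story.
    print(ent.text, ent.label_)
```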
Dewey D. – an iPhone app to manage a ‘reading list’ (not in the academic sense) that pulls in stories from the New York Times.
Poligraft – analyses the funding of political campaigns. You can post URLs (of political stories) to Poligraft; it identifies the politicians and organisations mentioned and shows you how the politicians get their campaign funding, the major industries funding them, etc. – giving context to a political story and helping to make sense of it.
We (journalists & librarians) have hundreds of years of doing things in a certain way – changing culture is incredibly difficult. If you have more than 5 people in the room, inertia hits …
Now Suw takes the floor… to talk about crowdsourcing – breaking large tasks into smaller chunks that individuals can do. Suitable tasks are computational tasks and ‘human’ tasks.
Computational tasks = large datasets or computations that can be split into smaller datasets or computations – e.g. SETI@Home, which is about the ‘spare cycles’ individuals can contribute from their own computers to the overall computing power.
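A minimal sketch of that pattern, with invented names and data – split one big dataset into independent work units, let volunteers’ machines process them, and merge the results on the server (real platforms like BOINC, which ran SETI@Home, do this at vastly larger scale):

```python
# A minimal sketch (hypothetical names and data) of the SETI@Home pattern:
# a large dataset is cut into independent work units, volunteers' machines
# process them with their spare cycles, and the server merges the results.

def make_work_units(samples, unit_size):
    """Split a long run of samples into independent chunks."""
    for start in range(0, len(samples), unit_size):
        yield samples[start:start + unit_size]

def process_unit(unit):
    """Stand-in for the real per-chunk computation a volunteer's
    machine would run (e.g. a signal-detection pass)."""
    return max(unit)

signal = [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.8, 0.1, 0.7]
unit_results = [process_unit(u) for u in make_work_units(signal, 3)]
print(max(unit_results))  # server-side merge: strongest signal overall
```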
Human tasks = tasks that humans find easy but computers find difficult; brain-driven; they use participants’ spare time; individual errors are averaged away by having the same task completed by many people.
Types of human task:
- Recognising and describing things in images
- Reading and transcribing writing
- Applying expertise to identify, sort and catalogue
- Collecting data
- Manipulating models
Examples …
PCF oil paintings tagger – http://tagger.thepcf.org.uk/
- Public Catalogue Foundation, BBC
- Digitising pictures
- Getting people to tag content with metadata – describe what is in the painting
“You don’t have to be an expert to take part”
Old Weather – http://www.oldweather.org/
Transcribing ships’ logs – contributes to historical climate data, as well as other historical background
Ancient Lives – http://ancientlives.org/
Papyrus fragments – transcribe, measure, etc.
Having multiple people do each task gives you confidence when there is agreement across results (see the sketch after these examples)
Herbaria@Home – http://herbariaunited.org/atHome/
What’s the Score – http://www.bodleian.ox.ac.uk/bodley/library/specialcollections/projects/whats-the-score
Digitised musical score collection from the Bodleian – the crowdsourcing part of the project will start soon
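To make the redundancy point flagged above concrete, here is a minimal majority-vote sketch: the same task goes to several volunteers and the most common answer wins, so individual slips are outvoted. The transcription data is invented.

```python
# A minimal sketch of how agreement across volunteers builds confidence:
# the same task goes to several people and the majority answer wins, so
# individual slips are outvoted. The transcriptions below are invented.
from collections import Counter

def consensus(answers, threshold=0.6):
    """Return the majority answer and whether agreement clears the bar."""
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers) >= threshold

# Five volunteers transcribe the same ship's-log temperature reading:
transcriptions = ["61F", "61F", "67F", "61F", "61F"]
print(consensus(transcriptions))  # -> ('61F', True)
```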
Why crowdsource?
Provide opportunities for education and knowledge maintenance
Most projects don’t require prior knowledge but people often enjoy learning more about a subject
Improve accessibility through addition of new metadata or improvement of existing metadata – create data for research
Even when digitised, collections are hard to search/comprehend
Galaxy Zoo showed the public were as good as, or better than, professionals at classifying galaxies
FoldIt found that gamers could solve, in three weeks, the structure of a protein from a virus that causes AIDS in rhesus monkeys
Are your projects suitable?
- Can the original material be digitised?
- Can task be broken down into small chunks?
- Can those chunks be done by humans or their computers?
It also helps if…
- There is a benefit for the public – example of Google buying out an image-tagging game, which then died
- People feel part of a community
- There are measurable goals and targets
Zooniverse are crowdsourcing gurus…
Citizen Science Alliance – “Science” doesn’t just mean science – looking for projects at the moment…
Events – e.g. Citizen Cyberscience Summit
Q & A:
A failure of crowdsourcing: NASA mapping craters on Mars in the mid-80s – it failed to collect the data in a useful way, so the issues were around the data.
Wikitorial failed because there wasn’t enough community – hurdles to participation are not necessarily a bad thing