Big Data, Small Data & meaning

This blog post was written during a presentation at the British Library Labs Symposium in November 2014. It is likely full of errors and omissions, having been written in real time.

Tim Hitchcock (University of Sussex) opening the British Library Labs Symposium

BL Labs, a unique project run on a shoestring budget, emphasises the role of the library in the digital age. “The library creates a space in which the intellectual capital of the library is made available”. Libraries and museums are valued not just because they are ancient, but because they preserve memories. BL Labs provides a space for experimentation and explores the role of the library in digital research.

Interest in Big Data can obscure the use of Small Data. If we just focus on the very large, and ignore the very small, we miss stuff. Hence the ‘macroscope’ (a term coined by Piers Anthony!). The idea of a ‘macroscope’ in Digital Humanities comes from Katy Börner’s paper “Plug-and-Play Macroscopes”: a visualisation tool that allows data to be viewed at scale while also drilling down into individual data points in context. Tim sees this paper as the trigger for interest in the concept in Digital Humanities.

Paper Machines (http://metalab.harvard.edu/2012/07/paper-machines/) builds on Zotero to allow the user to build their own ‘Google Books’ + ‘Google Earth’. Not problem-free, but the concept is to let you look at both the large and the small scale. Jo Guldi and David Armitage, in “The History Manifesto”, say that ‘micro-history’ is irrelevant. Tim says they only want to look at the large scale – disregarding the other side of the ‘macroscope’.

Scott Weingart – The Historian’s Macroscope: Big Digital History. See his blog post “The moral role of DH in a data-driven world”. Weingart advocates ‘network analysis’ and Tim finds him convincing: Weingart makes a powerful case (in Tim’s view) for network analysis and topic mapping as means through which ‘history can speak to power’ – a similar aim to Guldi and Armitage.

Is this an attempt to present humanities as somehow ‘equal’ to STEM? Moving humanities towards social science.

Jerome Dobson “Through the Macroscope: Geography’s View of the World” – using GIS. Again, pulls humanistic enquiry towards social science.

Ben Schmidt’s ‘prochronisms’ project looks at language use in artistic works purporting to represent historical reality. It examines the individual words used, then maps them against corpora such as Google Books. Although not described as a ‘macroscope’ approach, in Tim’s view it perhaps comes closest to using the full extent of what a macroscope is meant to achieve. Tim highlights how analysis of the scripts for Mad Men shows an arc across the series from the (overstated) externalised masculinity of the 1950s to the (overstated) internalised masculinity of the 1970s.
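The word-level checking described above can be sketched in a few lines. This is a toy illustration only: the attestation table below is invented for the example, where a real check of the prochronisms kind would draw on something like the Google Books Ngram datasets.

```python
# Toy sketch: flag words in a period script whose first attestation
# (in a hypothetical reference table) postdates the story's setting.
# The years below are illustrative placeholders, not real lexicography.

import re

# hypothetical year of first attestation in a reference corpus
first_attested = {
    "telephone": 1844,
    "advertisement": 1582,
    "niche": 1611,
    "feedback": 1920,
    "lifestyle": 1929,
}

def flag_anachronisms(script, setting_year, attested=first_attested):
    """Return known words whose first attestation postdates the setting."""
    words = set(re.findall(r"[a-z]+", script.lower()))
    return sorted(w for w in words
                  if w in attested and attested[w] > setting_year)

line = "Run an advertisement, then wait for feedback on the telephone."
print(flag_anachronisms(line, 1830))  # → ['feedback', 'telephone']
```

Schmidt’s actual analyses go well beyond presence/absence, comparing word frequencies across time, but the shape of the comparison – script vocabulary against a dated corpus – is the same.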

Tim sees the prochronisms project as interesting because it seems to be one of the few projects in this space that reveals new insights, which many big data projects do not [I feel this neglects the point that sometimes the ‘obvious’ needs supporting with evidence].

What are the humanities? Tim sees the excitement and power of the humanities in placing detail into a fabric of understanding. We need a close and narrow reading of history to get to the detail of the excluded parts of society – these are not reflected in the massive.

What Tim sees as missing from modern macroscope projects is looking at detail, at the individual, at the particular. We risk losing the ability to use fine detail.

Tim was recently involved in a project looking at trial transcripts from the Old Bailey, examining the language used to represent violence and how that related to trial outcomes and sentences. In STEM, ‘cleaning data’ is a chore; researchers are interested in the ‘signal’ that makes its way through the noise. There is an assumption that ‘big data’ lets you get away with ‘dirty data’.

Humanists read dirty data and are interested in its peculiarities. Tim sees the most urgent need as tools for close reading of small data. This is not a call to ignore the digital, but a call to remember the tools that allow us to focus on the small.

When we read historical texts, how do we know what the contemporary reader would have known – for instance, when words have a new or novel use?

A single message from this talk: we need ‘radical contextualisation’ – every gesture contextualised in knowledge of all gestures ever made. This is not just ‘doing the same thing’. Tim’s favourite fragment of meaning is from linguistics: ‘voice onset timing’, which encompasses the gap between when you open your mouth to speak and the first sound. This changes, by milliseconds, depending on the type of interaction. The smallest pause has something to say about the type of interaction that is happening.

Tim would like to see this level of view for the humanities – so we can see in every chisel stroke information about the wood in which it is imprinted, the tool that made the stroke and the person who wielded the tool.

Comments from floor:
* Even in science, ‘big data’ is not relevant in many cases
* Linguistic scholars have been working on this for years – e.g. the Oxford Historical Thesaurus – we need to be wary of re-inventing the wheel
* Textual scholars can learn a huge amount from the kind of close reading that is applied to museum objects
* It is relatively hard to get large-scale information about collections
* Online auction catalogues –
