Penultimate session of the day – Sophia Ananiadou from NaCTeM (National Centre for Text Mining)
What is text mining? – takes us from text to knowledge.
- Yields precise knowledge nuggets from a sea of information -> Knowledge Extraction
- Extraction of ‘named entities’ – e.g. names of people, institution names, diseases, genes, etc.
- Discovery of concepts allows semantic annotation and enrichment of documents – improves information access (goes beyond index terms) and allows clustering and classification of documents
- Extracts relationships, events and even opinions, attitudes etc. – for further semantic enrichment
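A rough sketch of the named entity and concept idea, not NaCTeM's actual tooling: a tiny dictionary-based extractor that maps surface forms to concepts and then indexes documents by concept rather than by raw term. The lexicon, the sample documents and the function names (`extract_entities`, `build_concept_index`) are all made up for illustration; real systems use far richer resources and statistical taggers.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: surface form -> (entity type, canonical concept).
# Note that "p53" and "tp53" both map to the same concept.
LEXICON = {
    "breast cancer": ("disease", "breast cancer"),
    "brca1": ("gene", "BRCA1"),
    "tp53": ("gene", "TP53"),
    "p53": ("gene", "TP53"),
}


def extract_entities(text):
    """Return a sorted list of (start, end, type, concept) for lexicon matches."""
    found = []
    for surface, (etype, concept) in LEXICON.items():
        # Whole-word, case-insensitive match of the surface form.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.end(), etype, concept))
    return sorted(found)


def build_concept_index(docs):
    """Map each (type, concept) pair to the set of document ids mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _start, _end, etype, concept in extract_entities(text):
            index[(etype, concept)].add(doc_id)
    return index


# Made-up sample documents.
DOCS = {
    "doc1": "Mutations in BRCA1 raise the risk of breast cancer.",
    "doc2": "The p53 protein is encoded by TP53.",
    "doc3": "TP53 is frequently mutated in breast cancer.",
}

index = build_concept_index(DOCS)
for key in sorted(index):
    print(key, sorted(index[key]))
```

Because the index is keyed on concepts, a search for the gene TP53 also finds the document that only says "p53" in running text, which is the sense in which this goes beyond plain index terms.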
Need a toolkit:
- Resources – lexica, grammars, ontologies, databases
- Tools – parsers, taggers, named entity recognisers
- Annotated corpora
- Domain adaptation
Sophia talking in a bit more detail about how you go about doing text mining:
- Start with syntactic analysis
- Use Named Entity Recognition to extract terms/semantic entities
- Use parsers to extract other aspects – events, sentiments etc.
All this allows the creation of annotations – semantic metadata.
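The three steps above can be sketched as a toy pipeline. This is a minimal illustration under heavy assumptions: tokenising stands in for real syntactic analysis, a hand-made lexicon stands in for a trained named entity recogniser, and a "trigger verb with the nearest entity on either side" rule stands in for parser-based event extraction. The lexicons, trigger list and function names are hypothetical.

```python
import json
import re

# Hypothetical mini-resources; real systems use large lexica and ontologies.
GENES = {"BRCA1", "TP53"}
DISEASES = {"cancer"}
TRIGGERS = {"causes", "inhibits", "activates"}


def analyse(sentence):
    """Step 1 stand-in: tokenise only. A real system would tag and parse."""
    return re.findall(r"\w+", sentence)


def recognise_entities(tokens):
    """Step 2: label tokens that appear in the lexicons."""
    entities = []
    for i, tok in enumerate(tokens):
        if tok.upper() in GENES:
            entities.append({"index": i, "text": tok, "type": "gene"})
        elif tok.lower() in DISEASES:
            entities.append({"index": i, "text": tok, "type": "disease"})
    return entities


def extract_events(tokens, entities):
    """Step 3 stand-in: link a trigger verb to the nearest entity on each side."""
    events = []
    for i, tok in enumerate(tokens):
        if tok.lower() not in TRIGGERS:
            continue
        left = [e for e in entities if e["index"] < i]
        right = [e for e in entities if e["index"] > i]
        if left and right:
            events.append({
                "trigger": tok.lower(),
                "agent": left[-1]["text"],
                "theme": right[0]["text"],
            })
    return events


def annotate(sentence):
    """Run all three steps and return the annotations as a plain dict."""
    tokens = analyse(sentence)
    entities = recognise_entities(tokens)
    return {
        "text": sentence,
        "entities": entities,
        "events": extract_events(tokens, entities),
    }


result = annotate("TP53 inhibits cancer growth.")
print(json.dumps(result, indent=2))
```

The returned dict is the "semantic metadata": it sits alongside the original text rather than altering it, so it could be stored with a repository record and used for search or browsing.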
Some examples of text mining applications:
- Kleio (http://www.nactem.ac.uk/software/kleio/)
- Medie (http://www-tsujii.is.s.u-tokyo.ac.jp/medie/)
- Facta (http://text0.mib.man.ac.uk/software/facta/main.html)
Sophia suggests we should be integrating ‘Language Technology’ into open and common e-research infrastructure to enable the use of text mining tools on the content. See U-Compare tool from NaCTeM – http://www.nactem.ac.uk/u-compare.php
Q & A
Q: (David Flanders) If I was a repository manager which tool would you recommend I play with first?
A: All of them! You need to work out what you want to do and pick the appropriate tool.