SORT – Advanced text mining tools and resources for knowledge discovery

Penultimate session of the day – Sophia Ananiadou from NaCTeM (National Centre for Text Mining)

What is text mining? – takes us from text to knowledge.

  • Yields precise knowledge nuggets from a sea of information -> Knowledge Extraction
  • Extraction of ‘named entities’ – e.g. names of people, institution names, diseases, genes, etc.
  • Discovery of concepts allows semantic annotation and enrichment of documents – improves information access (goes beyond index terms) and allows clustering and classification of documents
  • Extracts relationships, events and even opinions, attitudes etc. – for further semantic enrichment

Need a toolkit:

  • Resources – lexica, grammars, ontologies, databases
  • Tools – parsers, taggers, named entity recognisers
  • Annotated corpora
  • Domain adaptation

Sophia talking in a bit more detail about how you go about doing text mining:

  • Start with syntactic analysis
  • Use Named Entity Recognition to extract terms/semantic entities
  • Use parsers to extract other aspects – events, sentiments etc.

All this allows the creation of annotations – semantic metadata.
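As a rough illustration of the steps above, here is a minimal Python sketch of going from raw text to annotations. It is not one of the NaCTeM tools: the lexicon, the Annotation class and the function names are all hypothetical, syntactic analysis is reduced to simple tokenisation, and a dictionary lookup stands in for a trained named entity recogniser.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon mapping lower-cased surface forms to entity types.
# A real system would draw on curated lexica and ontologies instead.
LEXICON = {
    "brca1": "GENE",
    "breast cancer": "DISEASE",
    "nactem": "INSTITUTION",
}


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: character offsets plus a semantic label."""
    start: int
    end: int
    text: str
    label: str


def tokenize(text):
    """Return (token, start, end) triples for word-like tokens."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def recognise_entities(text, lexicon=LEXICON, max_len=3):
    """Greedy longest-match dictionary lookup over spans of up to max_len tokens."""
    tokens = tokenize(text)
    annotations = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start = tokens[i][1]
            end = tokens[i + n - 1][2]
            surface = text[start:end]
            label = lexicon.get(surface.lower())
            if label:
                annotations.append(Annotation(start, end, surface, label))
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return annotations


sample = "BRCA1 mutations raise the risk of breast cancer."
for annotation in recognise_entities(sample):
    print(annotation)
```

Keeping the annotations as character offsets alongside the text, rather than rewriting the text itself, means the same document can carry several layers of semantic metadata at once.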

Some examples of text mining applications:

Sophia suggests we should be integrating ‘Language Technology’ into open and common e-research infrastructure to enable the use of text mining tools on the content. See U-Compare tool from NaCTeM – http://www.nactem.ac.uk/u-compare.php

Q & A

Q: (David Flanders) If I was a repository manager which tool would you recommend I play with first?

A: All of them! You need to work out what you want to do and pick the appropriate tool.
