SORT – Advanced text mining tools and resources for knowledge discovery

Penultimate session of the day – Sophia Ananiadou from NaCTeM (National Centre for Text Mining)

What is text mining? – takes us from text to knowledge.

Yields precise knowledge nuggets from a sea of information -> Knowledge Extraction
Extraction of ‘named entities’ – e.g. names of people, institution names, diseases, genes, etc.
Discovery of concepts allows semantic annotation and enrichment of documents – improves information access (goes beyond index terms) and allows clustering and classification of documents
Extracts relationships, events and even opinions, attitudes etc. – for further semantic enrichment
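
To make the "goes beyond index terms" point a little more concrete, here is a minimal sketch in Python. It is my own illustration, not something shown in the talk: the document ids and concept sets in `doc_concepts` are made up, and it assumes some earlier annotation step has already attached concepts to each document. Given that, a concept index supports lookup by concept, and a simple overlap measure (Jaccard similarity, chosen here only for brevity) gives a starting point for clustering.

```python
from collections import defaultdict

# Hypothetical output of an annotation step: document id -> concepts found in it.
doc_concepts = {
    "doc1": {"BRCA1", "breast cancer"},
    "doc2": {"BRCA1", "ovarian cancer"},
    "doc3": {"text mining"},
}


def build_concept_index(doc_concepts):
    """Map each concept to the set of documents annotated with it."""
    index = defaultdict(set)
    for doc_id, concepts in doc_concepts.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


def jaccard(a, b):
    """Share of concepts two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


index = build_concept_index(doc_concepts)

# Documents that mention a concept, whatever their index terms say.
print(sorted(index["BRCA1"]))

# Concept overlap between two documents, usable as a clustering signal.
print(jaccard(doc_concepts["doc1"], doc_concepts["doc2"]))
```

A real service would presumably use richer similarity measures and ontology-backed concepts rather than raw strings, but the shape of the idea is the same.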

Need a toolkit:

Resources – lexica, grammars, ontologies, databases
Tools – parsers, taggers, named entity recognisers
Annotated corpora
Domain adaptation

Sophia talking in a bit more detail about how you go about doing text mining:

Start with syntactic analysis
Use Named Entity Recognition to extract terms/semantic entities
Use parsers to extract other aspects – events, sentiments etc.

All this allows the creation of annotations – semantic metadata.
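
The steps above can be sketched very roughly in Python. This is a toy illustration under my own assumptions, not how NaCTeM's tools work: the `LEXICON`, the `Annotation` class and both functions are hypothetical, a regular-expression tokeniser stands in for real syntactic analysis, and a dictionary lookup stands in for a trained named entity recogniser. The point is only to show what "annotations as semantic metadata" can look like: labelled spans with character offsets into the text.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon; real systems draw on curated lexica and ontologies.
LEXICON = {
    "BRCA1": "gene",
    "breast cancer": "disease",
    "University of Manchester": "institution",
}


@dataclass
class Annotation:
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    text: str    # the matched surface form
    label: str   # semantic type taken from the lexicon


def tokenise(text):
    """Crude stand-in for syntactic analysis: word tokens with offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def recognise_entities(text, lexicon=LEXICON):
    """Dictionary-based entity recognition: find lexicon terms in the text."""
    annotations = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append(Annotation(m.start(), m.end(), m.group(), label))
    return sorted(annotations, key=lambda a: a.start)


text = "Researchers at the University of Manchester study BRCA1 in breast cancer."
tokens = tokenise(text)
annotations = recognise_entities(text)

print(len(tokens), "tokens")
for a in annotations:
    print(a)
```

Because each annotation keeps its offsets, it can be stored alongside the document rather than inside it, which is one common way to layer semantic metadata over existing content.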

Some examples of text mining applications:

Sophia suggests we should be integrating ‘Language Technology’ into open and common e-research infrastructure to enable the use of text mining tools on the content. See U-Compare tool from NaCTeM – http://www.nactem.ac.uk/u-compare.php

Q & A

Q: (David Flanders) If I was a repository manager which tool would you recommend I play with first?

A: All of them! You need to work out what you want to do and pick the appropriate tool.

Overdue Ideas

Ideas linking Libraries, Computing, E-learning, and anything else that springs to mind.
