Working with Visionaries on the Frontlines of Scientific Progress Worldwide
Nuance Foundation Grants

Wikipedia-Based Named Entity Identification


Universität Stuttgart / Institute for Natural Language Processing:
The University of Stuttgart comprises ten faculties and offers 56 undergraduate programs and 20 postgraduate programs, and it is in high demand worldwide. The University meets the requirements of the working world through the internationalization of its broad range of courses, numerous online offerings in teaching and continuing education, and intensive support for start-up companies.

As the volume of available information keeps growing, people need automated tools to help process it. One important source of information is written or spoken text. The Institute for Natural Language Processing (IMS) carries out basic and applied research and trains students to create tools for the automated processing of spoken and written language.

Project: Wikipedia-Based Named Entity Identification

Jonas Kuhn, Andrea Glaser

One of the central tasks of natural language understanding is to identify the real-world entity that a given linguistic expression refers to. Solving this problem has become more realistic with the advent of machine-readable repositories such as Wikipedia, which provide unique identifiers for a large number of real-world entities. The goal of this project is to develop methods that map noun phrases in natural language text to such identifiers from Wikipedia. In many cases, the reference of a name is ambiguous. For example, "Michael Jackson" can refer to the famous singer, an actor, or other people. Each usage of a name should be mapped to the correct entity.
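The mapping from an ambiguous name to an identifier can be illustrated with a minimal Python sketch. It is not the project's method: it simply picks the candidate whose description shares the most words with the sentence around the mention. The candidate table, the identifiers, the descriptions, and the function names `tokenize` and `link_mention` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy candidate table: name -> {identifier: short description}.
# Identifiers and descriptions are illustrative, not taken from Wikipedia.
CANDIDATES = {
    "Michael Jackson": {
        "Michael_Jackson_(singer)": "singer pop music album song dance",
        "Michael_Jackson_(actor)": "actor film television role stage",
    }
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def link_mention(mention, context):
    """Return the candidate identifier whose description overlaps most
    with the context, or None if the mention has no candidates."""
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None
    context_tokens = tokenize(context)

    def overlap(identifier):
        return len(context_tokens & tokenize(candidates[identifier]))

    # On a tie, max() keeps the first candidate in table order.
    return max(candidates, key=overlap)


singer = link_mention("Michael Jackson", "The album topped the pop music charts.")
actor = link_mention("Michael Jackson", "He played a role in the television film.")
```

Here `singer` resolves to the singer identifier and `actor` to the actor identifier, because each context shares three words with only one description. A realistic system would replace the word overlap with richer evidence, such as links between articles.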

This task is more comprehensive than traditional named entity recognition because it includes unique reference identification. Information on the Wikipedia page (structured text, links to other articles, etc.) can be used to identify other occurrences of the entity, such as nominal noun phrases (e.g., "the singer") or pronouns (e.g., "he"), and to link all these occurrences in the text to the entity. In other words, the approach also performs a form of co-reference resolution. The information deduced from earlier occurrences of an entity e can then support further natural language understanding of subsequent sentences that contain e.
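As a rough illustration of linking nominal phrases and pronouns back to an already identified entity, the following Python sketch attaches each later mention to the most recent entity whose page-derived attributes are compatible with it. It is a simplified assumption about how such linking could work, not the project's algorithm. The `Entity` class, the `resolve` function, and the attribute values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    identifier: str   # hypothetical unique identifier
    nouns: set        # descriptive nouns assumed to come from the entity's page
    pronouns: set     # pronouns compatible with the entity


def resolve(mentions, entities_by_name):
    """Link mentions (in text order) to entity identifiers.

    Names are looked up directly. Other mentions are linked to the most
    recently seen entity that matches the pronoun or the head noun.
    """
    seen = []      # entities linked so far, most recent last
    linked = []
    for mention in mentions:
        entity = entities_by_name.get(mention)
        if entity is None:
            key = mention.lower()
            head = key[4:] if key.startswith("the ") else key
            for earlier in reversed(seen):
                if key in earlier.pronouns or head in earlier.nouns:
                    entity = earlier
                    break
        if entity is not None:
            seen.append(entity)
        linked.append((mention, entity.identifier if entity else None))
    return linked


# Illustrative data only.
entities = {
    "Michael Jackson": Entity(
        "Michael_Jackson_(singer)", {"singer"}, {"he", "him", "his"}
    )
}
chain = resolve(["Michael Jackson", "the singer", "he", "the actor"], entities)
```

In this example the first three mentions are linked to the same identifier, while "the actor" stays unlinked because no earlier entity carries that noun. Matching on a single head noun and the most recent entity is a deliberate simplification; it ignores syntax, distance, and competing entities.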


Glaser, A. and Kuhn, J. Exploring the utility of coreference chains for improved identification of personal names. Proc. LREC, 2014.
