Working with Visionaries on the Frontlines of Scientific Progress Worldwide
Nuance Foundation Grants

Domain specific knowledge extraction from unstructured text with or without semantic databases

Description: Université de Montréal RALI

RALI is one of the largest university NLP (natural language processing) labs in Canada. RALI’s team includes computer scientists and linguists with considerable experience in NLP.

Project: Domain specific knowledge extraction from unstructured text with or without semantic databases

Philippe Langlais

The project addresses the automatic learning of huge numbers of facts (like "Shakespeare is an author" or "Shakespeare wrote Romeo and Juliet") from unstructured text, and will try to improve on the current state of the art in three directions. First, it will clean up such automatically learned facts by applying linguistic constraints and semantic knowledge to eliminate ill-formed facts. As a subquestion, it will investigate whether parsers can help with this cleanup, or whether parsers are too complex and/or slow to work on huge databases. Second, the project will attempt to infer the domain of the texts from which facts are learned and then apply that knowledge to improve the quality of the facts. Third, the project will apply a model of informativeness to improve iterative approaches to learning facts (the system builds up the fact base by iterating over the source texts several times). The plan is to use two text collections as a starting point: Wikipedia and ClueWeb09, a huge collection of 1 billion web pages created for developing and testing IR (information retrieval) systems.
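
To make the first direction more concrete, the following is a minimal sketch of how shallow linguistic constraints might filter ill-formed facts without running a parser. It is not the project's method: the `Fact` triple layout, the pronoun list, the token limit, and the function names are all assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class Fact(NamedTuple):
    """A hypothetical (subject, relation, object) triple extracted from text."""
    subject: str
    relation: str
    obj: str


# Illustrative constraint settings (assumptions, not taken from the project).
PRONOUNS = {"it", "he", "she", "they", "this", "that", "which", "who"}
MAX_ARGUMENT_TOKENS = 6


def is_well_formed(fact: Fact) -> bool:
    """Return True if the fact passes a few shallow well-formedness checks."""
    for argument in (fact.subject, fact.obj):
        tokens = argument.split()
        # Reject empty or overly long arguments.
        if not tokens or len(tokens) > MAX_ARGUMENT_TOKENS:
            return False
        # Reject an argument that is a bare pronoun: it names no entity.
        if len(tokens) == 1 and tokens[0].lower() in PRONOUNS:
            return False
    # The relation must contain at least one purely alphabetic token.
    return any(token.isalpha() for token in fact.relation.split())


def clean(facts: List[Fact]) -> List[Fact]:
    """Keep only the facts that satisfy every constraint."""
    return [fact for fact in facts if is_well_formed(fact)]
```

Under these assumed rules, `Fact("Shakespeare", "wrote", "Romeo and Juliet")` is kept, while `Fact("it", "wrote", "Romeo and Juliet")` is dropped because its subject is an unresolved pronoun. Checks of this kind are cheap enough to run over very large fact bases, which is the trade-off against full parsing that the project proposes to investigate.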
