This research addresses the difficult challenge of performing language-based inference to answer questions over large collections of open-text documents. We focus on language-based inference because attending to the information content of the language itself helps avoid human bias, and more general common-sense knowledge may not be appropriate in certain question-answering situations.
Asker parses large volumes of text and produces Abstract Knowledge Representations (AKRs) that permit many textual entailments and contradictions to be identified. The AKRs are placed in a repository and indexed to allow efficient retrieval against the AKRs of natural language queries. Retrieved representations are checked against the query AKR, using a formally verifiable entailment calculus, to determine textual entailments or contradictions between the question and possible answers. The results of entailment and contradiction detection are used to classify and rank answers.
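The final classification-and-ranking step can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the three labels and their ordering (entailed answers first, then contradictions, then undecided candidates) are assumptions.

```python
# Hypothetical ranking of candidate answers by their ECD result.
# Entailed answers rank highest; contradictions are still informative
# (they answer the question negatively), so they outrank "unknown".
ORDER = {"entailment": 0, "contradiction": 1, "unknown": 2}

def rank_answers(labeled_answers):
    """labeled_answers: list of (answer_text, ecd_label) pairs.

    Returns the pairs sorted by the assumed label priority; Python's
    sort is stable, so ties keep their retrieval order.
    """
    return sorted(labeled_answers, key=lambda pair: ORDER[pair[1]])
```

Because the sort is stable, any upstream retrieval score survives as a tie-breaker within each label class.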
Scalable AKR repository Question answering from a large repository involves two steps: retrieving candidate texts that are likely to contain answers, and evaluating those answers. Most text repositories are indexed by the terms in the text, sometimes augmented by other information extracted from text patterns. A repository of open text indexed instead by semantically significant AKR properties yields better precision and recall for candidate answers. The mapping from text to AKRs was developed as part of our two-way Bridge system in the Aquaint 2 project. We automatically transform natural language source material and queries from free-text English into semantic representations and then into AKR. We create a repository of all documents in a set of substantial corpora (e.g., Wikipedia articles), indexed by a subset of the AKR properties of the document sentences. We retrieve candidate answer AKRs that share features with the query AKR and pass them to the ECD algorithms for filtering and ranking. The repository scales in size without noticeable degradation in retrieval time.
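The retrieval step described above can be sketched as an inverted index from AKR properties to documents. The class name, the representation of properties as tuples, and the overlap-count ranking are illustrative assumptions, not the project's actual schema.

```python
# Sketch of a repository indexed by AKR properties rather than terms.
# Properties are modeled as hashable tuples (an assumption); retrieval
# returns documents sharing properties with the query AKR, ranked by
# how many properties they share.
from collections import defaultdict

class AKRIndex:
    """Inverted index: AKR property -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, properties):
        for prop in properties:
            self.postings[prop].add(doc_id)

    def retrieve(self, query_properties, min_overlap=1):
        # Count shared properties per candidate document.
        counts = defaultdict(int)
        for prop in query_properties:
            for doc_id in self.postings.get(prop, ()):
                counts[doc_id] += 1
        candidates = [(d, c) for d, c in counts.items() if c >= min_overlap]
        # Best-matching candidates first; these go on to ECD filtering.
        return sorted(candidates, key=lambda pair: -pair[1])
```

Because lookup touches only the postings lists for the query's own properties, retrieval time depends on the query, not on total repository size, which is one way the scaling claim can hold.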
Entailment and contradiction detection We have developed algorithms for correctly and efficiently detecting the entailment and contradiction relations that hold between AKRs for questions and AKRs for candidate answer texts. Our entailment and contradiction detection (ECD) module filters and ranks the candidate answers retrieved from our AKR repository. A hallmark of our computational approach to syntax, semantics, and AKR mapping has been the ability to manage ambiguity by combining all alternative interpretations (syntactic, semantic, and knowledge-oriented) into a single packed structure that can be processed further without the typically exponential cost of unpacking. Our ECD algorithms incorporate this same technology, so that we can efficiently determine all possible entailment and contradiction relations holding between an ambiguous query and ambiguous candidate answer texts.
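A toy version of the ECD decision can be sketched as fact subsumption over AKR facts with polarity. This stand-in is an assumption on our part: the real module uses a formally verifiable entailment calculus over packed structures, whereas here facts are simply dictionaries mapping a proposition to True (asserted) or False (negated).

```python
# Toy ECD check, assuming AKR content is flattened to {fact: polarity}.
# "Entailment" = the answer asserts every query fact with matching
# polarity; "contradiction" = some shared fact has opposite polarity.
def ecd(query_facts, answer_facts):
    """Return 'entailment', 'contradiction', or 'unknown'."""
    for fact, polarity in query_facts.items():
        if fact in answer_facts and answer_facts[fact] != polarity:
            return "contradiction"
    if all(fact in answer_facts and answer_facts[fact] == polarity
           for fact, polarity in query_facts.items()):
        return "entailment"
    return "unknown"
```

With packed ambiguity, one would run this check across alternative interpretations without enumerating every combination; the sketch above shows only the per-reading decision.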