Parsing
Parsing is the process of "using a grammar to assign a syntactic analysis to a string of words, a lattice of word hypotheses output by a speech recognizer" (Carroll, 2003, p. 233). In MTK, we use two types of grammar: constituency and dependency.
Constituency grammar
The fundamental idea of constituency is that groups of words form a single unit or phrase, called a constituent (Jurafsky & Martin, 2000).
Constituency grammar describes the syntactic structure of sentences in terms of phrasal hierarchies.
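To make the idea of a phrasal hierarchy concrete, the following minimal sketch represents a constituency analysis as nested tuples and collects all constituents with a given label. The tree, the helper names (`leaves`, `phrases`) and the Penn Treebank-style labels are illustrative assumptions written by hand; they are not produced by a parser and are not part of MTK.

```python
# A constituency tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings (the words). The analysis below is hand-written
# for illustration only.
TREE = (
    "S",
    ("NP", ("DT", "The"), ("NN", "company")),
    ("VP",
        ("VBD", "bought"),
        ("NP", ("DT", "the"), ("NNS", "products"))),
)


def leaves(node):
    """Return the words covered by a node, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def phrases(node, label):
    """Collect the text of every constituent that carries the given label."""
    if isinstance(node, str):
        return []
    found = [" ".join(leaves(node))] if node[0] == label else []
    for child in node[1:]:
        found.extend(phrases(child, label))
    return found


noun_phrases = phrases(TREE, "NP")
print(noun_phrases)  # ['The company', 'the products']
```

The noun phrases found this way mark phrase boundaries, which is the kind of information that can serve as candidates for concepts.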
Dependency grammar
Dependency grammars focus on the direct relations between words in a particular sentence.
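As a rough illustration, a dependency analysis can be stored as a flat list of (relation, head, dependent) triples instead of a phrase hierarchy. The relation names below follow the style of Stanford typed dependencies, but the triples are written by hand, and `dependents_of` is a hypothetical helper rather than an MTK function.

```python
# Each entry is (relation, head, dependent); hand-written for the sentence
# "The company bought the products".
DEPENDENCIES = [
    ("det", "company", "The"),
    ("nsubj", "bought", "company"),
    ("root", "ROOT", "bought"),
    ("det", "products", "the"),
    ("dobj", "bought", "products"),
]


def dependents_of(head, dependencies):
    """Map each relation name to the words that depend directly on `head`."""
    result = {}
    for relation, h, dependent in dependencies:
        if h == head:
            result.setdefault(relation, []).append(dependent)
    return result


verb_dependents = dependents_of("bought", DEPENDENCIES)
print(verb_dependents)  # {'nsubj': ['company'], 'dobj': ['products']}
```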
Parsing and formal languages
The phrase-based approach identifies phrases and structural categories in a given sentence. Analysing sentence structure through the lens of a constituency grammar may let us extract phrase boundaries, which in turn helps to identify concepts.
Dependency grammar, on the other hand, appears better suited to identifying the relationships between concepts and the attributes of a particular concept. The reason is its ability to discover head-based relations (e.g. with the verb as head) and functional categories (e.g. subject, direct object, complement of a preposition).
SBVR
In the context of natural language, the core items important for SBVR are the verb and its relation to the subject/actor and object. Identifying the verb using a constituency approach is possible. However, some cases, such as passive constructions, might cause problems. Furthermore, identifying the correct subject and object often fails with constituency grammar when sentences are long or the subject appears after the verb. Dependency grammar focuses on verb identification and on the dependencies between different parts of the sentence. In MTK, we built an interface that uses the results produced by the dependency grammar analysis (Stanford Parser) and extracts verbs together with the subjects and objects that stand in some relation to each verb (the head word). First tests have shown that even passive constructions such as "The products have been bought by the company" are processed correctly.
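The extraction step described above can be sketched as follows. This is not the MTK interface itself: the function name `extract_svo`, the input format, and the decision to map the passive subject to the logical object are assumptions made for illustration. The triples are hand-written in the style of Stanford collapsed typed dependencies (`nsubj`, `dobj`, `nsubjpass`, `agent`) and are not actual parser output.

```python
# (relation, head, dependent) triples for the passive example sentence
# "The products have been bought by the company" (hand-written).
PASSIVE = [
    ("det", "products", "The"),
    ("nsubjpass", "bought", "products"),
    ("aux", "bought", "have"),
    ("auxpass", "bought", "been"),
    ("root", "ROOT", "bought"),
    ("det", "company", "the"),
    ("agent", "bought", "company"),
]

# The active counterpart "The company bought the products".
ACTIVE = [
    ("det", "company", "The"),
    ("nsubj", "bought", "company"),
    ("root", "ROOT", "bought"),
    ("det", "products", "the"),
    ("dobj", "bought", "products"),
]


def extract_svo(dependencies):
    """Return (subject, verb, object) for every head word that has a subject.

    Assumption: in a passive clause the grammatical subject (nsubjpass) is
    reported as the object and the agent as the subject.
    """
    roles = {}
    for relation, head, dependent in dependencies:
        roles.setdefault(head, {})[relation] = dependent

    triples = []
    for verb, rels in roles.items():
        if "nsubj" in rels:
            triples.append((rels["nsubj"], verb, rels.get("dobj")))
        elif "nsubjpass" in rels:
            triples.append((rels.get("agent"), verb, rels["nsubjpass"]))
    return triples


print(extract_svo(PASSIVE))  # [('company', 'bought', 'products')]
print(extract_svo(ACTIVE))   # [('company', 'bought', 'products')]
```

Under these assumptions the active and the passive sentence yield the same actor, verb, and object, which is what a verb-centred representation needs.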