Abstract

This thesis is concerned with the output of multimodal human-computer interfaces. Rather than hard-coding graphical and spoken representations, it introduces methods that plan and realize coherent output appropriate to the situation and the device. The generation system expects a mode- and language-independent representation, such as can be supplied by the dialogue management component of a dialogue system. With the aid of a unification-based functional grammar, the generator then assembles mode-specific rendering instructions for all modes simultaneously. The approach proposed in this thesis abandons the canonical pipeline of planning and realization in natural language generation in favor of hard constraints formulated in a grammar and soft constraints that allow the output to adapt gradually. The grammar is constructed to ensure the coherence of output across modalities, which is generated in a synchronized fashion rather than by separate, mode-specific generators. The soft constraints follow some of the Gricean maxims by incorporating two counteracting communicative goals: efficacy and efficiency. A fitness function encoding these goals takes into account situation- and user-specific factors, such as distractions in a single mode or the user's sensory impairments, and leads to the selection of an appropriate output from the variety of potential outputs generated by the grammar. The function is evaluated in a study with human subjects.
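
The selection step can be pictured with a minimal sketch in Python; the names (Candidate, fitness, select, mode_reliability, alpha) are illustrative assumptions, not the thesis implementation:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One output variant produced by the grammar (hypothetical structure)."""
    efficacy: float    # estimated chance the user grasps the content
    efficiency: float  # inverse presentation cost (length, duration, ...)
    modes: frozenset   # output modes used, e.g. {"speech", "graphics"}

def fitness(c: Candidate, mode_reliability: dict, alpha: float = 0.7) -> float:
    """Score a candidate; alpha balances efficacy against efficiency.

    mode_reliability discounts modes that are degraded for this user or
    situation (e.g. speech in a noisy room, graphics for a visually
    impaired user)."""
    reliability = min(mode_reliability.get(m, 1.0) for m in c.modes)
    return alpha * c.efficacy * reliability + (1.0 - alpha) * c.efficiency

def select(candidates, mode_reliability):
    """Pick the best-scoring variant among the grammar's outputs."""
    return max(candidates, key=lambda c: fitness(c, mode_reliability))

# In a noisy room, a graphics-only variant beats a richer spoken one.
noisy = {"speech": 0.3, "graphics": 1.0}
spoken = Candidate(efficacy=0.9, efficiency=0.5, modes=frozenset({"speech"}))
visual = Candidate(efficacy=0.7, efficiency=0.8, modes=frozenset({"graphics"}))
assert select([spoken, visual], noisy) is visual
```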

The thesis presents a unification-based, hybrid grammar formalism that can combine prefabricated phrases with linguistically motivated grammar fragments, together with an associated generation algorithm; the formalism supports the formulation of grammars that lead to cross-modally coherent output. Methods for efficiently implementing the control strategy, which combines hard and soft constraints as a constraint optimization problem, are compared.
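
The unification operation at the heart of such a formalism can be sketched with plain nested dictionaries standing in for feature structures; this toy unify function is an assumed illustration, not the formalism developed in the thesis:

```python
def unify(a, b):
    """Return the unification of feature structures a and b, or None on clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            if key in out:
                merged = unify(out[key], value)
                if merged is None:
                    return None  # feature clash: structures are incompatible
                out[key] = merged
            else:
                out[key] = value
        return out
    return a if a == b else None  # atomic values must match exactly

# Example: a spoken fragment and a shared constraint agree on the referent,
# so the mode-specific renderings can stay coherent with each other.
speech = {"referent": {"type": "restaurant", "name": "Roma"}, "mode": "speech"}
shared = {"referent": {"type": "restaurant"}}
assert unify(speech, shared) == speech
```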

The cross-modal coherence implemented by the grammar formalism is motivated by known phenomena such as cross-modal priming and alignment between interlocutors. To optimize discourse coherence, central ideas of Centering Theory are implemented using the grammar formalism.
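
As a toy illustration of the preferences Centering Theory contributes, the following sketch (hypothetical code, not the thesis grammar) classifies transitions between utterances from their backward-looking and preferred centers and scores a discourse by the standard transition ordering, a preference the soft constraints could reward:

```python
# Standard ordering: earlier transitions are more coherent.
PREFERENCE = ["CONTINUE", "RETAIN", "SMOOTH_SHIFT", "ROUGH_SHIFT"]

def transition(cb_prev, cb_curr, cp_curr):
    """Classify the transition between two utterances given the previous and
    current backward-looking centers (cb) and the current preferred center (cp)."""
    if cb_prev is None or cb_curr == cb_prev:
        return "CONTINUE" if cb_curr == cp_curr else "RETAIN"
    return "SMOOTH_SHIFT" if cb_curr == cp_curr else "ROUGH_SHIFT"

def coherence_score(transitions):
    """Lower is better: sum of each transition's rank in the preference order."""
    return sum(PREFERENCE.index(t) for t in transitions)

assert transition(None, "user", "user") == "CONTINUE"
assert transition("user", "meeting", "meeting") == "SMOOTH_SHIFT"
```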

Finally, novel methods and a ready-to-use implementation are introduced that allow user interface developers to inspect, maintain, and extend grammars. The formalism and the generation implementation are demonstrated with a grammar for a mobile, multimodal application, the Virtual Personal Assistant.
