Language and Cognition Research

Multimodal User Interfaces and Natural Language Generation

EU Project: Flexible & Adaptive Spoken Language and Multimodal Interfaces (FASiL)

Virtual Personal Assistant (Mockup).
Voice Input: "Show me this one!"

When humans communicate face to face, they employ a variety of channels to convey meaning. Speech is not necessarily the primary mode: it is supplemented or replaced by facial gestures, body posture, and deictic or iconic gestures. Can we use these channels to communicate with machines? How can the signals that a human computer user sends over these different channels be integrated into a single meaning representation, i.e., a computer command? Can a computer automatically and adaptively display content in a coherent and concise fashion? These questions are part of the research done in FASiL. The focus of this two-year project is to produce a conversational language engine, demonstrated in three languages: English, Swedish, and Portuguese. An important objective for the EU is that the end system both meets its functional requirements and is inclusive of all citizens, including people who are hard of hearing or visually impaired.

Web site of the now-closed MIT Media Lab Europe

UI on the Fly: Generating a graphical user interface

Generated multimodal output.
Voice Output: "Send it now?"

UI on the Fly is a technique that allows a computer to automatically generate multimodal user interfaces, in particular for small computers such as cell phones or iPAQs. We enable these devices to engage in natural language conversation, using the touch screen together with voice input and output at the same time. The output is tailored to the particular usage situation (in a restaurant, in the car, at home) as well as to the device and the preferences of the user. The central system can thus remain blissfully agnostic as you switch from using a phone, to a PDA, to a computer, and back.

Technically, we formulate a hybrid natural language generation approach as a constraint optimization problem with hard and soft constraints. Multimodal Functional Unification Grammar (MUG) is a formalism based on the unification of attribute-value matrices. It enforces cross-modal coherence in the output. A graphical workbench allows us to maintain and debug grammars. The generation system has been evaluated positively with users, who judged it to be more efficient and showed a trend toward better performance on a recall task.
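To make the two ideas in this paragraph more concrete, here is a minimal sketch of unifying attribute-value matrices and then choosing among candidate outputs with hard and soft constraints. It is an illustration under simplifying assumptions, not the MUG formalism or the FASiL implementation: matrices are plain nested dictionaries without variables or structure sharing, and all names (`unify`, `choose_output`, `CONTEXT`, `CANDIDATES`, `SOFT_CONSTRAINTS`, the attribute names, and the word-count score) are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

AVM = Dict[str, Any]  # attribute-value matrix: nested dicts with atomic leaves
SoftConstraint = Tuple[float, Callable[[AVM], float]]  # (weight, scoring function)


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature values; return None if they clash.

    Simplification: no variables or re-entrancy, and None itself
    cannot be used as an atomic value.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # clash somewhere below this attribute
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values (and lists, treated as atoms here) must be equal.
    return a if a == b else None


def choose_output(candidates: List[AVM], context: AVM,
                  soft_constraints: List[SoftConstraint]) -> Optional[AVM]:
    """Pick the highest-scoring candidate that unifies with the context.

    Hard constraint: the candidate must unify with the context matrix.
    Soft constraints: weighted scores summed over the unified matrix.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        unified = unify(candidate, context)
        if unified is None:
            continue  # hard constraint violated
        score = sum(weight * fn(unified) for weight, fn in soft_constraints)
        if score > best_score:
            best, best_score = unified, score
    return best


# Hypothetical usage situation: a small-screen device used in the car.
CONTEXT: AVM = {"device": {"screen": "small"}, "situation": "car"}

# Hypothetical candidate realizations of the same dialogue act.
CANDIDATES: List[AVM] = [
    {"device": {"screen": "large"}, "voice": "Send it now?",
     "screen_items": ["Send", "Edit", "Cancel", "Save draft"]},
    {"device": {"screen": "small"},
     "voice": "Would you like me to send the message now?",
     "screen_items": ["Yes", "No"]},
    {"device": {"screen": "small"}, "voice": "Send it now?",
     "screen_items": ["Yes", "No"]},
]

# Hypothetical soft constraint: prefer shorter spoken prompts.
SOFT_CONSTRAINTS: List[SoftConstraint] = [
    (-1.0, lambda avm: float(len(avm["voice"].split()))),
]

if __name__ == "__main__":
    print(choose_output(CANDIDATES, CONTEXT, SOFT_CONSTRAINTS))
```

In this sketch the first candidate is rejected because its screen size clashes with the context, and the shorter of the two remaining prompts wins on the soft constraint. A real grammar would also relate the screen and voice parts of a matrix to each other, which is where cross-modal coherence would be enforced.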

Detailed information about MUG Workbench and the system, and Open Source download
(Developed at MIT Media Lab Europe, 2002-2004, with E. M. Panttaja, F. Cummins, and others.)

David Reitter.
Hybrid planning and realization of coherent utterances for multimodal natural language dialogue systems.
Master's thesis, University College Dublin, 2004.

Hans Dolfing, David Reitter, Luis Almeida, Nuno Beires, Michael Cody, Rui Gomes, Kerry Robinson, and Roman Zielinski.
The FASiL speech and multimodal corpora.
In Proc. INTER/EUROSPEECH 2005, 2005.

David Reitter.
A development environment for multimodal functional unification generation grammars.
In Third International Conference on Natural Language Generation, ITRI Technical Report, 2004.

Erin Panttaja, David Reitter, and Fred Cummins.
The evaluation of adaptable multimodal system outputs.
In Proceedings of the DUMAS Workshop on Robust and Adaptive Information Processing for Mobile Speech Interfaces, 2004.

David Reitter, Erin Panttaja, and Fred Cummins.
UI on the fly: Generating a multimodal user interface.
In Proceedings of the Human Language Technology Conference 2004 / North American Chapter of the Association for Computational Linguistics (HLT/NAACL-04), 2004.