ABSTRACT
With the emergence of smart TVs, set-top boxes and public information screens over the last few years, there is an increasing demand to use these appliances for more than passive output. They can also be used for text-based web search and other tasks that require some form of text input. However, designing text entry interfaces for efficient input on such appliances is a major challenge. Current virtual keyboard solutions achieve an average text input rate of only 5.79 words per minute (WPM), while the average typing speed on a traditional keyboard is 38 WPM. Furthermore, so-called controller-free appliances such as Samsung's Smart TV or Microsoft's Xbox Kinect result in even lower average text input rates. We present SpeeG2, a multimodal text entry solution combining speech recognition with gesture-based error correction. Four innovative prototypes for efficient controller-free text entry have been developed and evaluated. A quantitative evaluation of SpeeG2 revealed that the best of the four prototypes achieves an average input rate of 21.04 WPM (without errors), outperforming current state-of-the-art solutions for controller-free text input.
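The WPM figures quoted above follow the standard convention in text entry research of counting one "word" per five characters (including spaces). A minimal sketch of that computation (the phrase and timing values below are illustrative, not taken from the study):

```python
# Standard words-per-minute (WPM) metric used in text entry studies:
# one "word" is conventionally defined as 5 characters, spaces included.
def wpm(transcribed_text: str, seconds: float) -> float:
    """Entry rate in words per minute for a transcribed phrase."""
    words = len(transcribed_text) / 5.0   # 5-character word convention
    return words / (seconds / 60.0)

# Illustrative example: a 30-character phrase entered in 16.5 seconds
rate = wpm("the quick brown fox jumps over", 16.5)
print(round(rate, 2))  # → 21.82
```

Error-aware variants additionally discount incorrect characters (e.g. via minimum string distance to the target phrase), which is why the paper reports its 21.04 WPM figure explicitly as "without errors".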