DOI: 10.1145/2522848.2522861

SpeeG2: a speech- and gesture-based interface for efficient controller-free text input

Published: 09 December 2013

ABSTRACT

With the emergence of smart TVs, set-top boxes and public information screens over the last few years, there is an increasing demand to use these appliances for more than passive output. These devices can also be used for text-based web search and other tasks that require some form of text input. However, designing text entry interfaces for efficient input on such appliances remains a major challenge. Current virtual keyboard solutions achieve an average text input rate of only 5.79 words per minute (WPM), while the average typing speed on a traditional keyboard is 38 WPM. Furthermore, so-called controller-free appliances such as Samsung's Smart TV or Microsoft's Xbox Kinect result in even lower average text input rates. We present SpeeG2, a multimodal text entry solution combining speech recognition with gesture-based error correction. Four prototypes for efficient controller-free text entry have been developed and evaluated. A quantitative evaluation of our SpeeG2 text entry solution revealed that the best of the four prototypes achieves an average input rate of 21.04 WPM (without errors), outperforming current state-of-the-art solutions for controller-free text input.
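For context on how such rates are typically derived: in the text entry literature, a "word" is conventionally defined as five characters, including spaces. The formula below follows that common convention rather than a procedure stated in this abstract, with |T| denoting the length of the transcribed text in characters and S the entry time in seconds (both symbols introduced here for illustration):

WPM = (|T| / 5) * (60 / S)

For example, entering a 100-character phrase in 57 seconds corresponds to (100 / 5) * (60 / 57) ≈ 21 WPM, comparable to the best rate reported above.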


Supplemental Material

icmi147.mp4 (MP4, 47.7 MB)



    Published in

      ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
      December 2013
      630 pages
      ISBN: 9781450321297
      DOI: 10.1145/2522848

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • poster

      Acceptance Rates

      ICMI '13 Paper Acceptance Rate: 49 of 133 submissions, 37%
      Overall Acceptance Rate: 453 of 1,080 submissions, 42%
