ABSTRACT
With the emergence of smart TVs, set-top boxes and public information screens over the last few years, there is an increasing demand to use these appliances for more than passive output. They can also be used for text-based web search and other tasks that require some form of text input. However, designing text entry interfaces for efficient input on such appliances is a major challenge. Current virtual keyboard solutions achieve an average text input rate of only 5.79 words per minute (WPM), while the average typing speed on a traditional keyboard is 38 WPM. Furthermore, so-called controller-free appliances such as Samsung's Smart TV or Microsoft's Xbox Kinect result in even lower average text input rates. We present SpeeG2, a multimodal text entry solution combining speech recognition with gesture-based error correction. Four innovative prototypes for efficient controller-free text entry have been developed and evaluated. A quantitative evaluation of SpeeG2 revealed that the best of the four prototypes achieves an average input rate of 21.04 WPM (without errors), outperforming current state-of-the-art solutions for controller-free text input.
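The WPM figures quoted above follow the standard convention in text entry research of counting one "word" per five characters (including spaces). A minimal sketch of that computation (the phrase and timing values below are illustrative, not taken from the study):

```python
# Standard words-per-minute (WPM) metric used in text entry studies:
# one "word" is conventionally defined as 5 characters, spaces included.
def wpm(transcribed_text: str, seconds: float) -> float:
    """Entry rate in words per minute for a transcribed phrase."""
    words = len(transcribed_text) / 5.0   # 5-character word convention
    return words / (seconds / 60.0)

# Illustrative example: a 30-character phrase entered in 16.5 seconds
rate = wpm("the quick brown fox jumps over", 16.5)
print(round(rate, 2))  # → 21.82
```

Error-aware variants additionally discount incorrect characters (e.g. via minimum string distance to the target phrase), which is why the paper reports its 21.04 WPM figure explicitly as "without errors".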