research-article

Speech dasher: fast writing using speech and gaze

Authors:
Keith Vertanen, University of Cambridge, Cambridge, United Kingdom
David J.C. MacKay, University of Cambridge, Cambridge, United Kingdom
CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2010, Pages 595–598, https://doi.org/10.1145/1753326.1753415

Published: 10 April 2010

ABSTRACT

Speech Dasher allows writing using a combination of speech and a zooming interface. Users first speak what they want to write and then they navigate through the space of recognition hypotheses to correct any errors. Speech Dasher's model combines information from a speech recognizer, from the user, and from a letter-based language model. This allows fast writing of anything predicted by the recognizer while also providing seamless fallback to letter-by-letter spelling for words not in the recognizer's predictions. In a formative user study, expert users wrote at 40 (corrected) words per minute. They did this despite a recognition word error rate of 22%. Furthermore, they did this using only speech and the direction of their gaze (obtained via an eye tracker).
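The idea of combining recognizer predictions with a letter-based fallback can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give its details, so the n-best list `HYPOTHESES`, the uniform stand-in for the letter-based language model, the `fallback_weight` parameter, and the function name are all assumptions made for illustration. The sketch shows only the general principle that letters consistent with a recognition hypothesis receive most of the probability mass, while every other letter stays reachable for spelling.

```python
from collections import defaultdict

# Hypothetical symbol set: lowercase letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

# Stand-in for a letter-based language model: a uniform distribution.
# A real letter model would condition on the preceding text.
UNIFORM = {ch: 1.0 / len(ALPHABET) for ch in ALPHABET}

# Hypothetical recognition hypotheses with made-up probabilities.
HYPOTHESES = {
    "the cat sat": 0.6,
    "the cat sad": 0.3,
    "a cat sat": 0.1,
}


def next_letter_distribution(prefix, hypotheses, fallback, fallback_weight=0.1):
    """Return a next-letter distribution given the text written so far.

    Hypotheses that extend `prefix` vote for their next letter in proportion
    to their probability. A fixed share of the mass (`fallback_weight`, an
    assumed value) is always reserved for the fallback letter model, so any
    letter can still be selected. If no hypothesis matches the prefix, the
    fallback model is used on its own.
    """
    from_speech = defaultdict(float)
    for text, prob in hypotheses.items():
        if text.startswith(prefix) and len(text) > len(prefix):
            from_speech[text[len(prefix)]] += prob

    total = sum(from_speech.values())
    weight = fallback_weight if total > 0 else 1.0

    # Start with the fallback model's share for every symbol.
    mixed = {ch: weight * p for ch, p in fallback.items()}
    # Add the renormalized share predicted by the matching hypotheses.
    for ch, p in from_speech.items():
        mixed[ch] = mixed.get(ch, 0.0) + (1.0 - weight) * p / total
    return mixed


if __name__ == "__main__":
    dist = next_letter_distribution("the cat sa", HYPOTHESES, UNIFORM)
    for ch in sorted(dist, key=dist.get, reverse=True)[:3]:
        print(repr(ch), round(dist[ch], 3))
```

With the prefix "the cat sa", the letters "t" and "d" dominate because two hypotheses continue that way, and every other symbol keeps a small share from the fallback. With a prefix that no hypothesis matches, the function returns the fallback distribution unchanged, which mirrors the letter-by-letter spelling case described above.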

Supplemental Material

p595.wmv (wmv, 14.6 MB)

Index Terms

Speech dasher: fast writing using speech and gaze
1. Hardware
  1. Communication hardware, interfaces and storage
    1. Sound-based input / output
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Sound-based input / output

Published in
CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2010
2690 pages
ISBN: 9781605589299
DOI: 10.1145/1753326
General Chair:
Elizabeth Mynatt, Georgia Institute of Technology
Program Chairs:
Geraldine Fitzpatrick, Vienna University of Technology
Scott Hudson, Carnegie Mellon University
Keith Edwards, Georgia Tech
Tom Rodden, University of Nottingham
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
eye tracking
multimodal interfaces
speech recognition
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%