research-article

Parakeet: a continuous speech recognition system for mobile touch-screen devices

Authors:
Keith Vertanen, University of Cambridge, Cambridge, United Kingdom
Per Ola Kristensson, University of Cambridge, Cambridge, United Kingdom

IUI '09: Proceedings of the 14th international conference on Intelligent user interfaces, February 2009, Pages 237–246. https://doi.org/10.1145/1502650.1502685

Published: 08 February 2009

ABSTRACT

We present Parakeet, a system for continuous speech recognition on mobile touch-screen devices. The design of Parakeet was guided by computational experiments and validated by a user study. Participants had an average text entry rate of 18 words-per-minute (WPM) while seated indoors and 13 WPM while walking outdoors. In an expert pilot study, we found that speech recognition has the potential to be a highly competitive mobile text entry method, particularly in an actual mobile setting where users are walking around while entering text.

Index Terms

1. Hardware
  1. Communication hardware, interfaces and storage
    1. Sound-based input / output
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction devices
      1. Sound-based input / output

Published in
IUI '09: Proceedings of the 14th international conference on Intelligent user interfaces
February 2009
522 pages
ISBN: 9781605581682
DOI: 10.1145/1502650
General Chairs:
Cristina Conati, University of British Columbia, Canada
Mathias Bauer, mineway GmbH, Germany
Program Chairs:
Nuria Oliver, Telefonica Research, Spain
Dan Weld, University of Washington, USA
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 8 February 2009
Author Tags
continuous speech recognition
error correction
mobile text entry
predictive keyboard
speech input
text input
touch-screen interface
word confusion network
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 746 of 2,811 submissions, 27%