ABSTRACT
We present Parakeet, a system for continuous speech recognition on mobile touch-screen devices. The design of Parakeet was guided by computational experiments and validated by a user study. Participants had an average text entry rate of 18 words per minute (WPM) while seated indoors and 13 WPM while walking outdoors. In an expert pilot study, we found that speech recognition has the potential to be a highly competitive mobile text entry method, particularly in an actual mobile setting where users are walking around while entering text.