DOI: 10.1145/1502650.1502685
research-article

Parakeet: a continuous speech recognition system for mobile touch-screen devices

Published: 08 February 2009

ABSTRACT

We present Parakeet, a system for continuous speech recognition on mobile touch-screen devices. The design of Parakeet was guided by computational experiments and validated by a user study. Participants had an average text entry rate of 18 words-per-minute (WPM) while seated indoors and 13 WPM while walking outdoors. In an expert pilot study, we found that speech recognition has the potential to be a highly competitive mobile text entry method, particularly in an actual mobile setting where users are walking around while entering text.

      • Published in

        IUI '09: Proceedings of the 14th international conference on Intelligent user interfaces
        February 2009
        522 pages
        ISBN: 9781605581682
        DOI: 10.1145/1502650

        Copyright © 2009 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 746 of 2,811 submissions, 27%
