ABSTRACT
A study was conducted to evaluate user performance and satisfaction in completing a set of text creation tasks using three commercially available continuous speech recognition systems. The study also compared user performance on similar tasks using keyboard input. One part of the study (Initial Use) involved 24 users who enrolled, received training, carried out practice tasks, and then completed a set of transcription and composition tasks in a single session. In a parallel effort (Extended Use), four researchers used speech recognition to carry out real work tasks over 10 sessions with each of the three speech recognition software products. This paper presents results from the Initial Use phase of the study along with some preliminary results from the Extended Use phase. We present details of the kinds of usability and system design problems likely to arise in current systems and several common patterns of error correction that we found.
Index Terms
- Patterns of entry and correction in large vocabulary continuous speech recognition systems