Systematic studies of modified vocalization: Speech production changes during a variation of metronomic speech in persons who do and do not stutter

https://doi.org/10.1016/j.jfludis.2011.03.003

Abstract

The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when speakers time their speech to a rhythmic stimulus of a particular duration (e.g., 1 second [s]). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1 s of reading with 1 s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately “7” on a 1–9 scale (1 = highly natural; 9 = highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may affect speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition.

Educational objectives: The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1 s of reading and 1 s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals.

Highlights

► Vowel duration increased during a variation of metronomic speech.
► Percentage of short phonated intervals decreased during a variation of metronomic speech.
► Vowel duration correlated with speech naturalness in PWS.
► Mean phonated interval duration correlated with speech naturalness in PWS.
► Percentage of short phonated intervals correlated with speech naturalness in PWS.

Introduction

Rhythmic speech, most often operationalized as producing one syllable or one word to each beat of a metronome, has been widely studied with persons who stutter (PWS). It typically reduces stuttering frequency to levels at or near zero (e.g., Andrews et al., 1982, Davidow et al., 2009). Several explanations have been proposed for the stuttering reductions during metronome stimulation, including distraction (Barber, 1940, Johnson and Rosen, 1937), slowed speech (Johnson & Rosen, 1937), normalization of brain regions that may be abnormally activated or deficient (Alm, 2004), integration of the speech mechanism (Johnson & Rosen, 1937), proper timing of the motor patterns involved in respiration, phonation, and articulation (Van Riper, 1973), and Wingate's Modified Vocalization Hypothesis (Wingate, 1969, Wingate, 1970). In this latter hypothesis, Wingate proposed that stuttering may be due to irregularities in the rhythm of one's speech, and that the melodic and prosodic changes during several fluency-inducing conditions (FICs), including metronome stimulation, stabilize these irregularities by inducing continuity in phonation.

In addition to typical syllable- and word-based metronomic speech, other rhythmic patterns can be used to produce fluency in PWS. Jones and Azrin (1969), for example, studied four male participants (19–25 years old) who timed their speech to the duration of a vibrotactile rhythmic stimulus applied to the wrist. Stimulus-on durations included 0.1, 0.5, 1, 2, 3, 5, 10, 20, and 300 seconds (s); the time between pulses (the stimulus-off duration) was 1 s in each condition. Group data showed that the percentage of words stuttered was at or near zero in the 0.1-s, 0.5-s, and 1-s conditions. With stimulus-on durations longer than 1 s, stuttering frequency tended to increase, although one participant did not show an increase until after the 3-s condition. Finn and Ingham (1994) used the same conditions to examine how natural the various metronome stimulation conditions felt and sounded to PWS. They also found that stuttering was at or near zero up to the 1-s condition and increased thereafter.
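The stimulus timing in these studies amounts to an alternating on/off schedule. The sketch below is illustrative only: it assumes a fixed stimulus-on duration, a 1-s stimulus-off duration as described above, a session that begins with a stimulus-on period, and an arbitrary session length. The function `speaking_windows` is hypothetical and is not part of either study's apparatus.

```python
def speaking_windows(on_s: float, off_s: float = 1.0, session_s: float = 10.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds of each stimulus-on (speaking) window.

    Assumes the session starts with a stimulus-on period; a window still in
    progress when the session ends is truncated at session_s.
    """
    if on_s <= 0 or off_s < 0:
        raise ValueError("on_s must be positive and off_s non-negative")
    windows = []
    start = 0.0
    while start < session_s:
        end = min(start + on_s, session_s)
        # Rounding hides floating-point drift for durations such as 0.1 s.
        windows.append((round(start, 3), round(end, 3)))
        start += on_s + off_s
    return windows


# Reading-1.0 s style timing: 1 s on, 1 s off, over a 10-s stretch.
print(speaking_windows(1.0))  # [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0), (6.0, 7.0), (8.0, 9.0)]
```

Changing `on_s` to 0.1, 0.5, 2, 3, and so on reproduces the other stimulus-on conditions while holding the 1-s off period constant.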

The rhythmic procedure used in the Jones and Azrin (1969) and Finn and Ingham (1994) studies has received little attention in the literature. This is surprising, because both papers reported large reductions in stuttering, and because stimulus-on durations of 1–5 s have been rated as feeling and sounding more natural than shorter durations (0.1 s and 0.5 s) (Finn and Ingham, 1994, Jones and Azrin, 1969). In fact, when rating how their own speech sounded during a condition that alternated 1 s of speech with 1 s of silence, participants in the Finn and Ingham study gave ratings of approximately 5 (1 = highly natural, 9 = highly unnatural; Martin, Haroldson, & Triden, 1984), compared with ratings of approximately 6.5 during a condition that alternated 0.1 s of speech with 1 s of silence. Stuttering also remained near zero. The improved naturalness with longer stimulus-on durations, coupled with the substantial stuttering reductions, has clear treatment implications and warrants further study of this rhythmic speaking style as a possible treatment agent, as suggested by Ingham, Sato, Finn, and Belknap (2001).

A pattern somewhat similar to those described in the Jones and Azrin (1969) and Finn and Ingham (1994) studies may have been used in older treatment studies, in which investigators tried to “normalize” the rhythmic pattern (likely referring to naturalness). The most notable example is Brady's (1971) Metronome-Conditioned Speech Retraining (MCSR), in which speakers progressed from speaking one syllable or word per beat (the tick of a metronome) at the start of treatment to speaking “sequences of words to one beat” (Brady, 1971, p. 135) during the later stages of treatment. It is difficult to determine, however, whether the sequences of words “normalized” the speech pattern (improved naturalness), as no data are provided on this issue.

The similarity between the speech used in the later stages of MCSR and the rhythmic stimulation with longer stimulus-on durations (hereafter “RSLSD”) used in Jones and Azrin (1969) and Finn and Ingham (1994) is difficult to determine. No specific descriptions of the speech are provided, and the length of each “sequence of words” in the running speech produced at the end of the treatment program depends on the metronome rate, which is also not reported in MCSR studies (e.g., Brady, 1971, Öst et al., 1976). Brady does state that speakers typically start the program with the metronome set between 40 and 80 beats per minute (BPM) and that the rate is gradually increased later. Because beats are 1 s apart at 60 BPM, any rate above 60 BPM would require each sequence of words to last less than 1 s; it is therefore unlikely that speakers using MCSR were speaking for 1 s or longer at a time, as was the case in the Jones and Azrin and Finn and Ingham studies.
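The timing argument above rests on a single conversion: the interval between metronome beats, in seconds, is 60 divided by the rate in BPM. A minimal sketch of that arithmetic, applied to the 40–80 BPM starting range Brady reports, follows; the function name is illustrative, and the assumption that a "sequence of words" must fit within one inter-beat interval is an interpretation of the text rather than something Brady states.

```python
def beat_interval_s(bpm: float) -> float:
    """Seconds between successive metronome beats (60 / BPM)."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60.0 / bpm


# Brady's reported starting range, plus the 60 BPM crossover point.
for bpm in (40, 60, 80):
    print(f"{bpm} BPM -> {beat_interval_s(bpm):.2f} s between beats")
# 40 BPM -> 1.50 s, 60 BPM -> 1.00 s, 80 BPM -> 0.75 s
```

Any rate above 60 BPM therefore leaves less than 1 s per beat, which is the basis for the conclusion that MCSR speakers were unlikely to be speaking in 1-s or longer units.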

To the best of our knowledge, the only long-term investigation of RSLSD was presented in the Jones and Azrin (1969) article. One of their participants reduced the percentage of stuttered words from 27.5% to 3.5% (within the clinic) while wearing a portable apparatus (stimulus-on duration was set at 2 s and stimulus-off duration was set at 1 s). The participant and his friends also stated that “his stuttering was almost absent and that his speech sounded natural” (Jones & Azrin, 1969, p. 228) while he used the apparatus. Unfortunately, no data are provided to support this beyond-clinic claim. The reported results are promising, however, and warrant further study of this speaking style.

Additionally, previous research suggests that continued investigation into the utility of rhythmic stimulation as a treatment agent may be fruitful. In their meta-analysis of treatment studies, Andrews, Guitar, and Howie (1980) found that metronome stimulation produced the second-largest effect sizes for treatment outcome variables, after prolonged speech-type procedures. Treatment studies using rhythm also received the best ratings for internal validity. It should be noted, however, that in a more recent review (Bothe, Davidow, Bramlett, & Ingham, 2006) only one metronome stimulation study (Öst et al., 1976) met the methodological criteria for inclusion. A recent study also suggests that rhythmic stimulation may be a viable treatment for preschoolers. Trajkovski et al. (2009) found beyond-clinic stuttering reductions to below 1% syllables stuttered (%SS) at the end of the establishment phase for three preschoolers using one-syllable-per-beat metronomic speech. In addition, the children's clinicians and parents reported that their speech sounded natural beyond the clinic; however, no data were presented to support this claim.

Regardless of the methodological limitations of previous metronome stimulation studies, speaking in time is a powerful fluency inducer, and brain imaging findings show that it can normalize activity in brain regions associated with fluency (Stager, Jeffries, & Braun, 2003). The difficulty in using it as a long-term treatment agent seems to lie in the unnatural-sounding speech it produces, especially with the one-syllable-per-beat style (Finn & Ingham, 1994): the metronome effect does not carry over (Brady, 1969, Greenberg, 1970), and deviating from the trained rhythmic pattern (for example, to sound more natural) likely reduces its fluency-inducing benefits. Therefore, methods that retain the imposed rhythm while improving naturalness should be sought.

The present investigation was the first in a series of studies designed to investigate RSLSD, with the ultimate objective of a long-term treatment study using RSLSD as part of a treatment program. Whether RSLSD can serve as the foundation of a treatment program (like prolonged speech) will likely depend on whether the pattern can be shaped into natural-sounding speech. As an initial step toward resolving that issue, the present study explored changes in speech production variables during RSLSD to identify speech-motor adjustments that may be necessary for fluency induction and that may alter naturalness during this rhythmic speaking style. Changes in speech production variables from a control condition to several FICs, including syllable- and word-based metronomic speech, have been reported. Examples during metronomic speech in groups of PWS include an increase in vowel duration (e.g., Stager, Denman, & Ludlow, 1997), an increase in intraoral pressure rise time (Stager et al., 1997), a decrease in the percentage of short phonated intervals (a phonated interval estimates the duration of vocal fold vibration, as measured from the surface of the throat between breaks of 10 milliseconds [ms] or more; Davidow et al., 2009), a decrease in peak pressure (Stager et al., 1997), and a decrease in air flow rate (Hutchinson & Navarre, 1977). These changes suggest that PWS may receive fluency benefits from these adjustments, although whether the adjustments are necessary for fluency during several FICs has yet to be determined (Davidow, Bothe, Richardson, & Andreatta, 2010).

Alternating between 1 s of speaking and 1 s of silence (hereafter “Reading-1.0 s”) was chosen as the experimental condition because it was the longest stimulus-on duration that produced near-zero stuttering in both Jones and Azrin (1969) and Finn and Ingham (1994); stuttering frequency increased notably with longer stimulus-on durations (2 s, 3 s, and 5 s) in both studies. We sought to determine whether speakers naturally (without instructions to do so) alter aspects of speech production during this variation of metronomic speech. If such changes occur, a subsequent study could determine whether they are necessary for any improvements in fluency. These initial investigations are critical first steps; as many authors have noted, for example, the parameters most important for improved fluency or naturalness during prolonged speech are still unknown (e.g., Ingham et al., 1983, O’Brian et al., 2003). It seems important to identify these changes before examining the utility of a speaking style as a treatment agent.

In addition, naturalness ratings were gathered in the present study. Speech naturalness is most commonly measured in stuttering research with Martin et al.’s (1984) 1–9 naturalness scale, whose reliability and validity have been investigated extensively. Several studies have shown satisfactory levels of agreement (defined as ratings within 1 scale point) within and between judges (e.g., Martin and Haroldson, 1992, Martin et al., 1984), although others have reported less compelling results (e.g., Onslow, Adams, & Ingham, 1992). Naturalness ratings have also been found to change with disruptions in the natural flow of speech (Martin et al., 1984) and with changes in speech production variables such as voice onset time, sentence duration (Metz, Schiavetti, & Sacco, 1990), and vowel duration (Schaeffer & Eichor, 2001). See Schiavetti and Metz (1997) for a comprehensive review of the reliability and validity of the scale.
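The agreement criterion mentioned above (two ratings of the same sample count as agreeing if they fall within 1 scale point) can be computed as a simple percentage. The sketch below assumes paired ratings of the same samples, either from two judges or from one judge on two occasions; the function name and the ratings are hypothetical, not taken from any of the cited studies.

```python
def percent_agreement_within(ratings_a: list[int], ratings_b: list[int], tolerance: int = 1) -> float:
    """Percentage of paired ratings that differ by no more than `tolerance` scale points."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    agree = sum(abs(a - b) <= tolerance for a, b in zip(ratings_a, ratings_b))
    return 100.0 * agree / len(ratings_a)


# Hypothetical 1-9 naturalness ratings from two judges for the same five samples.
judge_1 = [7, 6, 8, 5, 9]
judge_2 = [7, 8, 7, 5, 6]
print(percent_agreement_within(judge_1, judge_2))  # 60.0
```

Here the differences are 0, 2, 1, 0, and 3 points, so three of the five pairs meet the within-1-point criterion.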

To date, listeners’ naturalness ratings for RSLSD using Martin et al.’s (1984) scale have not been collected. Jones and Azrin (1969) asked listeners simply to mark “natural” or “unnatural,” and Finn and Ingham (1994) used only self-ratings of naturalness by PWS. The speakers' ratings in the Finn and Ingham (1994) study may not represent a true listener's judgment, as there is evidence that PWS rate naturalness differently from other groups of listeners (see Teshima, Langevin, Hagler, & Kully, 2010, for a review). We also examined whether speech production changes are associated with different naturalness ratings during the experimental condition. Finn and Ingham reported standard deviations of approximately 2 during Reading-1.0 s (mean approximately 5), suggesting that ratings spread across much of the 1–9 scale. It would be informative to see whether changes in speech production variables are associated with changes in listener-judged naturalness during Reading-1.0 s, as was the case with, for example, voice onset time during a picture description task (Metz et al., 1990) and vowel duration during the reading of phrases (Schaeffer & Eichor, 2001). If speech production changes are necessary (or helpful) for fluency during the experimental condition, their impact on naturalness must also be determined to understand more completely the utility of Reading-1.0 s as a treatment agent.

The specific dependent variables examined were vowel duration, voice onset time, fundamental frequency, intraoral peak pressure, intraoral pressure rise time, maximum air flow, vowel midpoint air flow, and phonated intervals (Davidow et al., 2010). These variables were chosen because they provide a global view of the speech system; have changed during other FICs, including syllable- and word-based rhythmic speech (see Section 1.1); and/or can be manipulated. For example, phonated intervals have been specifically manipulated (Ingham et al., 2001a), vowel durations were altered when speakers focused on controlling syllable durations (Mallard & Westbrook, 1985), air flow measures can be manipulated using regulated breathing techniques, and, although this has not been experimentally verified, it is reasonable to hypothesize that pressure values are altered when light articulatory contacts are used (Stager et al., 1997). Focusing on variables that can be manipulated is important because the ultimate goal is to identify variables that can be altered to improve fluency and/or naturalness.

Although vowel duration and phonated intervals are both measures of voicing, both were included for several reasons. First, the measurements are quite different: a phonated interval can span syllable and word boundaries, whereas a vowel duration cannot. Second, a phonated interval can be registered without a vowel (e.g., during voiced consonants); therefore, some phonated intervals contain no vowel measurement. Third, increases in mean vowel duration do not always correspond to an increase in mean phonated interval duration (Davidow et al., 2010). Lastly, the MPI system can obtain hundreds of phonated interval measurements within minutes, which allows the distribution of all phonated intervals to be displayed, whereas, to the best of the authors’ knowledge, this technology is not yet available for measures of vowel duration. This large number of phonated interval measurements allows a more detailed and complete analysis of this dependent variable, yielding findings that would not be possible if only the mean were examined.
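Two of the phonated interval (PI) measures discussed here, the mean PI duration and the percentage of short PIs, are straightforward summaries of the interval distribution. The sketch below assumes the intervals have already been extracted (how the MPI system segments voicing is not reproduced here), uses the 30–100 ms "short" range given in the Discussion excerpt, and treats both bounds as inclusive and all supplied intervals as the denominator; those choices, the function name, and the data are assumptions.

```python
from statistics import fmean


def summarize_phonated_intervals(
    pi_ms: list[float], short_range: tuple[float, float] = (30, 100)
) -> tuple[float, float]:
    """Return (mean PI duration in ms, percentage of PIs within short_range).

    Assumes intervals were segmented upstream (voicing separated by breaks of
    10 ms or more), that both bounds of short_range are inclusive, and that the
    percentage is taken over all intervals supplied.
    """
    if not pi_ms:
        raise ValueError("no phonated intervals supplied")
    low, high = short_range
    n_short = sum(low <= d <= high for d in pi_ms)
    return fmean(pi_ms), 100.0 * n_short / len(pi_ms)


# Hypothetical phonated intervals (ms) from one reading sample.
intervals = [45, 80, 120, 250, 95, 400, 60, 150]
mean_pi, pct_short = summarize_phonated_intervals(intervals)
print(f"mean PI = {mean_pi:.1f} ms, short PIs = {pct_short:.1f}%")  # mean PI = 150.0 ms, short PIs = 50.0%
```

Because the whole list is kept, the same data could also be binned to show the full distribution rather than only these two summaries.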

Normally fluent speakers were also included in the present study, because their data may indicate how important speech-motor adjustments are for fluency during Reading-1.0 s. If a speech production change (accompanied by a reduction in stuttering) is found only for the PWS and not for the control speakers, or is larger for the PWS, that would provide further support for an association between the speech change and fluent speech during Reading-1.0 s.

The primary purpose of this study was to investigate speech production changes during a condition that alternates 1 s of speaking with 1 s of silence (Reading-1.0 s). We were also interested in listener-judged speech naturalness using Martin et al.’s (1984) 9-point scale and the association between naturalness and speech production variables during Reading-1.0 s. Therefore, the specific research questions were:

  1. Are there changes to speech production variables from a control condition to Reading-1.0 s for PWS and/or normally fluent controls?

  2. How will a group of naïve listeners rate the naturalness of the speech produced during Reading-1.0 s?

  3. Which speech production variables are associated with naturalness ratings during Reading-1.0 s?

Section snippets

Participants

Thirteen PWS and 11 normally fluent controls participated initially in this study, but several did not meet task compliance criteria (see Section 2.5). Thus, data from 10 participants were analyzed for each group. The final group of PWS consisted of 7 men and 3 women (mean age = 34.1 years; range = 18–51 years). The final group of controls consisted of 6 men and 4 women (mean age = 31.30 years; range = 20–63 years). One of the PWS had never received treatment; the remaining nine averaged 16.06 years …

Reliability

Reliability data for all dependent variables, with the exception of %SS and articulation rate, are shown in Table 2. Two measures were used to assess reliability: mean difference (between judges or between measurement occasions) and the mean of percent deviation scores (difference scores for each token were divided by the primary judge's [or initial] measurement, and multiplied by 100). The average percent deviation for interjudge data across all dependent variables in Table 2 was 4.14. The …
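The two reliability measures described above can be computed from paired measurements of the same tokens. In the sketch below, the percent deviation for each token is the difference divided by the primary (or initial) measurement, times 100, as the text states; averaging absolute rather than signed differences is an assumption, as are the function name and the example values.

```python
from statistics import fmean


def reliability_summary(primary: list[float], secondary: list[float]) -> tuple[float, float]:
    """Return (mean absolute difference, mean percent deviation) for paired measurements.

    Per-token percent deviation = |secondary - primary| / primary * 100.
    Using absolute differences is an assumption; the source does not say
    whether signed or absolute differences were averaged.
    """
    if len(primary) != len(secondary) or not primary:
        raise ValueError("need two non-empty measurement lists of equal length")
    if any(p == 0 for p in primary):
        raise ValueError("primary measurements must be non-zero")
    diffs = [abs(s - p) for p, s in zip(primary, secondary)]
    pct = [100.0 * d / p for d, p in zip(diffs, primary)]
    return fmean(diffs), fmean(pct)


# Hypothetical vowel durations (ms) from a primary judge and a second judge.
mean_diff, mean_pct_dev = reliability_summary([100, 200, 50], [104, 190, 51])
print(f"mean difference = {mean_diff:.2f} ms, mean percent deviation = {mean_pct_dev:.2f}")
# mean difference = 5.00 ms, mean percent deviation = 3.67
```

The same function applies to intrajudge data by passing the initial and repeated measurements instead of two judges' values.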

Discussion

The primary research question in the present study was whether there were changes in speech production variables from a control condition to a rhythmic condition that involved alternating between 1 s of reading and 1 s of silence (Reading-1.0 s) for PWS and/or normally fluent controls. The results showed that mean vowel durations increased significantly for both groups, mean phonated interval duration increased significantly for the control group, and the percentage of short (30–100 ms) …

CONTINUING EDUCATION


QUESTIONS

  1. The findings of the present study suggest that PWS may be reducing which aspect of speech production to obtain fluency when alternating between 1 s of reading and 1 s of silence:
     a. Amount of air flow.
     b. Pressure rise time.
     c. Vocal fold vibration duration.
     d. Percentage of short vocal fold vibration durations.
     e. None of the above.

  2. Which of the following conditions has been rated as sounding more natural than producing metronomic speech at one syllable per beat?
     a. Rhythmic speech using 1–5-s …

Acknowledgments

This research was supported in part by grants from the National Institutes of Health, the American Speech-Language-Hearing Association (Advancing Academic-Research Career Award), and Hofstra University awarded to the first author. We would like to thank the participants for their time and the judges for all of their hard work during data analysis.

Jason H. Davidow, Ph.D., is an Assistant Professor in the Speech-Language-Hearing Sciences Department at Hofstra University. His main research interests include stuttering treatment outcome and the measurement of speech production changes during fluency-inducing conditions in persons who stutter.

References (54)

  • S.V. Stager et al. (2003). Common features of fluency-evoking conditions studied in stuttering subjects and controls: An H₂¹⁵O PET study. Journal of Fluency Disorders.
  • S. Teshima et al. (2010). Post-treatment speech naturalness of comprehensive stuttering program clients and differences in ratings among listener groups. Journal of Fluency Disorders.
  • N. Trajkovski et al. (2009). Using syllable-timed speech to treat preschool children who stutter: A multiple baseline experiment. Journal of Fluency Disorders.
  • M.R. Adams et al. (1980). Vocal characteristics of normal speakers and stutterers during choral reading. Journal of Speech and Hearing Research.
  • G. Andrews et al. (1980). Meta-analysis of the effects of stuttering treatment. Journal of Speech and Hearing Disorders.
  • G. Andrews et al. (1982). Stuttering: Speech pattern characteristics under fluency-inducing conditions. Journal of Speech and Hearing Research.
  • V. Barber (1940). Studies in the psychology of stuttering, XVI: Rhythm as a distraction in stuttering. Journal of Speech Disorders.
  • S.M. Barlow et al. Speech aerodynamics using AEROWIN.
  • A.K. Bothe et al. (2006). Stuttering treatment research 1970–2005. I. Systematic review incorporating trial quality assessment of behavioral, cognitive, and related approaches. American Journal of Speech-Language Pathology.
  • E.R. Brayton et al. (1978). Effects of noise and rhythmic stimulation on the speech of stutterers. Journal of Speech and Hearing Research.
  • J.H. Davidow et al. (2009). Measurement of phonated interval during four fluency-inducing conditions. Journal of Speech, Language, and Hearing Research.
  • J.H. Davidow et al. (2010). Systematic studies of modified vocalization: Effects of speech rate and instatement style during metronome stimulation. Journal of Speech, Language, and Hearing Research.
  • C.S. Davis (2002). Statistical methods for the analysis of repeated measurements.
  • P. Finn et al. (1994). Stutterers’ self-ratings of how natural speech sounds and feels. Journal of Speech and Hearing Research.
  • M.L. Gow et al. (1992). Modifying electroglottograph-identified intervals of phonation: The effect on stuttering. Journal of Speech and Hearing Research.
  • Y. Hochberg et al. (1987). Multiple comparison procedures.
  • R.J. Ingham et al. (2005). Stuttering measurement system (SMS).

Anne K. Bothe, Ph.D., CCC-SLP, is a Professor in The Communication Sciences and Special Education Department at the University of Georgia. Her research and writing focus on the intersection of measurement and treatment variables for stuttering.

Jun Ye, Ph.D., is an Assistant Professor of Biostatistics at South Dakota State University (SDSU) and is also a statistical consultant at SDSU Agriculture Experiment Station. He also worked as a Biostatistician and Epidemiologist for the Department of Medicine at the Massachusetts General Hospital until August 2010.
