Voice Evaluation and Therapy

Key Points

The voice is multidimensional, so voice assessment should be multidimensional.
Auditory, visual, and tactile perceptual examinations are key components of a voice evaluation.
It is important to characterize the patient's perception of the problem and the impact of the voice disorder on the patient's life. Several published scales can be used to report various aspects of handicap or quality of life.
Specific measurements can be used to better understand voice production and to document aspects of voice quality. A group of measures provides a more complete description of the voice than any measure alone.
Vocal function measurements are less “objective” than they sound, and their relation to voice quality is complicated and incompletely understood.
Voice therapy helps patients learn efficient and healthy technique to enhance voice quality and loudness, minimize voice-related handicap, improve communicative effectiveness, and restore vocal identity and health.

Voice is produced by interactions among the respiratory, laryngeal, and resonance systems. The speech-language pathologist assesses each system in addition to the overall speech output. This chapter describes the typical components of a voice evaluation and how to interpret the results, including patient-reported outcomes (PROs), perceptual evaluation, instrumental assessment of the voice-production mechanism and resultant sound wave, and diagnostic therapy. The chapter concludes with an introduction to voice therapy. Videostroboscopy is detailed in Chapter 54.

Patient-Reported Outcome Instruments

Individuals have different requirements and expectations of their voices as well as different emotional responses to voice disorders; thus, the same degree of dysphonia will differentially limit participation in typical daily activities or alter a person's sense of self. As part of a complete voice evaluation, the effect of the voice problem on each individual's life should be assessed. PRO instruments are questionnaires completed by the patient that measure symptoms and participation as well as more complex constructs such as health, quality of life, or handicap. Several PRO questionnaires specific to voice have been published and, as with all PROs, they differ with respect to the rigor of their construction, validation process, psychometric properties, questionnaire length, and domains assessed. In general, PROs are an important component of a voice evaluation that provides information not captured elsewhere. The questionnaires can be used to guide discussion between health care providers and patients and to determine functional treatment goals. Several of the most commonly used scales are introduced in this section.

The Voice Handicap Index was designed to assess handicap, “a social, economic, or environmental disadvantage resulting from an impairment or disability.” The instrument consists of 30 statements that patients rate on a five-point equal-appearing interval scale that reflects frequency of occurrence. The total possible score is 120, with higher scores reflecting greater handicap. Although functional, physical, and emotional subscales can be reported, it has been suggested that the total score is more meaningful. Since its publication in 1997, the Voice Handicap Index has been widely used to document voice handicap in specific groups of patients, to compare handicap with vocal function measures, and to track change with treatment. It has been translated into numerous languages and has been used as a model for a shortened version (VHI-10), a singing handicap index and its 10-item version, a child's version, a vocal fatigue index, and an aging voice index.
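
The scoring arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the instrument's official scoring procedure: it assumes each of the 30 items is coded 0 to 4, which is consistent with a five-point scale and a 120-point maximum, and the function name `vhi_total` is hypothetical.

```python
def vhi_total(responses):
    """Sum 30 item ratings, each coded 0-4 (assumed coding), into a 0-120 total."""
    if len(responses) != 30:
        raise ValueError("expected 30 item ratings")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each rating must be 0, 1, 2, 3, or 4")
    # Higher totals reflect greater handicap.
    return sum(responses)


# A patient who rates every statement at the top of the scale reaches the maximum.
print(vhi_total([4] * 30))  # 120
```

Subscale scores are not computed here because the assignment of items to the functional, physical, and emotional subscales is defined by the instrument itself.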

The Voice-Related Quality of Life is a 10-item scale divided into physical and social-emotional functioning subscales. Each item is scored on a five-point interval scale that reflects the severity of the problem. For each subscale and for the total score, 100 is the highest possible score, which reflects the highest quality of life. A child's version of this scale is also available. The Voice Symptom Scale is a psychometrically sound 30-item scale that represents physical impairment, emotional response, and related physical symptoms. Each question is rated using a five-point scale that represents frequency of occurrence.
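
A rescaling of this kind can be sketched as follows. The sketch assumes that items are coded 1 (no problem) to 5 (as severe as possible) and that raw sums are rescaled linearly so that the least severe responses map to 100. Both the coding and the function name `vrqol_standard_score` are assumptions made for illustration; the instrument's own publication should be consulted for the official algorithm.

```python
def vrqol_standard_score(ratings, low=1, high=5):
    """Linearly rescale summed severity ratings to 0-100 (100 = best quality of life).

    The 1-5 coding and the linear rescaling are assumptions of this sketch.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not (low <= r <= high) for r in ratings):
        raise ValueError("rating outside the assumed coding range")
    raw = sum(ratings)
    worst = len(ratings) * high   # every item at maximum severity
    best = len(ratings) * low     # every item at minimum severity
    return 100.0 * (worst - raw) / (worst - best)


print(vrqol_standard_score([1] * 10))  # 100.0
print(vrqol_standard_score([3] * 10))  # 50.0
```

Because the rescaling depends only on the number of items passed in, the same function could be applied to a subscale by supplying only that subscale's ratings.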

Several other PRO scales are related to concerns of patients in an otolaryngology practice. These include the Reflux Symptom Index, Speech Handicap Index for patients with head and neck cancer, Cough Severity Index, and the Dyspnea Index.

Perceptual Assessment

Auditory Perceptual Assessment

Voice pitch, loudness, and quality are generally assessed during auditory perceptual evaluation. Pitch, pitch variability and range, loudness, and loudness variability and range are assessed in relation to the speaker's age, sex, gender, and the testing environment. Voice quality is more difficult to define and measure, although it is important because the ultimate goal of intervention is often to improve voice quality. Clinicians should consider cultural variability when determining whether quality is impaired. Traditionally, voice quality is rated as a series of pseudo-independent features (e.g., breathiness, roughness, and strain), but strong evidence suggests that the overall pattern is more than the sum of these features. Raters often disagree when rating voice quality; this disagreement stems from several factors, including difficulty isolating individual features or dimensions, differing and inconsistent internal representations of the parameters and severity, inadequate scale resolution, and the magnitude of the target parameter.

Perceptual rating tasks that control these factors—such as determining whether two stimuli are the same or different, rating the degree of dissimilarity of two productions, and adjusting a synthetic copy of a voice to match an original—lead to more reliable voice quality assessment. The method-of-adjustment task quantifies the perceived quality by the level of a particular feature (e.g., noise-to-signal ratio) that the rater sets to perceptually match the two stimuli. A sort-and-rate task can be used when comparisons of multiple stimuli are required. Listeners place icons that represent stimuli on a line so that items that sound most similar are placed closest to one another. The distances among stimuli are organized as dissimilarity matrices and are analyzed using multidimensional scaling.
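
The step from icon positions to a dissimilarity matrix can be illustrated with a short sketch. It assumes each listener's placements are recorded as coordinates along the line, one per stimulus, and that matrices from several listeners are averaged before scaling. The averaging step and both function names are assumptions of this example, not a prescribed analysis.

```python
def dissimilarity_matrix(positions):
    """Pairwise distances between stimuli placed along a line by one listener."""
    return [[abs(a - b) for b in positions] for a in positions]


def mean_matrix(matrices):
    """Element-wise mean of same-sized matrices (averaging is an assumed step)."""
    n = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical listeners place the same three voices on a 0-100 line.
listener_1 = dissimilarity_matrix([0, 10, 40])
listener_2 = dissimilarity_matrix([0, 30, 40])
print(mean_matrix([listener_1, listener_2]))
# [[0.0, 20.0, 40.0], [20.0, 0.0, 20.0], [40.0, 20.0, 0.0]]
```

The resulting symmetric matrix, with zeros on the diagonal, is the kind of input that a multidimensional scaling routine in a statistics package would accept.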

Two rating scales that are used clinically are the GRBAS (grade, roughness, breathiness, asthenia, and strain) scale and the Consensus Auditory-Perceptual Evaluation–Voice (CAPE-V). The GRBAS scale is a simple rating tool by which the overall severity and four dimensions of voice quality are rated on four-point scales. The letter G represents the grade or overall quality, R is roughness, B is breathiness, A is asthenia (weakness), and S is strain. Each parameter is rated; the score is 0 if no deficit is present, 1 if the deficit is mild, 2 if it is moderate, and 3 if the deficit is severe. No standard recommendation has been established for the type of utterances to use with GRBAS, so specific information about testing conditions should be documented.
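
As a small illustration of the 0 to 3 coding, the sketch below converts a set of GRBAS ratings into the severity labels described above. The function name `describe_grbas` and the dictionary layout are hypothetical conveniences, not part of the scale.

```python
SEVERITY = {0: "none", 1: "mild", 2: "moderate", 3: "severe"}
PARAMETERS = ("grade", "roughness", "breathiness", "asthenia", "strain")


def describe_grbas(ratings):
    """Map five 0-3 ratings, given in G-R-B-A-S order, to severity labels."""
    if len(ratings) != len(PARAMETERS):
        raise ValueError("expected five ratings in G, R, B, A, S order")
    for value in ratings:
        if value not in SEVERITY:
            raise ValueError("each rating must be 0, 1, 2, or 3")
    return {name: SEVERITY[value] for name, value in zip(PARAMETERS, ratings)}


# A voice rated G2 R1 B2 A0 S1:
print(describe_grbas([2, 1, 2, 0, 1]))
```

Because utterance type is not standardized for GRBAS, a clinical record would also need to store the speech material and testing conditions alongside these ratings.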

The CAPE-V is a rating tool by which six core parameters—overall severity, roughness, breathiness, strain, pitch, and loudness—are rated by marking severity along a 100-mm line. These parameters may be supplemented with additional examiner-selected parameters. Each parameter is also flagged as occurring consistently or intermittently. The CAPE-V is scored based on two sustained vowels, six standard sentences, and at least 20 seconds of natural running speech. Recommendations about testing and recording environments are included in the reference publication.

Additional features assessed in the auditory perceptual evaluation include speech breathing, speech production, and resonance. Auditory perceptual correlates of speech breathing include length of breath group, average loudness, loudness variability, and inspiratory duration. These provide important information about lung volume expended, adequacy and consistency of alveolar pressure, and shape of the rib cage and abdominal walls during talking. Several other aspects of speech production, such as imprecise articulation, resonance, and prosody disturbances, can indicate structural or neurologic disorders that affect voice production. Resonance is described using the terms hypernasal, hyponasal, and “cul-de-sac.” Prosody refers to speech rate, presence of repeated or prolonged syllables, rushes of speech, intonation (i.e., monopitch or monoloudness), and stress patterns.

Visual Perceptual Assessment

The visual perceptual assessment refers to visible and physical aspects of voice production related to the etiology, maintenance, or result of dysphonia. General appearance features such as apparent age compared with chronologic age; height and weight; facial expression; skin, hair, and nails; personal hygiene; and dress provide information regarding underlying systemic disease, previous treatment, or emotional disorders. Inattention to personal hygiene and dress, for example, can be indicative of an emotional disorder or dementia.

Posture and musculoskeletal tension are thought to contribute to muscle tension dysphonia (MTD), which alters vocal pitch, loudness, and quality. Assessment involves evaluation of the alignment of the head, neck, torso, pelvis, and legs. Musculoskeletal tension is visible as abnormal extent of jaw motion, chin jut, neck extension, bulging of the neck muscles while talking, or raised shoulders.

Neurologic dysfunction is indicated by observations such as unsteadiness, asymmetry, rigidity, hesitation, slowness, weakness, incoordination, inconsistency, and extraneous movements. Weakness, asymmetry, and incoordination of the tongue, jaw, lips, or soft palate are especially noteworthy. The presence of focal dystonias, such as writer's cramp, blepharospasm, torticollis, and oromandibular dystonia, usually leads the examiner to consider a neurologically based voice disorder, such as spasmodic dysphonia.

Physical dysmorphology, particularly syndromic features or evidence of orofacial difference or resection, should be noted for possible relation to a resonance or speech intelligibility deficit. Many systemic diseases that can affect the larynx and voice have visible physical symptoms, among which are rheumatoid arthritis, lupus, and Sjögren syndrome. For a detailed discussion of the visual perceptual examination, the reader is encouraged to refer to the works of Koschkee and Rammage.

Tactile Perceptual Assessment

Intrinsic and extrinsic laryngeal muscle imbalance is thought to be the primary characteristic of MTD. Manual examination of laryngeal musculoskeletal tension is a powerful technique to rapidly assess the contribution of muscle tension to the observed voice quality. Teasing apart muscle tension from other components of the dysphonia can help ensure proper diagnosis and management. Several protocols for manual examination have been recommended, and assessment typically includes palpation of the suprahyoid muscles, major horns of the hyoid bone, superior cornu and lateral aspects of the thyroid cartilage, thyrohyoid space, and anterior border of the sternocleidomastoid muscle. Suprahyoid tension and thyrohyoid space are assessed both at rest and during phonation, and lateral mobility is also assessed. Fig. 55.1 depicts this evaluation. Some authors also recommend palpating the thyrohyoid, cricothyroid, and pharyngolaryngeal muscles (inferior constrictor and posterior cricoarytenoid). Normal findings include palpable space between the hyoid bone and the superior border of the thyroid cartilage and mobility of the laryngeal complex. Findings indicative of excessive musculoskeletal tension include pain with palpation that is frequently more severe on one side, decrease or absence of thyrohyoid space at rest or with phonation, muscle “knots,” high carriage of the hyoid bone and thyroid cartilage, and difficulty lateralizing the larynx.

Fig. 55.1, Manual musculoskeletal tension evaluation.

Currently, no intraexaminer or interexaminer reliability data are available for the manual tension examination, and the sensitivity and specificity of abnormal findings are unknown. In a radiographic study of laryngeal position in people with MTD, Lowell and colleagues found no difference between control participants and people with MTD in hyoid or thyroid cartilage location at rest. During phonation, control participants lowered the hyoid more than those with MTD, and participants with MTD raised the thyroid cartilage more than controls.

Instrumental Assessment

Instrumental source, aerodynamic, and acoustic assessments are used to document the voice disorder and the status of the voice-production system, including the respiratory system, voice source, and the supraglottal vocal tract. Vocal function studies are sometimes used to define treatment goals or as visual feedback during voice therapy, and repeated testing over time allows clinicians to monitor and document changes that result from treatment or disease progression. Instrumental assessments are also used to improve our understanding of how particular aspects of physiology and production generate the acoustics that evoke voice quality perception in the listener.

It is tempting to consider instrument-based tests “objective,” because they generate numbers, but it is important to remember that examiners influence results and that factors other than laryngeal status can affect measurements. To maximize its usefulness, vocal function testing must be performed using standard protocols, recording procedures, patient instructions, and test environments. A tutorial was recently published recommending protocols and technical specifications for instrumental voice assessment. If adopted by researchers and clinicians, the protocols might improve our ability to compare patient outcomes and study results across voice centers. Note that no single measure explains voice production or portrays the differences among voices, and most speech-language pathologists select a subset of the measures described in the following sections based on their philosophy and education. The measures described here are grouped as vocal respiratory measures, source measures, aerodynamic measures, measures of velopharyngeal function, and acoustic voice measures.

Respiratory Measures

Most people with voice-related concerns do not require full pulmonary function testing. Several measures of respiratory capacity or use during speaking are useful in describing voice disorders and planning treatment. These include maximum inspiratory and expiratory pressures, vital capacity, and the percent vital capacity at which patients initiate and terminate speech.

Source Measures

The voice “source” is the output of vocal fold vibration (glottal area) and its interaction with the pressures of the subglottal and supraglottal vocal tract (glottal flow). Two techniques are used to estimate the source: electroglottography (EGG) and inverse filtering.

Electroglottography

EGG measures the conductance of a weak, high-frequency electric signal across the neck between two surface electrodes. The conductance of the signal varies with the vibration of the vocal folds: when the vocal folds contact each other, conductance increases, and the slope of the resultant EGG trace is positive; as vocal folds separate, conductance decreases, and slope is negative. The results are relative rather than absolute and do not measure glottal area or closure. The waveform's shape is potentially meaningful for describing the pattern of vocal fold vibration, and many quotients to quantify the waveform have been proposed (e.g., open, skewing, and contact). Techniques for quantifying the waveform have not yet been standardized, largely because of technical challenges and difficulty relating the EGG waveform to vocal fold motion.
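
One way such a quotient might be computed is sketched below. Because no standard method exists, everything specific here is an assumption: the contact quotient is taken to be the fraction of one cycle during which the signal exceeds a criterion level, the criterion is set at 50% of the cycle's peak-to-peak amplitude, and the function name `contact_quotient` is hypothetical.

```python
def contact_quotient(cycle, criterion=0.5):
    """Fraction of one EGG cycle spent above a criterion level.

    cycle: samples spanning exactly one vibratory period, with larger
        values meaning greater vocal fold contact (higher conductance).
    criterion: fraction of peak-to-peak amplitude used as the contact
        threshold (assumed default; no standard value exists).
    """
    if not cycle:
        raise ValueError("cycle must contain samples")
    low, high = min(cycle), max(cycle)
    if high == low:
        raise ValueError("cycle is flat; no contact pattern to measure")
    threshold = low + criterion * (high - low)
    above = sum(1 for sample in cycle if sample > threshold)
    return above / len(cycle)


# An idealized triangular cycle of eight samples.
print(contact_quotient([0, 1, 2, 3, 4, 3, 2, 1]))  # 0.375
```

Changing the criterion changes the result for the same waveform (0.625 at a 25% criterion in this example), which illustrates why values obtained with different methods are hard to compare.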

Inverse Filtered Flow

Inverse filtering is a signal-processing technique that removes the effects of the vocal tract (formants) from the acoustic or the aerodynamic waveform, leaving the glottal flow (i.e., source signal). Several measures can be made from the inverse-filtered flow waveform. These include the skewing quotient, which is the ratio of the duration of increasing flow to the duration of decreasing flow, and the open quotient, which is the ratio of the combined duration of increasing and decreasing flow to the period of the waveform. Measures of the slope of the flow spectrum are thought to be important to overall voice quality. Unfortunately, inverse filtering is technically challenging, and the results are difficult to validate.
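
The two quotients described above can be illustrated on an idealized, noise-free flow cycle. The sketch assumes that the samples span exactly one period, that the cycle begins and ends in the closed phase, and that the closed phase sits at the minimum flow value. Real inverse-filtered signals rarely satisfy these conditions, and the function name `flow_quotients` is hypothetical.

```python
def flow_quotients(flow):
    """Return (skewing quotient, open quotient) for one idealized flow cycle.

    flow: evenly spaced glottal-flow samples covering exactly one period,
        starting and ending in the closed phase (assumed).
    """
    floor = min(flow)
    peak = flow.index(max(flow))

    # Walk outward from the peak to the nearest samples at the closed-phase level.
    start = peak
    while start > 0 and flow[start] > floor:
        start -= 1
    end = peak
    while end < len(flow) - 1 and flow[end] > floor:
        end += 1

    rising = peak - start    # duration of increasing flow, in samples
    falling = end - peak     # duration of decreasing flow, in samples
    if falling == 0:
        raise ValueError("no decreasing-flow phase found")

    skewing_quotient = rising / falling
    open_quotient = (rising + falling) / len(flow)
    return skewing_quotient, open_quotient


# Flow rises over four samples, falls over two, within a ten-sample period.
print(flow_quotients([0, 0, 1, 2, 3, 4, 2, 0, 0, 0]))  # (2.0, 0.6)
```

Durations are counted in samples, so the ratios are unaffected by the sampling rate as long as the samples are evenly spaced.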

Because glottal flow varies with different vocal fold vibration patterns, and because it is important to voice quality, approaches to estimating the glottal flow waveform continue to evolve. Alku and colleagues analyzed sources of error in inverse filtering and proposed and tested a new algorithm. Kreiman and colleagues took a different approach and used a custom synthesizer to spectrally modify the inverse-filtered voice to perceptually match the original production.

Aerodynamic Measures

Aerodynamic evaluation involves measuring air pressures and airflows. Although it is difficult to separate respiratory from laryngeal contributions to measures of air pressure and airflow during voice production, aerodynamic measures provide valuable information in understanding voice production. Measurements are listed in Table 55.1 along with their hypothetical perceptual correlates and normative values.

TABLE 55.1

Perceptual Correlates of Aerodynamic Measures

Measure	Perceptual Correlate	Normative Mean (Standard Deviation) for Women	Normative Mean (Standard Deviation) for Men
Subglottal or intraoral air pressure	Phonatory effort and strength of pressure consonants	7.52 (2.17) cm H₂O^a	6.43 (1.07) cm H₂O^a
Phonation threshold pressure	Effort to initiate phonation	≈3 cm H₂O modal, ≈8 cm H₂O high pitch^b
Airflow	Breathiness	91–156 (16–71) mL/s^c	101–183 (16–77) mL/s^c
Laryngeal airway resistance	Phonatory effort, vocal strength, and strain	27–51 cm H₂O/L/s	24–45 cm H₂O/L/s^c

a Data from Subtelny JD, Worth JH, Sakuda M: Intraoral pressure and rate of flow during speech. J Speech Hear Res 9:498, 1966.

b Data from Verdolini-Marston K, Titze IR, Druker DG. Changes in phonation threshold pressure with induced conditions of hydration. J Voice 8:30, 1994.

c Data from Baken RJ, Orlikoff RF. Clinical Measurement of Speech and Voice, 2nd ed., San Diego, 2000, Singular Publishing Group.
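
The units given for laryngeal airway resistance in Table 55.1 (cm H₂O/L/s) imply that it is obtained by dividing air pressure by airflow. The sketch below shows that arithmetic, including the conversion of airflow from mL/s to L/s. The function name is hypothetical, and pairing a mean pressure with a mean airflow from the table is done only to illustrate the calculation, not to derive a clinical norm.

```python
def laryngeal_airway_resistance(pressure_cm_h2o, airflow_ml_per_s):
    """Pressure divided by airflow, returned in cm H2O/L/s."""
    if airflow_ml_per_s <= 0:
        raise ValueError("airflow must be positive")
    airflow_l_per_s = airflow_ml_per_s / 1000.0  # convert mL/s to L/s
    return pressure_cm_h2o / airflow_l_per_s


# 7.52 cm H2O at 156 mL/s gives a value inside the tabulated range for women.
resistance = laryngeal_airway_resistance(7.52, 156)
print(round(resistance, 1))  # 48.2
```

Because resistance is a ratio, the same value can arise from very different combinations of pressure and flow, so it is best interpreted alongside the two underlying measures.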
