Outcome measures are the tools used to determine the efficacy, safety, and side effects of a treatment. Researchers assess outcome measures before and after treatments to determine their relative efficacy. Clinicians can use outcome measures to track the success of their treatments and/or longitudinally to follow the outcomes of individual patients. Urinary incontinence, fecal incontinence, pelvic organ prolapse (POP), and other pelvic floor disorders are multidimensional phenomena that can impact a patient in a wide variety of ways. They rarely result in severe morbidity or mortality; rather, they cause symptoms that can impact a woman’s daily activities and negatively affect her quality of life (QOL). No single measure can fully characterize the outcome of an intervention for these conditions. Therefore, outcomes of treatment should be evaluated in multiple areas or domains. A number of organizations, including the International Continence Society (ICS), International Urogynecologic Association (IUGA), National Institutes of Health (NIH), and World Health Organization (WHO)-sponsored International Consultation on Incontinence (ICI), have made recommendations to standardize outcome measures in studies of pelvic floor disorders. In general, all agree on several basic principles: (1) outcome assessments should be made using the same measures before and after the intervention; (2) both subjective and objective measures should be included, incorporating improvements and deterioration in function, as well as complications of the intervention; and (3) pelvic floor disorders should be assessed from multiple domains, including some or all of the following:
The subject’s observations (symptoms)
Quantification of symptoms
The clinician’s observations (anatomic and functional)
QOL
Socioeconomic measures
In particular, patient-reported outcomes (PROs) have been increasingly incorporated into the practice and study of pelvic floor disorders. Because of the nature of these disorders, inclusion of the patient’s perspective is essential. PROs measure patient perceptions at four levels of increasing complexity: symptoms, functioning, general health perceptions, and health-related QOL (HRQOL).
A good outcome measure should be valid, reliable, simple to implement, easy to interpret, and able to detect clinically meaningful change. When planning a study, outcome measures should be selected within the context of the specific study’s hypothesis or goal. In general, they should be chosen so that they will be clinically relevant and so that the results may be incorporated into practice at the end of the study. For a clinical trial, researchers will typically define a primary outcome and several secondary outcomes. The primary outcome is of central interest and should be tied directly to the study’s primary hypothesis. It is the primary outcome that is used for sample size determination. Secondary outcomes are the remaining outcome measures being assessed in the study. They are not the focus of the main study objective but provide additional data that are complementary to the primary outcome measure. Because of the desire to assess pelvic floor disorders in multiple domains, typically several secondary outcome measures are used in studies of these conditions.
In this chapter, many of the outcome measures currently available to clinicians and researchers for assessing treatment outcomes for pelvic floor disorders will be reviewed, including symptom diaries, pad tests, physical examination, physiologic tests (such as urodynamics), symptom severity or bother questionnaires, QOL questionnaires, and socioeconomic measures. The chapter will conclude by discussing the current challenges and recommendations for defining treatment success in patients with urinary incontinence and POP.
Physicians often attempt to determine the presence and severity of a patient’s symptoms by history-taking or administering a questionnaire. These methods depend upon the patient’s ability to accurately recall and report her recent health experiences. Research has shown, however, that recall is often unreliable and can result in inaccuracies and bias. For instance, in one study, more than half of women overestimated their daytime urinary frequency when compared with a bladder diary. The use of symptom diaries has been advocated to limit recall bias by capturing experiences prospectively close to or at the time of occurrence. The bladder or urinary diary is perhaps the most common outcome measure used in studies of urinary incontinence and other forms of lower urinary tract dysfunction. It is also a useful clinical tool (see Chapters 9 and 14). In its simplest form, a patient is asked to prospectively record the time and number of voluntary voids and incontinence episodes over a specified period of time, usually 1 to 7 days. In more complex forms, a subject may be asked to record pad usage, type and amount of fluid intake, voided volumes, frequency and severity of urinary urgency, and/or activities that occur in relation to their lower urinary tract symptoms (LUTS). Subjects are also often asked to record the time that they go to bed and the time they awaken to distinguish daytime from nighttime symptoms. For research studies of urinary incontinence, the NIH recommends a 3-day bladder diary that records and reports, at a minimum, pad usage, urinary incontinence episodes, and voiding frequency. Diaries in which patients are asked to report fluid intake and record voided volume using a graduated toilet insert are often called frequency-volume charts.
Although more cumbersome for patients, frequency-volume charts provide a significant amount of additional data about lower urinary tract function not available from simpler bladder diaries, including average daily fluid intake, total daily voided volume, mean voided volume, largest single void (functional bladder capacity), and daytime and nighttime voided volumes. Although diaries have been used primarily as an outcome measure for studies of urinary symptoms in the gynecology and urology literature, symptom diaries are used extensively in many areas of clinical research as well. For instance, they have been used frequently in studies of bowel dysfunction to record the frequency of bowel movements, fecal incontinence episodes, and so on. Similarly, pain diaries are a standard outcome measure in studies of acute and chronic pain management.
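The parameters listed above can all be derived from the raw chart entries. The following Python sketch illustrates one way to do so; it is not a validated scoring tool. The `DiaryEntry` record, the `"void"`/`"intake"`/`"leak"` labels, and the `summarize_chart` function are hypothetical names introduced for illustration. Daytime and nighttime are separated purely by the recorded wake time and bedtime, which is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class DiaryEntry:
    when: datetime           # time the event was recorded
    kind: str                # "void", "intake", or "leak" (hypothetical labels)
    volume_ml: float = 0.0   # measured volume; left at 0 for leaks


def is_daytime(t: time, wake: time, bed: time) -> bool:
    # Assumes the subject wakes and goes to bed on the same calendar day.
    return wake <= t < bed


def summarize_chart(entries, wake, bed, n_days):
    """Derive frequency-volume chart parameters from raw diary entries."""
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    voids = [e for e in entries if e.kind == "void"]
    intake_ml = sum(e.volume_ml for e in entries if e.kind == "intake")
    leaks = sum(1 for e in entries if e.kind == "leak")
    total_ml = sum(e.volume_ml for e in voids)
    day_ml = sum(e.volume_ml for e in voids
                 if is_daytime(e.when.time(), wake, bed))
    return {
        "mean_daily_intake_ml": intake_ml / n_days,
        "mean_daily_voided_ml": total_ml / n_days,
        "mean_voided_volume_ml": total_ml / len(voids) if voids else 0.0,
        # Largest single void, described in the text as functional bladder capacity
        "largest_single_void_ml": max((e.volume_ml for e in voids), default=0.0),
        "daytime_voided_ml_per_day": day_ml / n_days,
        "nighttime_voided_ml_per_day": (total_ml - day_ml) / n_days,
        "voids_per_day": len(voids) / n_days,
        "leaks_per_day": leaks / n_days,
    }
```

Calling `summarize_chart(entries, time(7, 0), time(23, 0), n_days=3)` on a 3-day chart would return per-day averages for each parameter.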
The accuracy of symptom diaries depends upon the subject’s ability to follow instructions. The circumstances under which a diary is kept should approximate everyday life and should be similar before and after the intervention to allow for meaningful comparison. Reproducibility depends upon the nature of the diary and the parameters being measured. In general, the reproducibility of symptom diaries improves as the duration of self-reporting increases. However, as diary duration increases, patient compliance tends to decrease. The most appropriate duration for bladder diaries has not been established. Although some have advocated the use of a single 24-hour diary, the reliability of diaries using this short duration is poor, limiting its use in research. The most commonly used duration is 7 days. Studies have demonstrated a high reliability for incontinence episodes, urinary frequency, urgency, and nocturic episodes in both men and women with either stress urinary incontinence (SUI) or overactive bladder (OAB) with this diary duration. In women with SUI, a 3-day diary appears to have similar reproducibility compared with a 7-day diary with regard to number of incontinence episodes and voiding frequency. Similarly, in patients with OAB, the reliability of urgency incontinence episodes, urgency episodes, and daytime and nighttime frequency was adequate with a 3-day diary, but not as good as with a 7-day diary. Although a survey of clinicians and patients suggests that 4 days is the optimal diary duration, a 3-day diary has been found to explain at least 94% of the variance of a 4-day diary. As mentioned, the NIH recommends a diary duration of at least 3 days for the evaluation of LUTS.
The primary strength of symptom diaries, at least in theory, is that they avoid the biases and inaccuracies of memory recall and record a subject’s symptoms in her normal day-to-day environment. There is evidence, however, that many patients may not actually complete their diary as events occur. In one study of adults with chronic pain, subjects were asked to complete a pain diary for 21 consecutive days. Each of these “paper and pen” diaries was fitted with an unobtrusive photosensor that detected light and recorded when the diary was opened and closed. Subjects were asked to record their pain at three set time periods each day and were not informed of the presence of the photosensor. At the end of the study, subjects reported a greater than 90% compliance with the diary; however, the photosensor revealed that only 11% of subjects had filled out their diaries at the prescribed times. For most subjects, the records were marked by long periods, from days to weeks, when the diary was not opened, even though entries were made for those days when the diary was turned in at the end of the study, suggesting that subjects frequently backfilled their diary to complete missing days. Such backfilling is particularly subject to retrospective biases. In this study, a parallel group of patients was given computer diaries that prompted them to complete their diary at the specified times, and 94% true compliance was noted in this group, suggesting an advantage of computer diaries over paper ones. However, such computer prompting would not be useful for recording spontaneous events like voiding or incontinence episodes. In spite of this concern about retrospective completion of symptom diaries, they remain an important outcome measure in the study of pelvic floor disorders because of their widespread use, general acceptance, and proven reproducibility, particularly for studying lower urinary tract dysfunction.
When using a bladder diary or similar symptom diary in a study, time should be spent instructing the subject in the proper use of the diary and the importance of completing the diary in a prospective manner.
Another strength of symptom diaries is that they provide information on symptom frequency and, in some cases, severity in a way that is quantifiable. This is particularly useful in studies of an intervention in which patients are not commonly “cured,” but may show an improvement in symptoms, such as studies of medical or behavioral therapy for urinary incontinence. In fact, bladder diaries are the most common primary outcome measure used in studies of this type. Furthermore, outcome measures that are continuous variables tend to provide greater statistical power than a dichotomous variable, so using a variable from a bladder diary, such as number of incontinence episodes per week, rather than a dichotomous outcome such as “cure/failure,” will usually allow for a smaller study sample size. An additional strength of bladder diaries in particular is that normal population values for variables like voiding frequency, mean voided volume, and daytime and nighttime urine output have been published, providing useful reference values for defining study populations and estimating treatment goals.
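The sample size advantage of a continuous outcome can be illustrated with standard normal-approximation formulas. The sketch below is a hypothetical worked example, not data from any study. It assumes a normally distributed outcome, a standardized treatment effect of 0.5 standard deviations, a two-sided alpha of 0.05, and 80% power. For the dichotomous version, a "responder" is assumed to be anyone scoring better than the control-group median. The function names are introduced here for illustration only.

```python
from math import ceil
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def _z_sum(alpha: float, power: float) -> float:
    # z for the two-sided significance level plus z for the desired power
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)


def n_per_group_means(effect_size_d, alpha=0.05, power=0.80):
    """Per-group n to compare two means (normal approximation, equal SDs)."""
    return ceil(2 * _z_sum(alpha, power) ** 2 / effect_size_d ** 2)


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to compare two proportions (normal approximation)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(_z_sum(alpha, power) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical scenario: the same underlying treatment effect analyzed two ways.
d = 0.5                           # assumed standardized effect size
p_control = 0.5                   # half of controls exceed their own median
p_treated = _STD_NORMAL.cdf(d)    # share of treated subjects above that cutoff

n_continuous = n_per_group_means(d)
n_dichotomous = n_per_group_proportions(p_control, p_treated)
print(n_continuous, n_dichotomous)
```

Under these assumptions the continuous analysis needs 63 subjects per group, whereas the dichotomized analysis needs 100. Real diary variables such as weekly incontinence episodes are counts and often skewed, so an actual trial would use a sample size method suited to the planned analysis.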
In addition to the possibility that symptom diaries may not always be completed contemporaneously, another potential weakness of this outcome tool is lower patient compliance with completing symptom diaries when compared with simpler measures like questionnaires. In large pharmaceutical trials in which subjects are carefully selected and often financially compensated to participate, compliance with symptom diaries is typically high, often over 90%, but high compliance may be difficult to achieve in other settings. A multicenter NIH-funded cohort study of 1064 participants with urinary symptoms included a 3-day voiding and fluid intake diary at baseline. Some 902 (85%) participants returned the diary, 796 diaries (75%) were considered usable, and only 448 (42%) were considered complete. Younger participants were more likely to have missing or incomplete diaries. In smaller, less well-funded studies, patient compliance with symptom diaries may be even lower. One prospective study followed 107 women who underwent pubovaginal sling procedures. The investigators used the Simplified Urinary Incontinence Outcome Score, a composite outcome that combines the results of a questionnaire, a 24-hour pad test, and a 24-hour bladder diary into a single score, as their primary outcome. Although all patients completed the questionnaire postoperatively, only 52% completed the symptom diary and/or the pad test even after repeated telephone contacts, reducing the number of subjects for whom the primary outcome was available to half the original study population. When considering using a symptom diary as an outcome measure, the advantages of this tool must be weighed against the possibility of poor patient compliance.
Another important consideration when using a bladder diary as an outcome measure is the therapeutic effect that diary completion in itself may have on lower urinary tract function. As described in Chapter 14, bladder retraining using diaries is an effective intervention for both SUI and urgency urinary incontinence (UUI). Several authors have suggested that the high improvement rates seen in the placebo groups in pharmaceutical trials of OAB and other similar trials (often >40%) are caused, in part, by a “bladder retraining effect” that occurs just by using diaries throughout the trial. Some studies suggest that this effect can occur as early as 4 days after starting diary use. When using a bladder diary to evaluate the effect of an intervention, the only certain way of accounting for the therapeutic effect of the diary itself is to include a control group in the study.
In recent years, there has been increased interest in electronic diaries, particularly those that use a mobile phone app interface. Electronic diaries have several advantages over paper diaries, including the ability to automatically calculate various parameters, the ability to remind or prompt patients to complete the diary input, and the ability to record the exact time of data entry. Also, patients tend to prefer electronic over paper entry. Disadvantages of electronic diaries include the potential cost, although most apps currently available are free or have a nominal cost, as well as concerns that the presence of a device that may include reminders may distort the diary results. One study comparing a paper diary with a mobile app found that participants recorded more voids and leaks on the paper diary than they entered into the app; the reasons for this difference are unclear. Currently, there are over a dozen bladder diary mobile apps available on iPhone or Android platforms. Unfortunately, although often based on validated paper-based bladder diaries, none of the apps that are publicly available appear to have been specifically validated for this format. One group has validated a Spanish-language 3-day bladder diary mobile app, demonstrating it to be a reliable, valid method of assessing symptoms in patients with OAB and nocturia who have a smartphone. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Prevention of Lower Urinary Tract Symptoms Consortium developed a mobile app (Where I Go+) to study healthy bladder habits in women and girls that includes geo-location tracking; currently this app is only available to study participants.
Pad testing attempts to objectively quantify the volume of urine loss by weighing a perineal pad before and after a specified time and/or group of activities. It is currently the only incontinence severity measure that captures the actual volume of leakage. Pad testing also has been used to attempt to distinguish continent from incontinent women. Numerous pad test protocols have been described, but in general they can be divided into short-term and long-term tests.
The short-term pad tests each ask subjects to perform a set of standardized provocative maneuvers in the office that, depending upon the protocol, can last from 10 minutes to 2 hours. In an attempt to standardize bladder volumes, most short-term pad tests specify that subjects start the pad test with a symptomatically full bladder, drink a standardized volume of liquid, or have a standard volume of fluid instilled in the bladder before the test. A preweighed pad is then worn while performing a predefined group of activities that typically includes such things as walking, climbing stairs, jumping, bending, coughing, and washing hands over a specified period of time. The volume of urine loss is obtained by weighing the pad at the completion of the test. For short-term tests, a change in pad weight of greater than 1 g is considered positive. When using a short-term pad test as a study outcome, the specific protocol used should be described. In 1983, the ICS recommended the 1-hour pad test described in Box 41.1 in an attempt to standardize this outcome measure across studies. Compared with long-term tests, short-term pad tests are easy and quick, and patient compliance can be directly monitored. Because of this, they are used frequently in clinical trials. However, a significant disadvantage of short-term pad tests is that they lack authenticity. These office tests do not necessarily reproduce the activities or situations that result in urine loss in a patient’s everyday life. In fact, some patients may not be physically capable of completing all of the prescribed activities in the protocol. Another limitation of short-term pad tests is their poor test-retest reliability. Although some studies have demonstrated good correlation between short-term pad tests performed in the same subject on two separate occasions, many have found poor repeatability with this test. 
One study demonstrated differences of up to 24 g between two test results in the same subject 1 to 15 days apart using the ICS 1-hour pad test, and concluded that this test is not precise enough to allow reliable quantitation of urinary incontinence. This variation within subjects is largely attributable to differences in bladder volumes at the time of the test, and protocols that standardize pretest bladder volumes tend to have higher reliability.
Box 41.1 ICS 1-hour pad test
Test is started without the patient voiding.
Preweighed pad is put on, and the first 1-hour test period begins.
Subject drinks 500 mL of sodium-free liquid within a short period (maximum 15 minutes), then sits or rests.
Half-hour period: subject walks, including stair-climbing equivalent to one flight up and down.
During the remaining period, the subject performs the following activities:
Standing up from sitting, 10 times;
Coughing vigorously, 10 times;
Running on the spot for 1 minute;
Bending to pick up small object from floor, 5 times;
Washing hands in running water for 1 minute.
At the end of the 1-hour test, the pad is removed and weighed.
If the test is regarded as representative, the subject voids, and the volume is recorded.
Otherwise, the test is repeated, preferably without voiding.
Long-term pad tests are performed by giving a patient several preweighed pads to take home and wear for 24 to 48 hours. Patients are encouraged to mimic their regular daily activities and change the pads as they wish during the study period. Subjects should be instructed to place the pads in a sealed plastic bag after use to avoid evaporation. Afterward, pads are mailed to the clinic to be weighed on a precision scale to determine the total urine loss over the specified time period. Studies have shown that, as long as sealed bags are used, evaporation loss is minimal for up to 2 weeks. A bladder diary is often completed concurrently with the pad test to provide a comprehensive lower urinary tract evaluation. Changes in pad weights up to 7 g per 24 hours can be seen in healthy continent women, so values less than this should be considered insignificant. The primary advantage of long-term pad tests is that their results reflect everyday life. Also, the reproducibility of long-term tests is generally higher than that of short-term pad tests. Increasing the duration of the test from 24 hours to 48 to 72 hours increases the reliability further but decreases patient compliance. Of note, in patients with SUI, the 24-hour pad test was inferior to the cough stress test for diagnosis. Not surprisingly, compliance with long-term pad tests varies considerably from study to study. As with bladder diaries, in large trials in which patients are carefully selected and are compensated for study participation, compliance with long-term pad tests tends to be high, whereas in smaller, less well-funded studies, compliance is often lower.
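The pad-weight arithmetic for both test types is simple enough to sketch in a few lines of Python. The thresholds come from the text: a gain of more than 1 g is positive for a short-term test, and up to 7 g per 24 hours can be seen in continent women on a long-term test. The `Pad` record and function names are hypothetical. Treating a loss of exactly 7 g per 24 hours as not significant is an assumption, because the text does not define the boundary case.

```python
from dataclasses import dataclass

SHORT_TERM_POSITIVE_G = 1.0       # short-term test: gain > 1 g is positive
LONG_TERM_NORMAL_G_PER_24H = 7.0  # long-term test: up to 7 g/24 h seen in continent women


@dataclass
class Pad:
    dry_weight_g: float    # weight before the test
    used_weight_g: float   # weight after the test

    @property
    def gain_g(self) -> float:
        return self.used_weight_g - self.dry_weight_g


def short_term_positive(pad: Pad) -> bool:
    """Short-term (office) pad test: positive if weight gain exceeds 1 g."""
    return pad.gain_g > SHORT_TERM_POSITIVE_G


def long_term_loss_per_24h(pads, duration_hours: float) -> float:
    """Total gain across all returned pads, scaled to a 24-hour period."""
    if duration_hours <= 0:
        raise ValueError("duration_hours must be positive")
    return sum(p.gain_g for p in pads) * 24.0 / duration_hours


def long_term_significant(pads, duration_hours: float) -> bool:
    # Assumption: exactly 7 g per 24 hours is treated as not significant.
    return long_term_loss_per_24h(pads, duration_hours) > LONG_TERM_NORMAL_G_PER_24H
```

For example, two pads gaining 6 g and 8 g over a 48-hour test give 7 g per 24 hours, which this sketch treats as within the range seen in continent women.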
Outcome measures that assess changes in anatomy have an essential role in studies of pelvic floor disorders, particularly POP. Anatomic outcomes can be assessed by physical examination or radiographic studies. Currently, the presence and magnitude of POP are most often determined by physical examination. Although imaging studies have a role in evaluating women with prolapse, and may, in fact, be more accurate in determining which organs are involved in prolapse, the current lack of standardization, validation, and universal availability precludes their use as the “gold standard” for evaluating pelvic organ support.
Accurately evaluating the anatomic effects of prolapse surgery requires a standardized, reliable, validated, and reproducible system for describing the topographic anatomy of the pelvic floor and vaginal support. The Pelvic Organ Prolapse Quantitation (POPQ) system was introduced in 1996 jointly by the Society of Gynecologic Surgeons, the American Urogynecologic Society, and the ICS as the accepted method for describing pelvic support and comparing examinations over time and after interventions. This system has since been similarly adopted by the NIH, the IUGA, and the ICI. Details of the POPQ system can be found in Chapter 8. This prolapse grading system has been shown to have good inter- and intraexaminer reproducibility in a number of studies in the United States and Europe, and, although other prolapse grading systems are still used by some, POPQ has become the most commonly used system in the peer-reviewed literature.
In addition to its widespread adoption and proven reproducibility, another advantage of the POPQ system is its relative precision (nine site-specific measurements in 1-cm increments), which has allowed an improved understanding of the relationship between the anatomic characteristics of POP and the development of specific pelvic floor symptoms. Disadvantages include its relative complexity and the exclusion of some anatomic findings that some investigators believe to be essential for complete patient description, such as vaginal caliber, status of paravaginal support, pelvic floor descent, and urethral mobility. When evaluating pelvic organ support in a study, investigators should perform a standardized evaluation including the POPQ before and after the intervention. Details of this evaluation should be reported, including the position in which the examination was performed, the fullness of the bladder, the type of vaginal specula, retractors, and measuring devices used, and the method used to ensure that the maximal extent of prolapse is seen. It is critical that the examiner sees and describes the maximum protrusion noted by the individual during her daily activities. Ideally, the examiner should not be the subject’s surgeon and should be blinded to the treatment assignment to limit observer bias.
Other anatomic outcomes that are often assessed by physical examination include pelvic floor and anal sphincter muscle strength, for which several valid and reliable grading scales exist, and urethral mobility measurements using the cotton swab test, ultrasound, or similar system.
Imaging techniques, including evacuation proctography or defecography, magnetic resonance imaging (static and dynamic), and ultrasonography, can be used to assess anatomic outcomes in studies of women with pelvic floor disorders. These techniques may provide a more accurate picture of the location, support, and integrity of the pelvic visceral structures than does a simple physical examination. The use of dynamic imaging techniques can provide functional, as well as structural, information about an individual subject and may be useful in providing insights into the pathogenesis of many pelvic floor disorders, including urinary incontinence, fecal incontinence, defecatory dysfunction, and POP. Details of the various imaging studies that can be used to assess anatomic outcomes in the investigation of pelvic floor disorders, including their relative advantages and disadvantages, can be found in the previous chapters of this text. All research using imaging studies as an outcome should report: (1) the position of the patient; (2) specific verbal instructions given to the patient; (3) bladder and bowel content, including prestudy preparations; and (4) specific details of the imaging technique and equipment. See Chapter 13 for a detailed description of radiologic studies in the evaluation and outcome assessment of pelvic floor disorders.
Physiologic testing attempts to describe or quantify the underlying function of the pelvic viscera and pelvic floor, often including an assessment of whether such function is normal or pathologic. Physiologic testing serves two principal roles in pelvic floor research: to describe subjects at study entry, and to help define or understand treatment outcome. The most common physiologic test of the lower urinary tract is urodynamics. Physiologic tests of the lower gastrointestinal tract include anal manometry and colon transit studies. Other physiologic tests of the pelvic floor include such things as vaginal pressure transducers to measure levator ani contraction strength and electromyography of the pelvic floor muscles and urinary and anal sphincters to evaluate neuromuscular function. In addition to defining subjects at baseline and evaluating treatment outcomes, the use of physiologic testing in pelvic floor research can provide valuable insight into the underlying pathophysiology of the condition and into the mechanisms of treatment success or failure. Additionally, physiologic testing allows correlation between changes in symptoms and changes in physiology. The primary disadvantage of using physiologic tests in research is that they are often costly and time-consuming, and many can be uncomfortable for the patient.
Historically, urodynamic testing was considered the “gold standard” treatment outcome for studies evaluating the treatment of urinary incontinence because it is one of the few means of objectively assessing lower urinary tract function or dysfunction in incontinent patients. Unfortunately, few physiologic tests of pelvic floor function, including urodynamics, have undergone a rigorous evaluation of their reproducibility or validity. In 2009, a panel of international experts convened to evaluate the ability of urodynamics to improve or predict outcome of incontinence treatment concluded that, other than in children and in patients with neurogenic bladder, the evidence supporting each component of urodynamic testing and anal manometry was generally weak (grade C or D recommendations), and many components of the tests were considered investigational. A review for the International Foundation for Functional Gastrointestinal Disorders Consensus Conference on Advancing Treatment of Fecal and Urinary Incontinence concluded that more clinical evidence is needed to establish the reproducibility of many components of urodynamic testing before the tests can be used as primary outcome measures in incontinence treatment studies. Recent clinical trial data suggest that multichannel urodynamics does not provide clinical benefit over basic office testing in patients with uncomplicated SUI, and that urodynamic parameters correlate poorly with outcomes. Thus, although physiologic outcome measures provide unique insight into pelvic floor function and are essential in explanatory trials, they should not be considered the gold standard by which patient-oriented outcomes are judged but should instead constitute one element of a range of patient responses to therapy. When using physiologic testing in pelvic floor research, investigators should use generally accepted standardized terminology and methods, such as those recommended by the ICS and/or NIH, whenever possible.
Pelvic floor symptoms can be assessed in a number of ways. Obviously, taking a thorough clinical history is an important method of assessing a patient’s symptoms and their effect on daily life. However, in a situation in which a standardized, reproducible assessment is desired, clinical histories can be problematic, because they typically take on a different form for each clinician and patient encounter. The most valid way of measuring the presence, severity, and impact of a symptom or condition on a patient’s activities and well-being is using psychometrically robust self-administered questionnaires. An increasing number of questionnaires and other PROs for women with pelvic floor disorders is now available. Most are intended to evaluate LUTS, but, over the last two decades, questionnaires have been developed for women with fecal incontinence and POP. In general, the two most commonly used and clinically relevant categories of questionnaires are: (1) those that measure the presence of particular symptoms and their severity (symptom questionnaires); (2) those that measure HRQOL. Other types of questionnaires are available for women with pelvic floor disorders, including those that measure sexual function, pain, physical functioning, and surgical complications.
PRO development is a complex process that is governed by the principles of psychometrics. Psychometrics is the science of the measurement of responses to phenomena that are not easily quantifiable. Ideally, the items on a questionnaire or PRO are based upon a conceptual framework that is constructed with direct input from patients and experts, usually using cognitive interviews, focus groups, and/or similar qualitative methods. For a questionnaire to be useful in research or in practice, it must demonstrate three important psychometric properties: validity, reliability, and responsiveness. Put in the simplest terms, the validity of a questionnaire is whether it measures what is intended. The reliability of a questionnaire refers to its ability to measure in a reproducible fashion. Responsiveness refers to a questionnaire’s ability to reliably detect the overall effect of treatment and clinically meaningful change. When studies have been performed to demonstrate that a particular questionnaire has good psychometric properties, that questionnaire is said to be “validated.” Some important aspects of validity, reliability, and responsiveness that should be assessed when evaluating the psychometric properties of a PRO are listed in Table 41.1. Other characteristics that are desirable in a questionnaire include being easy to understand and being feasible to implement.
Psychometric Property | Description |
---|---|
Validity | |
Face validity | Subjective assessment by an expert panel and/or patient focus group as to whether the instrument appears to measure what it intends to measure. |
Content validity | Subjective assessment by an expert panel and/or patient focus group as to the extent that the domain of interest is comprehensively sampled by the questions in the instrument. |
Construct validity | An assessment as to whether the instrument has appropriate relationships with other variables or measures. That is, the instrument correlates or agrees with other tests or measures of the same construct (convergent validity) and has little or no correlation or agreement with measures of different constructs (discriminant or divergent validity). |
Criterion validity | Extent to which an instrument correlates with an established criterion standard (or “gold standard”). For HRQOL questionnaires, no gold standard exists; thus, criterion validity cannot be assessed. Criterion validity is assessed for diagnostic and prognostic tests and other measures where an established criterion standard exists. |
Reliability | |
Internal consistency | The extent to which the items on a scale are related to one another; often assessed with the Cronbach’s alpha statistic (values of >0.70 demonstrate adequate internal consistency). |
Test-retest reliability | An assessment of the repeatability; the correlation between instrument scores on two separate occasions. Repeat measurements should be made far enough apart in time so earlier responses are forgotten, yet not so far apart that the construct being measured might have changed. |
Responsiveness | Assessment as to whether the instrument can detect clinically meaningful change. Methods for assessing responsiveness can broadly be separated into two groups: distribution-based methods that measure the relative amount of change from baseline of an instrument after treatment (e.g., paired t-test, effect size) and anchor-based methods that compare the change in an instrument to some other measure that has clinical relevance (example: a 5-point improvement in IQOL score is associated with a 25% or greater decrease in number of incontinence episodes on bladder diary). |
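Two of the statistics named in the table can be illustrated with a short calculation. The following is a minimal sketch, assuming item responses are stored as one list of numeric scores per respondent; the function names and the sample data are hypothetical and are not taken from any published instrument. It computes Cronbach’s alpha for internal consistency and a simple distribution-based effect size (mean change divided by the standard deviation of baseline scores), which is one of several effect-size definitions in use.

```python
from statistics import mean, stdev, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a scale.

    responses: list of respondents, each a list of k item scores.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    item_columns = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def effect_size(baseline, follow_up):
    """Distribution-based responsiveness: mean change / SD of baseline."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical data, invented purely for illustration:
# five respondents answering a three-item scale scored 0 to 3.
sample_responses = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [3, 3, 3],
]
alpha = cronbach_alpha(sample_responses)
print(f"Cronbach's alpha: {alpha:.2f} (adequate if > 0.70)")
```

An alpha above the 0.70 threshold given in the table would suggest adequate internal consistency for this hypothetical scale. An anchor-based analysis would additionally require an external measure, such as bladder diary episode counts, and is not shown here.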
Most PROs used for pelvic floor disorders have been developed using the principles of classic test theory. In recent years, there has been increased emphasis on developing PROs using more state-of-the-art methods, specifically Item Response Theory (IRT). This effort has been spearheaded by the NIH Patient-Reported Outcomes Measurement Information System (PROMIS) initiative ( http://www.nihpromis.org ). The goal of PROMIS is to build and validate item banks of self-reported measures that assess key health concepts for adults, including functions, symptoms, behaviors, and feelings, using IRT psychometrics. Item banks created from IRT are content-valid, cover all aspects of the construct being measured, and have enough items to attain high measurement precision. Once items are calibrated using IRT, they can either be administered as a static short form, or the calibrations can be used to guide computerized adaptive testing (CAT). In CAT, a seed item is delivered to the patient and, based upon her response, the most relevant items from the bank are selected for further administration. Two individuals taking a CAT may receive different items, but because all items are calibrated along a common dimension, the scores are comparable, and patients do not need to answer nonrelevant questions. The principal advantage of a short form or CAT is decreasing question burden (typically by ∼50%) while maintaining precision. However, CAT administration requires the patient/subject to have access to a computer, tablet, or mobile phone app. A CAT for women with urinary incontinence has been developed and validated based on the PROMIS framework. Further details of PROMIS PROs and other condition-specific PROs for women with pelvic floor disorders that have used IRT methodology are detailed later.
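The adaptive selection step described above can be sketched in a few lines. This is an illustrative simplification rather than the PROMIS algorithm: it assumes a two-parameter logistic IRT model with yes/no items (PROMIS instruments generally use multi-category items), a small hypothetical item bank with invented calibrations, selection of the next item by maximum Fisher information, and a grid-based posterior mean estimate of the latent severity score (theta) with a standard normal prior.

```python
import math

# Hypothetical item bank: name -> (discrimination a, severity threshold b).
# These calibrations are invented for illustration only.
ITEM_BANK = {
    "mild": (1.5, -1.0),
    "moderate": (1.5, 0.0),
    "severe": (1.5, 1.0),
}


def prob_endorse(theta, a, b):
    """Two-parameter logistic model: probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of one item at the current theta estimate."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, bank, administered):
    """Pick the unadministered item that is most informative at theta."""
    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=lambda n: item_information(theta, *bank[n]))


def estimate_theta(responses, bank):
    """Posterior mean of theta on a grid, with a standard normal prior.

    responses: dict mapping item name -> True (endorsed) or False.
    """
    grid = [g / 10 for g in range(-40, 41)]
    weights = []
    for theta in grid:
        weight = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for name, endorsed in responses.items():
            p = prob_endorse(theta, *bank[name])
            weight *= p if endorsed else 1.0 - p
        weights.append(weight)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# One adaptive step: start at theta = 0, administer the seed item,
# update the estimate, then choose the next most relevant item.
seed = next_item(0.0, ITEM_BANK, set())
theta_after = estimate_theta({seed: True}, ITEM_BANK)
follow_up = next_item(theta_after, ITEM_BANK, {seed})
```

Because every item is calibrated on the same theta scale, two respondents who answer different items still receive comparable scores. A production CAT would also need a stopping rule, for example a target standard error or a maximum number of items, which is omitted from this sketch.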
When choosing a questionnaire for use in clinical practice or in research, the first step is to determine whether the questionnaire actually measures what you intend to measure. A brief review of the questionnaire’s content and structure will provide important information in this regard. It is important to keep in mind the purpose for which a questionnaire was originally designed and the population in which it was validated. Before questionnaires are used in populations or contexts other than those they were designed for, further validation is usually necessary. The second step is to assess the reliability, validity, and responsiveness of the questionnaire. Use of nonvalidated questionnaires may provide misleading information or fail to detect important clinical changes. Whenever possible, it is desirable to use a validated questionnaire that is widely accepted and has been used many times in the population you wish to evaluate. The final step is to determine whether the length and construct of the questionnaire, as well as the mode of administration (e.g., paper, phone, computer, or mobile app), are such that it is feasible to administer in your practice or research study. Long questionnaires may be appropriate for research studies that require a great deal of detail but are likely to be too burdensome and time-consuming to be used effectively in clinical practice. In general, for PROs developed using classic test theory, using only part of a validated questionnaire or changing the order or content of a questionnaire is discouraged because this can change its psychometric properties. PROs developed using IRT offer more flexibility in terms of question order and ease of developing short forms or even CAT, as previously discussed.
Symptom questionnaires are used to assess the presence, severity, and impact of particular symptoms or groups of symptoms. A list of valid and reliable symptom questionnaires developed for women with various pelvic floor disorders is shown in Box 41.2. One of the most widely used symptom questionnaires in the study of pelvic floor disorders is the Urogenital Distress Inventory (UDI). The UDI contains 19 questions about lower urinary tract symptoms (LUTS) separated into three scales: irritative symptoms, obstructive/discomfort symptoms, and stress symptoms. Respondents are asked if they have a particular symptom and, if they do, to rate the degree to which it bothers them on a four-point scale from “not at all” to “greatly.” The shortened version of the UDI is the UDI-6, a six-question instrument that correlates well with the longer version and has been widely used.
Incontinence Severity Index
International Consultation on Incontinence Questionnaire short form (ICIQ-SF)
Urogenital Distress Inventory (UDI)
Urogenital Distress Inventory short form (UDI-6)
Lower Urinary Tract Dysfunction Research Network Symptom Index-29 (LURN SI-29)
Lower Urinary Tract Dysfunction Research Network Symptom Index-10 (LURN SI-10)
King’s Health Questionnaire
Bristol Female Lower Urinary Tract Symptom Questionnaire (BFLUTS)
St. Mark’s (Vaizey) score
Fecal Incontinence Severity Index (FISI)
Accidental Bowel Leakage Evaluation (ABLE)