By the end of this chapter the reader should:
Know about the different ways in which data can be categorized and displayed
Understand frequency distributions and features of a normal distribution
Know how to describe different types of data
Know what confidence intervals and p-values are and how they can be used
Understand the application of appropriate statistical tests
Understand how to interpret statistical results in clinical and epidemiological studies
Understand the limits of statistical tests
Statistics are ‘a body of methods for making wise decisions in the face of uncertainty’ (W Wallis: A New Approach, 1957).
As doctors, it is essential for us to have an understanding of statistical principles and methods so that we can:
Conduct research
Interpret data
Appraise evidence
Apply and explain results to patients and families.
Other sections in this book describe research, evidence-based medicine and epidemiology. This chapter will cover the fundamentals of statistics that will provide the tools to navigate the world of clinical and epidemiological research and appreciate its scope and limitations.
While statistical tests are almost always carried out using software packages, readers of journals and researchers still need to know which test to use, understand what the programme is doing and know what sort of output to expect.
Statistical methods can be applied to quantitative data, a set of numbers and values that have been measured. The type and/or method of recording of quantitative data is important since it influences the choice of statistical tests as well as the way in which the data is described and displayed. Qualitative data, by comparison, is descriptive and usually represents an expression of thoughts, feelings or experiences. There are resources available which detail appropriate methodologies for analysing qualitative data, but this will not be covered in this chapter.
Quantitative data (also referred to as variables, i.e. a characteristic, number or quantity that differs between individuals or items) can be numeric, in which a number is recorded, or categorical (Fig. 38.1).
Numeric data can be further subdivided into discrete or continuous data:
Discrete data can only be expressed in whole numbers; for example, number of children per family or number of episodes of severe asthma per year.
Continuous data, on the other hand, can take any value in a given range; for example, height, weight or age.
Categorical data can be:
Binary data – in which there are only two categories; for example, alive/dead or a yes/no response.
Ordinal data – which is in groups that can be ordered; for example, social class 1–5 or grades of bowel cancer.
Nominal data – which is in groups with no order or hierarchy; for example, blood group or marital status.
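The three kinds of categorical data can be told apart by two questions: are there exactly two groups, and do the groups have a natural order? The Python sketch below is illustrative only. The names `CategoricalType` and `classify_categorical` are hypothetical, and whether the groups are ordered is assumed to be supplied by the analyst, since it cannot be inferred from the labels themselves.

```python
from enum import Enum


class CategoricalType(Enum):
    """Hypothetical labels mirroring the categories described in the text."""
    BINARY = "binary"
    ORDINAL = "ordinal"
    NOMINAL = "nominal"


def classify_categorical(categories, ordered):
    """Classify a categorical variable.

    categories: the distinct groups a value can fall into.
    ordered:    True if the groups have a natural order (an assumption
                the analyst must supply).
    """
    if len(set(categories)) == 2:
        return CategoricalType.BINARY
    return CategoricalType.ORDINAL if ordered else CategoricalType.NOMINAL


# Examples taken from the text.
print(classify_categorical(["alive", "dead"], ordered=False))       # alive/dead
print(classify_categorical([1, 2, 3, 4, 5], ordered=True))          # social class 1-5
print(classify_categorical(["A", "B", "AB", "O"], ordered=False))   # blood group
```

The distinction matters later because the choice of display and of statistical test depends on it.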
The best method for displaying data depends on the type of data and the number of variables and datapoints. A good pictorial presentation of data can be an extremely effective and efficient means of communication. It is also crucial to plot the data:
In order to ensure that there are no obvious errors, e.g. gross outliers, which may have been due to erroneous data collection or inputting mistakes
To understand the shape, scope and overall nature of the data
To identify any interesting patterns.
The following is a list of methods of displaying data:
A. Bar chart
B. Box-and-whisker plot
C. Dot diagram
D. Histogram
E. Line diagram
F. Pie chart showing percentages
G. Pie chart showing actual numeric values
H. Scatterplot
For each of the following case scenarios, select the most appropriate graphical depiction method from the list above.
A sample of 1000 seven-year-old male schoolchildren undergo BMI testing.
Smoking in pregnancy. Results of a survey of mothers: 1 to 3 cigarettes/day = 31; >3/day = 44; do not smoke = 856; unspecified = 44.
Analysis of mode of delivery in a group of mothers: 596 normal vaginal deliveries, 318 by caesarean section and 35 by assisted vaginal delivery.
D. Histogram.
A. Bar chart, as this is not a continuous variable.
F. Pie chart showing percentages.
See below for details.
Tables are a useful way to summarize and present data and can usually provide more precise numerical data than a graph.
Pie charts are used to demonstrate proportions of a group falling into different categories. A circle is divided into segments, and the angles are proportional to the size of each category (Fig. 38.2).
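As a worked illustration of how segment angles follow from category sizes, the Python sketch below converts counts into percentages and angles. It uses the mode-of-delivery counts from the case scenario in this chapter; the function name `pie_segments` is hypothetical.

```python
def pie_segments(counts):
    """Return {category: (percentage, angle_in_degrees)} for a dict of counts."""
    total = sum(counts.values())
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Mode-of-delivery counts from the case scenario in this chapter.
deliveries = {
    "normal vaginal": 596,
    "caesarean section": 318,
    "assisted vaginal": 35,
}

for category, (percent, angle) in pie_segments(deliveries).items():
    print(f"{category}: {percent:.1f}% of the circle, {angle:.1f} degrees")
```

With these counts the three segments come to roughly 63%, 34% and 4% of the circle, and the angles sum to 360 degrees.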
Bar charts (Fig. 38.3) can be used to display a single variable, with the heights of the bars proportional to the frequency. They may also show the relationship between two variables by being grouped or stacked.
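The idea that bar size is proportional to frequency can be sketched with a simple text bar chart. The Python example below is illustrative only: it draws horizontal rather than vertical bars, the function name `text_bar_chart` and the `width` parameter are hypothetical, and the counts are those of the smoking-in-pregnancy survey from the case scenario in this chapter.

```python
def text_bar_chart(counts, width=40):
    """Return the lines of a horizontal bar chart.

    Each bar's length is proportional to the category's frequency,
    scaled so that the largest category is `width` characters long.
    """
    largest = max(counts.values())
    lines = []
    for category, n in counts.items():
        bar = "#" * round(width * n / largest)
        lines.append(f"{category:<20} {bar} {n}")
    return lines


# Survey counts from the smoking-in-pregnancy case scenario.
smoking = {
    "do not smoke": 856,
    "1 to 3 per day": 31,
    "more than 3 per day": 44,
    "unspecified": 44,
}

for line in text_bar_chart(smoking):
    print(line)
```

Because one category dominates, the smaller bars are only one or two characters long, which shows at a glance how unevenly the responses are distributed.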
Dot diagrams (Fig. 38.4) can be used to display continuous numeric data for a variable, for a single group or multiple groups. Each dot represents a single value. It is a simple method of conveying as much information as possible, and it is easy to see outliers and to compare the distribution of results in different groups, but it may not be practical where there are large numbers of measurements.
When measurements are repeated at different time points, for example, before and after a certain treatment, lines drawn between paired dots (Fig. 38.5) can illustrate measurements or the effect of intervention/treatment.
Scatterplots (Fig. 38.6) illustrate the relationship between two continuous variables, represented on vertical and horizontal axes. Scatterplots may include a line of best fit (see Correlation and regression, below).
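To give a sense of what a line of best fit involves, the Python sketch below fits a straight line to paired observations. It assumes the line is fitted by ordinary least squares, which the text at this point does not specify. The function name `line_of_best_fit` is hypothetical, and the age and height values are invented purely for illustration.

```python
def line_of_best_fit(xs, ys):
    """Ordinary least-squares slope and intercept for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented illustrative data (not from the chapter):
# age in years (horizontal axis) and height in cm (vertical axis).
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

slope, intercept = line_of_best_fit(ages, heights)
print(f"height = {intercept:.1f} + {slope:.1f} x age")
```

The slope is the average change in the vertical-axis variable for each unit increase in the horizontal-axis variable.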
In a box-and-whisker plot (Fig. 38.7), the line in the middle of the box typically represents the median value, and the upper and lower edges of the box represent the upper and lower quartiles. Each half of the box contains 25% of the values, so the box encompasses the middle 50% of the values. The limits of the whiskers represent the highest and lowest values (i.e. the range), and each whisker encompasses 25% of the values.
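The five values that define such a plot can be computed directly. The Python sketch below uses the standard library's `statistics.quantiles`; the function name `five_number_summary` is hypothetical and the BMI values are invented for illustration. Software packages differ in how they interpolate quartiles, so results from other programmes may differ slightly.

```python
import statistics


def five_number_summary(values):
    """Return the five values that define a box-and-whisker plot."""
    # With n=4, quantiles() returns the three cut points that split
    # the data into four equal-sized groups: Q1, the median and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "minimum": min(values),       # end of lower whisker
        "lower quartile": q1,         # lower edge of box
        "median": median,             # line inside box
        "upper quartile": q3,         # upper edge of box
        "maximum": max(values),       # end of upper whisker
    }


# Invented illustrative BMI values (kg/m^2), not from the chapter.
bmi = [14.1, 14.8, 15.2, 15.5, 15.9, 16.3, 16.8, 17.4, 19.0]

for name, value in five_number_summary(bmi).items():
    print(f"{name}: {value:.1f}")
```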
A sample of 1000 seven-year-old male schoolchildren undergo BMI testing. You are asked to summarize the data numerically, using up to three parameters, without actually showing a graph. Which set of parameters would best describe the data? Select ONE answer only.
A. Mean, median and confidence intervals.
B. Mean, median and range.
C. Mean, standard deviation and confidence intervals.
D. Median, range and standard deviation.
E. Variance, standard deviation and range.
A. Mean, median and confidence intervals.
See below for discussion.