Learning objectives

By the end of this chapter the reader should:

  • Know about the different ways in which data can be categorized and displayed

  • Understand frequency distributions and features of a normal distribution

  • Know how to describe different types of data

  • Know what confidence intervals and p-values are and how they can be used

  • Understand how to apply appropriate statistical tests

  • Understand how to interpret statistical results in clinical and epidemiological studies

  • Understand the limits of statistical tests

Introduction

Statistics are ‘a body of methods for making wise decisions in the face of uncertainty’ (W Wallis: A New Approach, 1957).

As doctors, it is essential for us to have an understanding of statistical principles and methods so that we can:

  • Conduct research

  • Interpret data

  • Appraise evidence

  • Apply and explain results to patients and families.

Other sections in this book describe research, evidence-based medicine and epidemiology. This chapter will cover the fundamentals of statistics that will provide the tools to navigate the world of clinical and epidemiological research and appreciate its scope and limitations.

While statistical tests are now almost always carried out using software packages, readers of journals and researchers still need to know which test to use, understand what the program is doing and know what sort of output to expect.

Types of data

Statistical methods can be applied to quantitative data, a set of numbers and values that have been measured. The type and/or method of recording of quantitative data is important since it influences the choice of statistical tests as well as the way in which the data is described and displayed. Qualitative data, by comparison, is descriptive and usually represents an expression of thoughts, feelings or experiences. There are resources available which detail appropriate methodologies for analysing qualitative data, but these are not covered in this chapter.

Quantitative data (also referred to as variables, i.e. a characteristic, number or quantity that differs between individuals or items) can be numeric, in which a number is recorded, or categorical (Fig. 38.1).

Fig. 38.1, Types of quantitative data.

Numeric data can be further subdivided into discrete or continuous data:

  • Discrete data can only be expressed in whole numbers; for example, number of children per family or number of episodes of severe asthma per year.

  • Continuous data, on the other hand, can take any value in a given range; for example, height, weight or age.

Categorical data can be:

  • Binary data – in which there are only two categories; for example, alive/dead or a yes/no response.

  • Ordinal data – in which the categories can be ordered; for example, social class 1–5 or grades of bowel cancer.

  • Nominal data – in which the categories have no order/hierarchy; for example, blood group or marital status.

Displaying data

The best method for displaying data depends on the type of data and the number of variables and datapoints. A good pictorial presentation of data can be an extremely effective and efficient means of communication. It is also crucial to plot the data:

  • In order to ensure that there are no obvious errors, e.g. gross outliers, which may have been due to erroneous data collection or inputting mistakes

  • To understand the shape, scope and overall nature of the data

  • To identify any interesting patterns.

Question 38.1

Displaying statistical data

The following is a list of methods of displaying data:

  • A.

    Bar chart

  • B.

    Box-and-whisker plot

  • C.

    Dot diagram

  • D.

    Histogram

  • E.

    Line diagram

  • F.

    Pie chart showing percentages

  • G.

    Pie chart showing actual numeric values

  • H.

    Scatterplot

For each of the following case scenarios, select the most appropriate graphical depiction method from the list above.

  • 1.

    A sample of 1000 seven-year-old male schoolchildren undergo BMI testing.

  • 2.

    Smoking in pregnancy. Results of a survey of mothers: 1 to 3 cigarettes/day = 31; >3/day = 44; do not smoke = 856; unspecified = 44.

  • 3.

    Analysis of mode of delivery in a group of mothers: 596 normal vaginal deliveries, 318 by caesarean section and 35 by assisted vaginal delivery.

Answers 38.1

  • 1.

    D. Histogram.

  • 2.

    A. Bar chart, as the data are grouped into categories rather than measured as a continuous variable.

  • 3.

    F. Pie chart showing percentages.

See below for details.

Tables

Tables are a useful way to summarize and present data and can usually provide more precise numerical data than a graph.

Pie charts

Pie charts are used to demonstrate the proportions of a group falling into different categories. A circle is divided into segments, and the angle of each segment is proportional to the size of its category (Fig. 38.2).

Fig. 38.2, Deaths by cause, percentage of total, and numbers, among 5–9-year-olds in the UK, 2010. This chart type gives a simple visual representation allowing the reader to picture all categories at once and compare their relative proportions.
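The arithmetic behind a pie chart can be sketched in a few lines of Python. This is only an illustrative sketch: it uses the mode-of-delivery counts from scenario 3 of Question 38.1, and the function name pie_segments is a hypothetical helper, not part of any statistics package.

```python
# Counts taken from scenario 3 of Question 38.1 (mode of delivery).
counts = {
    "Normal vaginal delivery": 596,
    "Caesarean section": 318,
    "Assisted vaginal delivery": 35,
}


def pie_segments(counts):
    """Return {category: (percentage, angle in degrees)} for a pie chart.

    Hypothetical helper: each segment's angle is the category's share
    of the total, scaled to the 360 degrees of the circle.
    """
    total = sum(counts.values())
    return {
        name: (100 * n / total, 360 * n / total)
        for name, n in counts.items()
    }


segments = pie_segments(counts)
for name, (pct, angle) in segments.items():
    print(f"{name}: {pct:.1f}% of deliveries, segment angle {angle:.1f} degrees")
```

Because the percentages always sum to 100 and the angles to 360, a pie chart suits data in which every individual falls into exactly one category.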

Bar charts

Bar charts (Fig. 38.3) can be used to display a single variable, with the heights of the bars proportional to the frequency. They may also show the relationship between two variables by being grouped or stacked.

Fig. 38.3, Length of hospital stay for children diagnosed with chylothorax.

Dot diagrams

Dot diagrams (Fig. 38.4) can be used to display continuous numeric data for a variable, for a single group or multiple groups. Each dot represents a single value. A dot diagram is a simple way of conveying as much information as possible: it is easy to see outliers and to compare the distribution of results in different groups. However, it may not be practical where there are large numbers of measurements.

Fig. 38.4, Asthma deaths over time by age group (n = 193).

Line diagrams

When measurements are repeated at different time points, for example, before and after a certain treatment, lines drawn between paired dots (Fig. 38.5) can illustrate the change in each individual's measurements or the effect of an intervention/treatment.

Fig. 38.5, Paired axillary–oral temperatures. Same measurement on each patient. First 100 patients aged 4–14 years.

Scatterplots

Scatterplots (Fig. 38.6) illustrate the relationship between two continuous variables, represented on vertical and horizontal axes. Scatterplots may include a line of best fit (see Correlation and regression, below).

Fig. 38.6, Scatterplot of paired axillary–oral temperatures. Same measurement on each patient. Patients aged 4–14 years. 112 children during the course of their admission to hospital.

Box-and-whisker plots

Typically, the line in the middle of the box represents the median value, and the upper and lower horizontal lines of the box represent the upper and lower quartiles. The box therefore encompasses the middle 50% of the values, with 25% lying between the median and each quartile. The limits of the whiskers represent the highest and lowest values (i.e. the range), and each whisker encompasses 25% of the values (Fig. 38.7).

Fig. 38.7, Diagrammatic explanation of box-and-whisker plot.
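The five values that define a box-and-whisker plot can be computed as in the following minimal Python sketch. The data are hypothetical values invented for illustration, and five_number_summary is a hypothetical helper name. Software packages use slightly different conventions for calculating quartiles, so results may differ a little between programs; this sketch assumes the "inclusive" method of Python's standard statistics module.

```python
import statistics


def five_number_summary(values):
    """Return the five values drawn in a box-and-whisker plot.

    Hypothetical helper. Quartiles use the 'inclusive' convention of
    statistics.quantiles; other software may use a different rule.
    """
    ordered = sorted(values)
    lower_q, median, upper_q = statistics.quantiles(
        ordered, n=4, method="inclusive"
    )
    return {
        "minimum": ordered[0],        # end of the lower whisker
        "lower_quartile": lower_q,    # lower edge of the box
        "median": median,             # line inside the box
        "upper_quartile": upper_q,    # upper edge of the box
        "maximum": ordered[-1],       # end of the upper whisker
    }


# Hypothetical measurements, made up purely for illustration.
measurements = [2, 3, 3, 4, 5, 6, 7, 9, 12]
summary = five_number_summary(measurements)
for label, value in summary.items():
    print(f"{label}: {value}")
```

In this example the box would run from 3 to 7 with the median line at 5, and the whiskers would extend to 2 and 12.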

Describing data

Question 38.2

Describing data

A sample of 1000 seven-year-old male schoolchildren undergo BMI testing. You are asked to summarize the data numerically, using up to three parameters, without actually showing a graph. Which set of parameters would best describe the data? Select ONE answer only.

  • A.

    Mean, median and confidence intervals.

  • B.

    Mean, median and range.

  • C.

    Mean, standard deviation and confidence intervals.

  • D.

    Median, range and standard deviation.

  • E.

    Variance, standard deviation and range.

Answer 38.2

A. Mean, median and confidence intervals.

See below for discussion.
