Many researchers in the medical sciences regard mathematics in general, and statistics in particular, as a difficult and burdensome subject. Yet in biomedical research, knowledge of statistics is mandatory and forms an integral part of the work. In the era of evidence-based medicine, designing and conducting biomedical observations and experiments, presenting the resulting data, and interpreting the results would be impossible without statistics.
The word “statistics” is derived from the Latin word “status,” meaning “state” or “position.” The Oxford dictionary defines statistics as “the study of the collection, analysis, interpretation, presentation, and organization of data.” Statistics can be defined in both a plural and a singular sense. In the plural sense, it means “facts, expressed numerically or in figure, collected in a systematic way with a definite purpose in any field of study”; for example, according to the 2011 census, the population of India is 1.21 billion. In the singular sense, it deals with the scientific treatment of data derived from individual subjects. The word statistics is also used as the plural of the word “statistic,” which means a quantitative value such as the mean, median, or standard deviation (SD) derived from a sample of subjects. For example, suppose we select 15 individuals from a group of 100 patients, measure their body mass index (BMI), and find the average BMI. This average, a single numerical value, is a statistic.
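The BMI example can be sketched in a few lines of Python. This is only an illustration: the 100 BMI values are randomly generated placeholders rather than real patient data, and the names `population_bmi`, `sample_bmi`, and `mean_bmi` are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical BMI values (kg/m^2) for a group of 100 patients; not real data.
population_bmi = [round(random.uniform(18.0, 35.0), 1) for _ in range(100)]

# Select 15 individuals at random, without replacement.
sample_bmi = random.sample(population_bmi, 15)

# The sample mean is a single numerical value computed from the sample: a statistic.
mean_bmi = statistics.mean(sample_bmi)
print(f"Mean BMI of the 15 sampled patients: {mean_bmi:.2f}")
```

A different random sample of 15 patients would generally give a slightly different mean, which is why a statistic is treated as an estimate rather than a fixed quantity.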
The word “biostatistics” is a combination of two words and two fields of study. The “bio” part refers to biology, the study of living things, and the “statistics” part refers to the collection, analysis, and application of data. The use of statistical methods to analyze data derived from medicine, biology, and public health is termed “biostatistics.” Other popular names for this branch are biometry, medical statistics, and health statistics, which can be distinguished as follows:
Biometry : The analysis of biological data using statistical and mathematical procedures.
Medical statistics : Statistics/statistical methods related to clinical and laboratory parameters, their relationship, prediction after diagnosis/treatment, clinical trials, diagnostic analysis, etc.
Health statistics : Statistics/statistical methods related to the health of people in the community: the epidemiology of disease; the association of demographic, socioeconomic, behavioral, environmental, and nutritional factors with the occurrence of various diseases; the measurement of health indicators for a community; etc.
“We researchers use statistics the way a drunkard uses a lamp post, more for support than illumination.” (Winifred Castle, a British statistician)
Biostatistics and biostatisticians are important partners and collaborators in the health sciences. A biostatistician should be consulted at every stage, from the formulation of the research question to the submission of the manuscript for publication.
Apart from collecting data scientifically and summarizing them, biostatistics is used to test hypotheses arising from the research questions of observational and experimental studies. Researchers combine biostatistics and probability theory to estimate, from a given set of data, how likely a disease is to occur in the target population. In this sense, statistical methods are used to predict the future as well as to analyze the past.
For the understanding of biostatistics, it will be worthwhile to be familiar with a few basic statistical terms.
A population is the group of people about whom we would like to make inferences. It is usually not possible to study everyone with a specific medical condition of interest, so the only choice is to select a sample, i.e., a subset of people chosen at random from the population. For example, if we wish to investigate maternal weight gain in pregnancy and the baby’s birth weight, we must study a sample of pregnant women. If the selected sample is random and large enough, we can make valid inferences about the population.
The main purpose of most studies is to collect data to address a particular research question and infer about a target population. The possible types of data in any study are constant and variable .
A constant is a number that does not change under any circumstances. For example, the value of pi (π) is approximately 3.1416 (22/7 is a commonly used approximation), and the value of “ e ,” the base of the natural logarithm, is approximately 2.7183. These values do not change with time, place, person, or any other factor.
In contrast, variables are properties or characteristics of study subjects that vary in quality or magnitude from person to person. To be a variable, a characteristic must vary (i.e., not be a constant); that is, it must take on different values, levels, intensities, or states. Examples are age, sex, height, weight, blood pressure, cholesterol level, and severity of injury. The types of variables are given in Fig. 63.1 .
An understanding of variables is important to summarize them and to choose appropriate statistical methods to analyze them. Generally, there are two types of variables: categorical or qualitative and quantitative or measurable .
Categorical or qualitative variable : If the individual belongs to a particular group, class, or category, it is called a categorical variable, for example, sex (male, female), severity of disease (no disease, mild, moderate, and severe), etc. There are two types of categorical variables: ordinal and nominal . For ordinal variables the categories or groups are ordered in some way, for example, socioeconomic class (rich, middle, poor), degree of pain (no pain, mild, moderate, severe), etc. Nominal variables are those in which there is no possibility of ordering in classification, for example, sex (male and female), blood group (A, B, AB, and O), mortality (no, yes), etc.
A categorical variable is binary or dichotomous if there are only two categories, for example, yes/no, dead/alive.
Quantitative or measurable : A variable that takes some numerical value is called a quantitative or measurable variable. For example, age, height, weight, etc.
There are two types of quantitative variables: discrete and continuous . A discrete variable takes only distinct, whole-number values. The number of patient visits to a particular outpatient department and the number of children in a household are examples of discrete variables. A continuous variable has no such restriction on the values it can take within its range; it can take decimal values, e.g., weight or height.
Furthermore, there are two types of continuous variables. If there is no true zero point, the variable is measured on an “interval scale.” In this scale, the zero or starting point is arbitrary. Temperature in degrees Celsius or Fahrenheit is an example of an interval-scale continuous variable. On a “ratio scale,” the variable has a true zero point independent of the unit of measurement, e.g., weight or height.
From an epidemiological point of view, variables are classified as follows when analyzing data: outcome , exposure / risk factor , and other factors (confounder, effect modifier, and intermediate variable).
Outcome variable : the variable in which the investigator is actually interested. It is also known as the dependent, effect, or response variable.
Exposure variable : a variable that is manipulated by the researcher, by nature, or by circumstance. Such variables are also called stimulus, independent, covariate, factor, or predictor variables.
Other factor ( s ) : any factor(s) or variable(s) with the potential to influence the relationship between an outcome and an exposure.
A parameter is a function of the population values and describes the population. A statistic is a function of the sample values and describes the sample. For example, if the mean diastolic blood pressure (DBP) of the male population is 80 mmHg, that value is the parameter ( μ ); if the mean DBP of a sample of males selected randomly from that population is 84 mmHg, that value is the statistic ( x̄ ). We use the statistic to estimate the unknown population parameter. As the sample size increases, the statistic obtained from the sample values tends to come closer to the unknown population parameter.
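The relationship between a parameter and a statistic can be illustrated with a small simulation. The sketch below assumes a hypothetical population of 100,000 DBP readings generated around a mean of 80 mmHg with an SD of 10 mmHg; these numbers and the variable names are illustrative assumptions, not data from any study.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 male DBP readings (mmHg), simulated
# around a mean of 80 with an SD of 10; these figures are assumptions.
population = [random.gauss(80, 10) for _ in range(100_000)]
mu = statistics.mean(population)  # parameter: computed from every member

for n in (10, 100, 1_000, 10_000):
    sample = random.sample(population, n)  # simple random sample of size n
    x_bar = statistics.mean(sample)        # statistic: computed from the sample only
    print(f"n = {n:>6}: sample mean = {x_bar:6.2f}, error = {abs(x_bar - mu):.2f}")
```

In a single run the error does not have to shrink at every step, but across repeated runs larger samples tend to give sample means closer to the population mean.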
A ratio is obtained simply by dividing one value by another. In a ratio, the numerator is not a part of the denominator. Examples are the male/female (sex) ratio, the student/teacher ratio, and the patient/doctor ratio.
A proportion is a type of ratio in which the numerator is a part of the denominator, i.e., the numerator is included in the denominator. For example, if there are 400 males and 600 females, then the proportion of males in the population is 400/1000, or 40%. A proportion is usually expressed as a percentage or per a multiple of 10, such as per 1000 or per 10,000, depending on the size of the numerator relative to the denominator.
A rate is a measure of the frequency with which an event occurs in a defined population in a defined time. In a rate, time is an essential part of the denominator, whereas in a proportion it is not. Examples are the number of deaths per 100,000 Asians in 1 year, the number of perinatal deaths per 1000 births, etc.
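The three measures can be compared side by side in a short calculation. The male and female counts come from the text; the death count and mid-year population used for the rate are hypothetical figures chosen only for illustration.

```python
males, females = 400, 600

# Ratio: the numerator is not part of the denominator.
sex_ratio = males / females  # males per female

# Proportion: the numerator is included in the denominator.
proportion_male = males / (males + females)

# Rate: events in a defined population over a defined time.
# The two counts below are hypothetical.
deaths_in_year = 750
mid_year_population = 1_250_000
death_rate_per_100k = deaths_in_year / mid_year_population * 100_000

print(f"Sex ratio (M/F): {sex_ratio:.2f}")
print(f"Proportion male: {proportion_male:.0%}")
print(f"Death rate: {death_rate_per_100k:.0f} per 100,000 per year")
```

The proportion is bounded between 0 and 1, whereas a ratio is not, which is one practical way to tell the two apart.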
Statistical analysis methods are of two types: descriptive method and inferential method, given in Fig. 63.2 .
Descriptive statistical methods are used to summarize the collected data using tables, diagrams, graphs, and summary measures such as averages (mean, median, etc.) and measures of variation (SD, interquartile range, etc.).
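As a minimal sketch of these summary measures, the following uses Python's standard `statistics` module on ten hypothetical DBP readings. The data are made up, and the quartiles depend on the interpolation method chosen; the module's default "exclusive" method is assumed here.

```python
import statistics

# Hypothetical diastolic blood pressure readings (mmHg) from 10 subjects.
dbp = [72, 78, 80, 80, 82, 84, 85, 88, 90, 96]

mean = statistics.mean(dbp)
median = statistics.median(dbp)
sd = statistics.stdev(dbp)  # sample SD (n - 1 in the denominator)

# Quartiles using the module's default "exclusive" method;
# other methods can give slightly different cut points.
q1, q2, q3 = statistics.quantiles(dbp, n=4)
iqr = q3 - q1  # interquartile range

print(f"mean = {mean}, median = {median}, SD = {sd:.2f}, IQR = {iqr:.2f}")
```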
Inferential statistical methods are used to make inferences about the population from which the sample was drawn. In this branch, the unknown population parameter(s) is (are) estimated using sample statistics (called estimates). Inferential statistics can further be divided into two subsections: estimation and hypothesis testing .
The main objective of statistics, one way or another, is to study the population. A population in statistics is defined as the aggregate of objects having certain characteristics. The exact value of any characteristic of a population can be known only when each and every member of the population is measured. However, as the population is usually very large, it is practically impossible to measure every member. So we draw a random sample from the population, and because a sample is comparatively small, we can measure each and every member of the sample. On the basis of these measurements, we estimate the value of the population characteristic. This is the main objective of inferential statistical methods. There are two types of inferential methods: estimation ( point estimate , interval estimate ) and hypothesis testing .
Estimation is the process of providing a numerical value for an unknown population parameter on the basis of information collected from a sample. Any statistic computed from the sample to estimate an unknown population parameter, for example, a mean, proportion, or correlation coefficient, is called a point estimate, because it is a single (point) value and no level of confidence can be attached to it. The larger the sample size, the nearer this estimate tends to be to the unknown population parameter. An interval estimate or confidence interval, on the other hand, gives us an interval (a lower limit and an upper limit) in which we believe the true parameter value lies, together with an associated probability. The objective of interval estimation is to find narrow intervals with high reliability. Generally, the confidence probability is fixed at 0.95, 0.99, or 0.999, depending upon the requirement.
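The lower and upper limits of the interval estimate for the statistic are given as

$$\text{Lower limit} = \text{statistic} - C \times SE, \qquad \text{Upper limit} = \text{statistic} + C \times SE$$

where C = confidence coefficient, SE = SD/√n, and n = sample size.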
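The interval can be computed as the statistic minus and plus C × SE. The sketch below does this for a sample mean, under the assumption that C is taken from the standard normal distribution (about 1.96 for a confidence probability of 0.95). For small samples a coefficient from the t distribution is usually preferred, so treat this as an illustration rather than a recommended procedure. The function name `confidence_interval` and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def confidence_interval(values, level=0.95):
    """Return (lower, upper) limits for the mean as mean -/+ C * SE.

    C is taken from the standard normal distribution, which is an
    assumption that is reasonable mainly for large samples.
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)    # SE = SD / sqrt(n)
    c = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for level = 0.95
    return mean - c * se, mean + c * se

# Hypothetical DBP readings (mmHg); not real data.
dbp = [72, 78, 80, 80, 82, 84, 85, 88, 90, 96]

lower, upper = confidence_interval(dbp, level=0.95)
lower99, upper99 = confidence_interval(dbp, level=0.99)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
print(f"99% CI: ({lower99:.1f}, {upper99:.1f})")
```

Raising the confidence probability from 0.95 to 0.99 increases C and therefore widens the interval, which shows the trade-off between narrowness and reliability.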