To understand quantitative genetics, it's essential to know some statistics.
The best approach is to take a good course in statistics, but for the time
being, here's a brief introduction. Our objective is to describe
measurements made on a particular quantitative trait
for a sample of individuals from a population. The three main concepts
presented here are (a) central tendency, (b) variability (or dispersion),
and (c) the relationship between two traits (or variables).
Central tendency
Consider collecting data on a randomly selected sample of individuals
from a specific population. Think of taking measurements on two quantitative
traits x and y. Constructing a histogram for one of the traits
would give us some indication of the distribution of this trait in the
population. Remember our histogram of height for the students in the class.
The first descriptive statistic we would want to compute is
the average, or the sample mean, which reflects the central tendency
of the distribution. For x, measured on n individuals, the mean of the sample is simply:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
A remarkable property of the sample mean, which makes it such a valuable
statistical tool, follows from the Central Limit Theorem, which
states that whenever n is moderately large, the sample mean has approximately
a normal distribution, regardless of the distribution of the underlying
variable x (provided that distribution has a finite variance). This is worth
proving to yourself: generate a collection of numbers with any distribution,
repeatedly take fairly large samples from that collection, and calculate the
mean of each sample. Then construct a histogram
of the means you calculated. What might this tell us about quantitative
traits, remembering that we generally consider these traits to be controlled
by many loci with additive effects?
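The experiment suggested above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the exponential "population", the sample size of 50, the 2,000 repetitions, and the helper names `sample_means` and `text_histogram` are all assumptions chosen for the example.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


def text_histogram(values, n_bins=10, width=50):
    """Return a crude text histogram: one row of '#' marks per bin."""
    lo, hi = min(values), max(values)
    bin_width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        left_edge = lo + i * bin_width
        rows.append(f"{left_edge:8.3f} | " + "#" * round(width * c / peak))
    return "\n".join(rows)


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed population: strongly right-skewed (exponential), clearly not normal.
    population = [rng.expovariate(1.0) for _ in range(100_000)]
    means = sample_means(population, sample_size=50, n_samples=2_000)

    print("Distribution of the underlying variable x:")
    print(text_histogram(population))
    print()
    print("Distribution of the sample means (n = 50):")
    print(text_histogram(means))
```

With these settings the first histogram should be heavily skewed while the second should look roughly symmetric and bell-shaped. Trying other starting distributions or sample sizes is an easy way to see how quickly the approximation improves.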
Variability
The two measures of variability most often considered are the sample
variance and the standard deviation. Both measure the width
of the distribution for traits of interest. First, here's the sample variance
for x:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
This describes the variability in terms of an "average" of squared deviations
from the sample mean (we divide by n-1 rather than n because
that makes the sample variance an unbiased estimator of the variance in the population). However,
the units for the sample variance are squared in comparison to our original
variable. An alternative measure of variability is therefore the standard
deviation, which adjusts for this. The sample standard deviation is the
square root of the variance:
$$s_x = \sqrt{s_x^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
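As a quick check on these definitions, here is a small Python sketch that computes the sample mean, variance, and standard deviation directly from the formulas. The function names and the five height values are made up for illustration.

```python
import math


def sample_mean(xs):
    """Sum of the observations divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_sd(xs):
    """Square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(xs))


if __name__ == "__main__":
    heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical data
    print(sample_mean(heights_cm))      # 170.0 (cm)
    print(sample_variance(heights_cm))  # 62.5 (cm squared)
    print(f"{sample_sd(heights_cm):.2f}")  # 7.91 (cm)
```

Note that the variance comes out in squared centimeters, while the standard deviation is back in centimeters.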

Relationship between variables
We will consider two general approaches to describing the relationship between variables: regression and correlation. There are some very important distinctions between the applications of these two analyses that we won't go into here (but they would be an important part of a statistics course dealing with these topics).
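The simplest idea to think about for the relationship between two variables
is the correlation. For example, what is the correlation between height
and weight in humans? There is certainly a positive relationship, but to
quantify this relationship we calculate the sample correlation coefficient r.
For the two variables x and y the sample correlation coefficient
is:
$$r = \frac{\mathrm{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
which is the covariance between x and y (in the numerator) divided by a measure of the variation in x and the variation in y (the product of their standard deviations). The correlation coefficient ranges in value from -1 to +1, with zero indicating no linear relationship. The covariance describes the average amount that two variables "covary"; the sample covariance, $\mathrm{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, estimates the corresponding population parameter.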
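The calculation can be sketched in Python as follows. The helper names and the height and weight values are hypothetical, chosen only to show the covariance being scaled by the two standard deviations.

```python
import math


def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 denominator)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))  # cov(x, x) is the variance of x
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


if __name__ == "__main__":
    heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical data
    weights_kg = [55.0, 60.0, 62.0, 70.0, 73.0]       # hypothetical data
    print(sample_covariance(heights_cm, weights_kg))  # 57.5
    print(f"{sample_correlation(heights_cm, weights_kg):.3f}")
```

Because the covariance is divided by both standard deviations, r has no units, so the result would not change if height were recorded in inches instead of centimeters.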
A regression also can be used to describe the relationship between x
and y (again there are some technical differences). In the context
of parent-offspring regressions we are interested in finding the best-fit
regression line for a plot of x and y. The equation for a
regression line is similar to the familiar y = mx + b. It is:
$$y = \beta_0 + \beta_1 x$$
where $\beta_0$ is the intercept of the line and $\beta_1$
is the slope of the line. To find the best-fit regression line a commonly
used method is least-squares, which determines the line that minimizes
the sum of the squared deviations from the line (in the direction of the
vertical distances from the line). With this method the best-fitting straight
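line is determined by the formulas:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$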
The "hats" on these parameters indicate they are estimates. Note the
similarity between the estimate for the slope of the line and the correlation
coefficient.
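A minimal Python sketch of a least-squares fit is given below, using made-up parent and offspring values. The function names and the data are assumptions for illustration. The last lines check the similarity noted above: the slope estimate equals the correlation coefficient multiplied by the ratio of the standard deviation of y to that of x.

```python
import math


def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sample_correlation(xs, ys):
    """Sample correlation coefficient r (the n - 1 terms cancel out)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


if __name__ == "__main__":
    parent_cm = [160.0, 165.0, 170.0, 175.0, 180.0]     # hypothetical x values
    offspring_cm = [164.0, 166.0, 171.0, 173.0, 176.0]  # hypothetical y values

    b0, b1 = least_squares_fit(parent_cm, offspring_cm)
    print(f"intercept = {b0:.1f}, slope = {b1:.2f}")  # intercept = 64.6, slope = 0.62

    # Slope and r share the same numerator; they differ only in scaling.
    r = sample_correlation(parent_cm, offspring_cm)
    sum_xx = sum((x - 170.0) ** 2 for x in parent_cm)     # 170.0 is the mean of x
    sum_yy = sum((y - 170.0) ** 2 for y in offspring_cm)  # 170.0 is the mean of y
    print(math.isclose(b1, r * math.sqrt(sum_yy / sum_xx)))  # True
```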