
**Squared deviations from the mean (SDM)** are involved in various calculations. In probability theory and statistics, the definition of *variance* is either the expected value of the SDM (when considering a theoretical distribution) or its average value (for actual experimental data). Computations for *analysis of variance* involve the partitioning of a sum of SDM.

## Introduction

An understanding of the computations involved is greatly enhanced by a study of the statistical value $\operatorname{E}(X^2)$, where $\operatorname{E}$ is the expected value operator.

For a random variable $X$ with mean $\mu$ and variance $\sigma^2$,[1]

$$\sigma^2 = \operatorname{E}(X^2) - \mu^2.$$

Therefore,

$$\operatorname{E}(X^2) = \sigma^2 + \mu^2.$$

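From the above, the following can be derived for a sample of $n$ independent observations:

$$\operatorname{E}\left(\sum X^2\right) = n\sigma^2 + n\mu^2,$$

$$\operatorname{E}\left(\left(\sum X\right)^2\right) = n\sigma^2 + n^2\mu^2.$$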
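These identities can be checked exactly on a small discrete distribution. The Python sketch below is illustrative only. The distribution (an equally likely choice among the hypothetical values 0, 1 and 5), the sample size and the helper names are all assumptions. Independence is modelled by treating every ordered n-tuple of draws as equally likely, and exact `Fraction` arithmetic avoids rounding error.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distribution: X is equally likely to be any of these values.
VALUES = [0, 1, 5]

def mean_and_variance(values):
    """Exact mean and variance of a uniform choice among `values`."""
    p = Fraction(1, len(values))
    mu = sum(p * v for v in values)
    var = sum(p * (v - mu) ** 2 for v in values)
    return mu, var

def expected_sums(values, n):
    """Exact E(sum of X^2) and E((sum of X)^2) for n independent draws."""
    p = Fraction(1, len(values) ** n)  # every ordered n-tuple is equally likely
    e_sum_of_squares = Fraction(0)
    e_square_of_sum = Fraction(0)
    for xs in product(values, repeat=n):
        e_sum_of_squares += p * sum(x * x for x in xs)
        e_square_of_sum += p * sum(xs) ** 2
    return e_sum_of_squares, e_square_of_sum

mu, var = mean_and_variance(VALUES)
n = 3
e_sum_sq, e_sq_sum = expected_sums(VALUES, n)

assert expected_sums(VALUES, 1)[0] == var + mu ** 2  # E(X^2) = sigma^2 + mu^2
assert e_sum_sq == n * var + n * mu ** 2
assert e_sq_sum == n * var + n ** 2 * mu ** 2
print(mu, var, e_sum_sq, e_sq_sum)  # prints: 2 14/3 26 50
```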

## Sample variance

The sum of squared deviations needed to calculate sample variance (before deciding whether to divide by *n* or *n* − 1) is most easily calculated as

$$S = \sum x^2 - \frac{\left(\sum x\right)^2}{n}.$$
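As an illustration, the shortcut form of $S$ can be compared with the direct sum of squared deviations from the sample mean, $\sum (x - \bar{x})^2$. This is a minimal sketch. The function names and the sample are hypothetical, and the input is assumed to be integers (or `Fraction`s) so that the comparison is exact.

```python
from fractions import Fraction

def sdm_shortcut(xs):
    """Sum of squared deviations via sum(x^2) - (sum x)^2 / n."""
    n = len(xs)
    return sum(Fraction(x) ** 2 for x in xs) - Fraction(sum(xs)) ** 2 / n

def sdm_direct(xs):
    """Sum of squared deviations from the sample mean, computed directly."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)

data = [1, 2, 3, 4, 6]  # hypothetical sample
assert sdm_shortcut(data) == sdm_direct(data)
print(sdm_shortcut(data))  # prints: 74/5
```

With floating-point data the shortcut can lose precision when the mean is large compared with the spread of the values. This is one reason the sketch uses exact fractions.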

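From the two derived expectations above, the expected value of this sum $S$ is

$$\operatorname{E}(S) = n\sigma^2 + n\mu^2 - \frac{n\sigma^2 + n^2\mu^2}{n},$$

which implies

$$\operatorname{E}(S) = (n - 1)\sigma^2.$$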

This justifies dividing by *n* − 1 to obtain an **unbiased** sample estimate of $\sigma^2$.
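The unbiasedness result can be checked by enumerating every possible sample from a small discrete distribution. In this sketch the distribution (values 0, 1 and 5, equally likely), the sample size *n* = 4 and the name `sum_sq_dev` are hypothetical choices. Draws are assumed to be independent.

```python
from fractions import Fraction
from itertools import product

VALUES = [0, 1, 5]  # hypothetical: X is equally likely to take each value

def sum_sq_dev(xs):
    """S = sum(x^2) - (sum x)^2 / n, computed exactly."""
    return sum(Fraction(x) ** 2 for x in xs) - Fraction(sum(xs)) ** 2 / len(xs)

mu = Fraction(sum(VALUES), len(VALUES))
sigma2 = sum((v - mu) ** 2 for v in VALUES) / len(VALUES)  # population variance

n = 4
p = Fraction(1, len(VALUES) ** n)  # each sample of n independent draws is equally likely
samples = list(product(VALUES, repeat=n))

expected_S = sum(p * sum_sq_dev(xs) for xs in samples)
assert expected_S == (n - 1) * sigma2                                         # E(S) = (n - 1) sigma^2
assert sum(p * sum_sq_dev(xs) / (n - 1) for xs in samples) == sigma2          # divisor n - 1: unbiased
assert sum(p * sum_sq_dev(xs) / n for xs in samples) == (n - 1) * sigma2 / n  # divisor n: biased low
print(expected_S, sigma2)  # prints: 14 14/3
```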

## Partition — analysis of variance

Suppose data are available for *k* different treatment groups, where group *i* (with *i* running from 1 to *k*) has size $n_i$ and the total number of observations is $n = \sum_{i=1}^k n_i$. It is assumed that the expected mean of each group is

$$\operatorname{E}(\mu_i) = \mu + T_i$$

and that the variance of each treatment group is unchanged from the population variance $\sigma^2$.

Under the null hypothesis that the treatments have no effect, each of the $T_i$ will be zero.

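It is now possible to calculate three sums of squares:

- Individual
  $$I = \sum x^2$$
  Under the null hypothesis, its expectation is
  $$\operatorname{E}(I) = n\sigma^2 + n\mu^2$$

- Treatments
  $$T = \sum_{i=1}^k \frac{\left(\sum x\right)^2}{n_i}$$
  where the inner sum runs over the $n_i$ observations in group $i$. Its expectation is
  $$\operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2$$
  $$\operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu\sum_{i=1}^k n_i T_i + \sum_{i=1}^k n_i T_i^2$$
  Under the null hypothesis that the treatments cause no differences and all the $T_i$ are zero, the expectation simplifies to
  $$\operatorname{E}(T) = k\sigma^2 + n\mu^2.$$

- Combination
  $$C = \frac{\left(\sum x\right)^2}{n}$$
  Under the null hypothesis, its expectation is
  $$\operatorname{E}(C) = \sigma^2 + n\mu^2$$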
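The expectations of *I*, *T* and *C* can be checked exactly on a toy model. For illustration only, this sketch assumes that an observation in group *i* equals a draw from a hypothetical, equally likely distribution over {0, 1, 5} plus the treatment effect $T_i$. It uses *k* = 2 groups of sizes 2 and 1. The group sizes, effects and helper names are all assumptions. The general formula for E(*T*) is checked with nonzero effects, and all three expectations are checked under the null hypothesis.

```python
from fractions import Fraction
from itertools import product

BASE = [0, 1, 5]  # hypothetical equally likely values with mean mu, variance sigma^2
SIZES = [2, 1]    # hypothetical group sizes n_i (k = 2 groups, n = 3)

mu = Fraction(sum(BASE), len(BASE))
sigma2 = sum((b - mu) ** 2 for b in BASE) / len(BASE)
n, k = sum(SIZES), len(SIZES)

def sums_of_squares(groups):
    """Return (I, T, C) for a list of groups of observations."""
    all_x = [x for g in groups for x in g]
    I = sum(x ** 2 for x in all_x)
    T = sum(Fraction(sum(g)) ** 2 / len(g) for g in groups)
    C = Fraction(sum(all_x)) ** 2 / len(all_x)
    return I, T, C

def expected_sums(effects):
    """Exact E(I), E(T), E(C) when group i has treatment effect effects[i]."""
    p = Fraction(1, len(BASE) ** n)  # all n independent draws equally likely
    totals = [Fraction(0)] * 3
    for draws in product(BASE, repeat=n):
        # split the n draws into groups and add each group's treatment effect
        groups, start = [], 0
        for size, effect in zip(SIZES, effects):
            groups.append([d + effect for d in draws[start:start + size]])
            start += size
        for j, value in enumerate(sums_of_squares(groups)):
            totals[j] += p * value
    return totals

# General E(T) with hypothetical effects T_1 = 0, T_2 = 3
effects = [0, 3]
_, E_T, _ = expected_sums(effects)
assert E_T == k * sigma2 + sum(m * (mu + t) ** 2 for m, t in zip(SIZES, effects))

# Null hypothesis: all T_i are zero
E_I, E_T0, E_C = expected_sums([0, 0])
assert E_I == n * sigma2 + n * mu ** 2
assert E_T0 == k * sigma2 + n * mu ** 2
assert E_C == sigma2 + n * mu ** 2
```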

### Sums of squared deviations

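Under the null hypothesis, the expected difference between any pair of *I*, *T*, and *C* does not depend on $\mu$, only on $\sigma^2$:

- Total squared deviations, also known as the *total sum of squares*:
  $$\operatorname{E}(I - C) = (n - 1)\sigma^2$$

- Treatment squared deviations, also known as the *explained sum of squares*:
  $$\operatorname{E}(T - C) = (k - 1)\sigma^2$$

- Residual squared deviations, also known as the *residual sum of squares*:
  $$\operatorname{E}(I - T) = (n - k)\sigma^2$$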

The constants (*n* − 1), (*k* − 1), and (*n* − *k*) are normally referred to as the number of degrees of freedom.
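To illustrate that these expected differences depend only on $\sigma^2$, the sketch below enumerates a toy model under the null hypothesis. It uses two hypothetical distributions that share a variance but have very different means. The distributions, the group sizes and the function names are assumptions made for illustration, and draws are assumed to be independent.

```python
from fractions import Fraction
from itertools import product

SIZES = [2, 1]  # hypothetical group sizes n_i (k = 2, n = 3)
n, k = sum(SIZES), len(SIZES)

def i_t_c(groups):
    """Return (I, T, C) for a list of groups of observations."""
    all_x = [x for g in groups for x in g]
    I = sum(Fraction(x) ** 2 for x in all_x)
    T = sum(Fraction(sum(g)) ** 2 / len(g) for g in groups)
    C = Fraction(sum(all_x)) ** 2 / len(all_x)
    return I, T, C

def expected_differences(base):
    """Exact E(I - C), E(T - C), E(I - T) with no treatment effects."""
    p = Fraction(1, len(base) ** n)
    diffs = [Fraction(0)] * 3
    for draws in product(base, repeat=n):
        groups, start = [], 0
        for size in SIZES:
            groups.append(draws[start:start + size])
            start += size
        I, T, C = i_t_c(groups)
        for j, d in enumerate((I - C, T - C, I - T)):
            diffs[j] += p * d
    return diffs

# Two hypothetical distributions with the same variance but different means
for base in ([0, 1, 5], [100, 101, 105]):
    mu = Fraction(sum(base), len(base))
    sigma2 = sum((b - mu) ** 2 for b in base) / len(base)
    assert expected_differences(base) == [(n - 1) * sigma2, (k - 1) * sigma2, (n - k) * sigma2]

print(expected_differences([0, 1, 5]) == expected_differences([100, 101, 105]))  # prints: True
```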

### Example

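In a very simple example, 5 observations arise from two treatments. The first treatment gives three values, 1, 2, and 3, and the second treatment gives two values, 4 and 6.

$$I = 1^2 + 2^2 + 3^2 + 4^2 + 6^2 = 66$$

$$T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62$$

$$C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = \frac{256}{5} = 51.2$$

This gives: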

- Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.
- Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.
- Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
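The calculation above can be reproduced in a few lines of Python. The function name `one_way_anova_sums` is hypothetical. The function forms *I*, *T* and *C* and takes their differences, as in the formulas of this section.

```python
from fractions import Fraction

def one_way_anova_sums(groups):
    """Total, treatment and residual squared deviations with degrees of freedom."""
    all_x = [x for g in groups for x in g]
    n, k = len(all_x), len(groups)
    I = sum(Fraction(x) ** 2 for x in all_x)                 # individual
    T = sum(Fraction(sum(g)) ** 2 / len(g) for g in groups)  # treatments
    C = Fraction(sum(all_x)) ** 2 / n                        # combination
    return {
        "total": (I - C, n - 1),
        "treatment": (T - C, k - 1),
        "residual": (I - T, n - k),
    }

result = one_way_anova_sums([[1, 2, 3], [4, 6]])
for name, (ss, df) in result.items():
    print(name, float(ss), df)
# prints:
# total 14.8 4
# treatment 10.8 1
# residual 4.0 3
```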

### Two-way analysis of variance

The following hypothetical example gives the yields of 15 plants subject to two different environmental variations and three different fertilisers.

|  | Extra CO₂ | Extra humidity |
|---|---|---|
| No fertiliser | 7, 2, 1 | 7, 6 |
| Nitrate | 11, 6 | 10, 7, 3 |
| Phosphate | 5, 3, 4 | 11, 4 |

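Five sums of squares are calculated. In each calculation, the inner $\sum x$ is the total of one cell, one fertiliser row, one environment column, or all the data, and $n$ is the number of observations in it. The last column gives the coefficient of $\sigma^2$ in the expected value of each sum under the null hypothesis, which equals the number of groups summed over.

| Factor | Calculation | Sum | $\sigma^2$ |
|---|---|---|---|
| Individual | $\sum x^2$ | 641 | 15 |
| Fertiliser × Environment | $\sum \frac{(\sum x)^2}{n}$ | 556.1667 | 6 |
| Fertiliser | $\sum \frac{(\sum x)^2}{n}$ | 525.4 | 3 |
| Environment | $\sum \frac{(\sum x)^2}{n}$ | 519.2679 | 2 |
| Composite | $\frac{(\sum x)^2}{n}$ | 504.6 | 1 |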
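The following is a sketch of how the five sums might be computed from the yields table, using exact fractions. The dictionary layout, the labels and the helper names (`grouped_sum`, `pool`) are assumptions for illustration. Each sum is $\sum (\sum x)^2 / n$ over the corresponding groups.

```python
from fractions import Fraction

# Yields from the table above: (fertiliser, environment) -> observations
CELLS = {
    ("None", "CO2"): [7, 2, 1],
    ("None", "Humidity"): [7, 6],
    ("Nitrate", "CO2"): [11, 6],
    ("Nitrate", "Humidity"): [10, 7, 3],
    ("Phosphate", "CO2"): [5, 3, 4],
    ("Phosphate", "Humidity"): [11, 4],
}

def grouped_sum(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(Fraction(sum(g)) ** 2 / len(g) for g in groups)

def pool(key_index):
    """Pool the cells by fertiliser (key_index 0) or environment (key_index 1)."""
    pooled = {}
    for key, values in CELLS.items():
        pooled.setdefault(key[key_index], []).extend(values)
    return list(pooled.values())

all_x = [x for values in CELLS.values() for x in values]
# name -> (sum, number of groups, i.e. the sigma^2 coefficient)
five_sums = {
    "Individual": (sum(x * x for x in all_x), len(all_x)),
    "Fertiliser × Environment": (grouped_sum(CELLS.values()), len(CELLS)),
    "Fertiliser": (grouped_sum(pool(0)), len(pool(0))),
    "Environment": (grouped_sum(pool(1)), len(pool(1))),
    "Composite": (grouped_sum([all_x]), 1),
}
for name, (value, groups) in five_sums.items():
    print(f"{name}: {float(value):.4f} ({groups} groups)")
```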

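Finally, the sums of squared deviations required for the analysis of variance can be calculated by adding and subtracting the five sums with the coefficients (+1 or −1) shown in each column. Applying the same coefficients to the $\sigma^2$ column gives the degrees of freedom.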

| Factor | Sum | $\sigma^2$ | Total | Environment | Fertiliser | Fertiliser × Environment | Residual |
|---|---|---|---|---|---|---|---|
| Individual | 641 | 15 | 1 | | | | 1 |
| Fertiliser × Environment | 556.1667 | 6 | | | | 1 | −1 |
| Fertiliser | 525.4 | 3 | | | 1 | −1 | |
| Environment | 519.2679 | 2 | | 1 | | −1 | |
| Composite | 504.6 | 1 | −1 | −1 | −1 | 1 | |
| Squared deviations | | | 136.4 | 14.668 | 20.8 | 16.099 | 84.833 |
| Degrees of freedom | | | 14 | 1 | 2 | 2 | 9 |
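The final table consists of signed combinations of the five sums, and the same combinations of the $\sigma^2$ column give the degrees of freedom. The sketch below applies the ±1 coefficients to the rounded values printed in the tables above. The dictionary names are hypothetical. Because the inputs are rounded to four decimal places, the results are reliable only to about three decimal places.

```python
import math

# "Sum" and sigma^2 columns from the tables above (sums rounded as printed)
SUMS = {
    "Individual": (641, 15),
    "Fertiliser × Environment": (556.1667, 6),
    "Fertiliser": (525.4, 3),
    "Environment": (519.2679, 2),
    "Composite": (504.6, 1),
}

# +1 / -1 coefficients from each column of the table above
COEFFICIENTS = {
    "Total": {"Individual": 1, "Composite": -1},
    "Environment": {"Environment": 1, "Composite": -1},
    "Fertiliser": {"Fertiliser": 1, "Composite": -1},
    "Fertiliser × Environment": {
        "Fertiliser × Environment": 1, "Fertiliser": -1,
        "Environment": -1, "Composite": 1,
    },
    "Residual": {"Individual": 1, "Fertiliser × Environment": -1},
}

anova = {}
for effect, coeffs in COEFFICIENTS.items():
    squared_dev = sum(c * SUMS[row][0] for row, c in coeffs.items())
    dof = sum(c * SUMS[row][1] for row, c in coeffs.items())
    anova[effect] = (squared_dev, dof)
    print(f"{effect}: {squared_dev:.3f} with {dof} degrees of freedom")

# The four components add up to the total, in both squared deviations and dof
parts = ("Environment", "Fertiliser", "Fertiliser × Environment", "Residual")
assert math.isclose(sum(anova[p][0] for p in parts), anova["Total"][0])
assert sum(anova[p][1] for p in parts) == anova["Total"][1]
```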

## See also

- Absolute deviation
- Algorithms for calculating variance
- Errors and residuals
- Least squares
- Mean squared error
- Residual sum of squares
- Variance decomposition

## References

1. Mood & Graybill: *An introduction to the Theory of Statistics* (McGraw Hill)