R: Sample variance and SD (2024)

Sample variance and Standard Deviation using R

Variance and SD

R can calculate the sample variance and sample standard deviation of our cattle weight data, stored in a variable called y, using the var and sd functions, giving:

> var(y)
[1] 1713.333
> sd(y)
[1] 41.39243
    Note:
  • var(y) instructs R to calculate the sample variance of y. In other words, it uses n-1 'degrees of freedom', where n is the number of observations in y.
  • sd(y) instructs R to return the sample standard deviation of y, using n-1 degrees of freedom.
  • sd(y) = sqrt(var(y)). In other words, it is the square root of the n-1 variance, with no further correction for the small bias that taking the square root introduces.
  • The var function cannot give the 'population variance', which has n rather than n-1 d.f. However, there are two simple ways to obtain it:
  • Remember that if n=1 the second variance formula will always yield zero, because the mean of y equals y, whereas the first formula will always yield NA, because 0/(1-1) = 0/0, which cannot be evaluated.
  • Similarly, to obtain the 'population' standard deviation, use:
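
The formulas these notes refer to do not appear on this page. As a minimal sketch, assuming the 'first' formula rescales var(y) by (n-1)/n and the 'second' averages the squared deviations directly, the notes above can be checked as follows. The weights in y are invented for illustration and are not the cattle data, and sample_var, pop_var_1, pop_var_2, pop_sd and y1 are hypothetical names.

```r
# Hypothetical weights, for illustration only (not the cattle data)
y <- c(410, 450, 380, 495, 430)
n <- length(y)

# Sample variance by hand, using n-1 degrees of freedom
sample_var <- sum((y - mean(y))^2) / (n - 1)
all.equal(sample_var, var(y))     # TRUE: var() divides by n-1
all.equal(sd(y), sqrt(var(y)))    # TRUE: sd() is the square root of var()

# Assumed 'first' formula: rescale the sample variance from n-1 to n
pop_var_1 <- var(y) * (n - 1) / n

# Assumed 'second' formula: mean squared deviation from the mean
pop_var_2 <- mean((y - mean(y))^2)
all.equal(pop_var_1, pop_var_2)   # TRUE: both divide by n

# 'Population' standard deviation, under the same assumption
pop_sd <- sqrt(pop_var_2)

# With a single observation the two formulas behave differently
y1 <- 430
var(y1) * (1 - 1) / 1             # NA, because var() of one value is NA
mean((y1 - mean(y1))^2)           # 0, because the mean of y1 equals y1
```

Both versions divide the same sum of squares by n, so for n greater than 1 the choice between them is a matter of taste; they differ only in how they treat a single observation.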

Variance from frequencies and midpoints

R can calculate the variance from the frequencies (f) of a frequency distribution with class midpoints (y) using these instructions:

> y=c(110, 125, 135, 155)
> f=c(23, 15, 6, 2)
> ybar=sum(y*f)/sum(f)
> sum(f*(y-ybar)^2) / (sum(f)-1)

Giving:

[1] 143.8768

    Note:
  • y=c(110, 125, 135, 155) copies the class interval midpoints into a variable called y.
  • f=c(23, 15, 6, 2) copies the frequency of each class into a variable called f.
  • ybar=sum(y*f)/sum(f) creates a variable called ybar, containing the arithmetic mean - as calculated from these frequencies and midpoints.

    However, even if you have a more accurate arithmetic mean, calculated directly from the observations themselves, you still need to use this formula for the mean. Otherwise your estimated variance will be too high, because the sum of squared deviations is smallest when taken about the mean of the grouped data. This formula gives the mean under the same assumptions as those used to calculate the variance.

  • sum(f*(y-ybar)^2) / (sum(f)-1) calculates the sample variance from the frequencies, f, midpoints, y, and the mean estimated from them, ybar.

    Alternatively, you could combine two of these instructions as: sum(f*(y-sum(y*f)/sum(f))^2)/(sum(f)-1)

  • Remember that this only provides an estimate of the variance you would obtain from the original data, and that it depends upon the choice of midpoints and the number of class intervals used.
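
As a cross-check on the notes above, the grouped formula should give the same value as var applied to a vector in which each midpoint is repeated f times, and substituting any other mean can only increase the result. The following is a minimal sketch: grouped_var, expanded, other_mean and inflated_var are hypothetical names, and the value 121 is an invented stand-in for a mean calculated from raw observations, which this page does not give.

```r
# Class midpoints and class frequencies, as in the notes above
y <- c(110, 125, 135, 155)
f <- c(23, 15, 6, 2)

# Mean and sample variance estimated from midpoints and frequencies
ybar <- sum(y * f) / sum(f)
grouped_var <- sum(f * (y - ybar)^2) / (sum(f) - 1)
grouped_var                               # 143.8768

# Cross-check: repeat each midpoint f times and let var() do the work
expanded <- rep(y, times = f)
all.equal(var(expanded), grouped_var)     # TRUE

# Using any mean other than ybar inflates the estimate.
# 121 is an invented value standing in for a mean from raw data.
other_mean <- 121
inflated_var <- sum(f * (y - other_mean)^2) / (sum(f) - 1)
inflated_var > grouped_var                # TRUE
```

The rep approach is convenient for small tables, but it builds a vector with sum(f) elements, so the frequency-weighted formula is the lighter choice when the frequencies are large.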

Article information

Author: Kieth Sipes
