9. Calculating Confidence Intervals
Here we look at some examples of calculating confidence intervals. The examples are for both normal and t distributions. We assume that you can enter data and know the commands associated with basic probability. Note that an easier way to calculate confidence intervals using the t.test command is discussed in section The Easy Way.
9.1. Calculating a Confidence Interval From a Normal Distribution
Here we look at a fictitious example. We make some assumptions about what we might find in an experiment and compute the resulting confidence interval using a normal distribution. We assume that the sample mean is 5, the standard deviation is known to be 2, and the sample size is 20, and we use a 95% confidence level. The commands to find the confidence interval in R are the following:
> a <- 5
> s <- 2
> n <- 20
> error <- qnorm(0.975)*s/sqrt(n)
> left <- a-error
> right <- a+error
> left
[1] 4.123477
> right
[1] 5.876523
We are 95% confident that the true mean lies between 4.12 and 5.88, assuming that the original random variable is normally distributed and that the samples are independent.
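If you prefer, the two limits can be computed in a single expression. This is only a compact restatement of the commands above and produces the same interval:

> a + c(-1,1)*qnorm(0.975)*s/sqrt(n)
[1] 4.123477 5.876523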
9.2. Calculating a Confidence Interval From a t Distribution
Calculating the confidence interval when using a t distribution is similar to using a normal distribution. The only difference is that we use the command associated with the t distribution rather than the normal distribution. Here we repeat the procedure above, but we assume that we are working with a sample standard deviation rather than an exact standard deviation.
Again we assume that the sample mean is 5, the sample standard deviation is 2, and the sample size is 20. We use a 95% confidence level and wish to find the confidence interval. The commands to find the confidence interval in R are the following:
> a <- 5
> s <- 2
> n <- 20
> error <- qt(0.975,df=n-1)*s/sqrt(n)
> left <- a-error
> right <- a+error
> left
[1] 4.063971
> right
[1] 5.936029
We are 95% confident that the true mean lies between 4.06 and 5.94, assuming that the original random variable is normally distributed and that the samples are independent.
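Since the same calculation is repeated several times in the rest of this chapter, you may find it convenient to wrap it in a small function. The helper below, t_ci, is only an illustrative sketch and is not used in the rest of the tutorial; its name and arguments are our own:

> # hypothetical helper: t-based confidence interval from summary statistics
> t_ci <- function(xbar, s, n, conf=0.95) {
+     error <- qt(1-(1-conf)/2, df=n-1)*s/sqrt(n)
+     c(lower=xbar-error, upper=xbar+error)
+ }
> t_ci(5, 2, 20)   # should reproduce the interval 4.063971 to 5.936029 found above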
We now look at an example where we have a univariate data set and want to find the 95% confidence interval for the mean. In this example we use the w1.dat data set given in the data input chapter:
> w1 <- read.csv(file="w1.dat",sep=",",header=TRUE)
> summary(w1)
      vals
 Min.   :0.130
 1st Qu.:0.480
 Median :0.720
 Mean   :0.765
 3rd Qu.:1.008
 Max.   :1.760
> length(w1$vals)
[1] 54
> mean(w1$vals)
[1] 0.765
> sd(w1$vals)
[1] 0.3781222
We can now calculate an error for the mean:
> error <- qt(0.975,df=length(w1$vals)-1)*sd(w1$vals)/sqrt(length(w1$vals))
> error
[1] 0.1032075
The confidence interval is found by adding and subtracting the error from the mean:
> left <- mean(w1$vals)-error
> right <- mean(w1$vals)+error
> left
[1] 0.6617925
> right
[1] 0.8682075
We are 95% confident that the true mean lies between 0.66 and 0.87, assuming that the original random variable is normally distributed and that the samples are independent.
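As noted at the beginning of the chapter, the t.test command performs this calculation for us; it is covered in the The Easy Way section, but as a quick check it can be applied directly to the same data:

> t.test(w1$vals)$conf.int   # should match the values of left and right found above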
9.3. Calculating Many Confidence Intervals From a t Distribution
Suppose that you want to find the confidence intervals for many tests. This is a common task and most software packages will allow you to do this.
We have three different sets of results:
Comparison 1
            Mean   Std. Dev.   Number (pop.)
Group I     10     3           300
Group II    10.5   2.5         230

Comparison 2
            Mean   Std. Dev.   Number (pop.)
Group I     12     4           210
Group II    13     5.3         340

Comparison 3
            Mean   Std. Dev.   Number (pop.)
Group I     30     4.5         420
Group II    28.5   3           400
For each of these comparisons we want to calculate the confidence interval for the difference of the means. Each comparison involves two groups: group one refers to the results in the first row of each comparison above, and group two refers to the results in the second row. Before we can find the intervals we must first compute a standard error and a t score. We will use general formulae so that all three calculations can be done at once.
We assume that the means for the first group are stored in a variable called m1 and those for the second group in m2. Similarly, the standard deviations are stored in sd1 and sd2, and the numbers of samples are stored in num1 and num2.
With these definitions the standard error is the square root of (sd1^2)/num1+(sd2^2)/num2. The R commands to do this can be found below:
> m1 <- c(10,12,30)
> m2 <- c(10.5,13,28.5)
> sd1 <- c(3,4,4.5)
> sd2 <- c(2.5,5.3,3)
> num1 <- c(300,210,420)
> num2 <- c(230,340,400)
> se <- sqrt(sd1*sd1/num1+sd2*sd2/num2)
> error <- qt(0.975,df=pmin(num1,num2)-1)*se
To see the values just type in the variable name on a line alone:
> m1
[1] 10 12 30
> m2
[1] 10.5 13.0 28.5
> sd1
[1] 3.0 4.0 4.5
> sd2
[1] 2.5 5.3 3.0
> num1
[1] 300 210 420
> num2
[1] 230 340 400
> se
[1] 0.2391107 0.3985074 0.2659216
> error
[1] 0.4711382 0.7856092 0.5227825
Now we need to define the confidence interval around the assumed differences. Just as when finding the p values in the previous chapter, we use the pmin command to get a conservative number of degrees of freedom, the smaller of the two sample sizes minus one. In this case the null hypotheses are for a difference of zero, and we use a 95% confidence level:
> left <- (m1-m2)-error
> right <- (m1-m2)+error
> left
[1] -0.9711382 -1.7856092 0.9772175
> right
[1] -0.02886177 -0.21439076 2.02278249
This gives the confidence interval for each of the three comparisons. For example, in the first comparison the 95% confidence interval for the difference of the means is between -0.97 and -0.03, assuming that the random variables are normally distributed and that the samples are independent.
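If you would like to see the three intervals side by side, one option (not part of the commands above) is to gather them into a data frame:

> intervals <- data.frame(comparison=1:3, lower=left, upper=right)
> intervals   # one row per comparison, with the lower and upper limits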