# 9. Calculating Confidence Intervals

Here we look at some examples of calculating confidence intervals. The
examples are for both normal and t distributions. We assume that you
can enter data and know the commands associated with basic
probability. Note that an easier way to calculate confidence intervals
using the *t.test* command is discussed in section The Easy Way.

## 9.1. Calculating a Confidence Interval From a Normal Distribution

We begin with a fictitious example: we assume values that might come out of an experiment and compute the resulting confidence interval using a normal distribution. Suppose that the sample mean is 5, the standard deviation is 2, and the sample size is 20, and that we want a 95% confidence interval. The commands to find the confidence interval in R are the following:

```
> a <- 5
> s <- 2
> n <- 20
> error <- qnorm(0.975)*s/sqrt(n)
> left <- a-error
> right <- a+error
> left
[1] 4.123477
> right
[1] 5.876523
```

Assuming that the original random variable is normally distributed and the samples are independent, we are 95% confident that the true mean lies between 4.12 and 5.88.
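The steps above can be collected into a small helper function. This is a sketch, and the name *normal_ci* is our own:

```r
# A sketch of the calculation above as a reusable function; the name
# normal_ci is our own. It assumes normally distributed, independent
# samples with a known standard deviation.
normal_ci <- function(mean, sd, n, conf = 0.95) {
  error <- qnorm(1 - (1 - conf) / 2) * sd / sqrt(n)
  c(lower = mean - error, upper = mean + error)
}
normal_ci(5, 2, 20)
#    lower    upper
# 4.123477 5.876523
```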

## 9.2. Calculating a Confidence Interval From a t Distribution

Calculating the confidence interval when using a t distribution is similar to using a normal distribution. The only difference is that we use the command associated with the t distribution rather than the normal distribution. Here we repeat the procedure above, but we assume that we are working with a sample standard deviation rather than an exact standard deviation.

Again we assume that the sample mean is 5, the sample standard deviation is 2, and the sample size is 20. We use a 95% confidence level and wish to find the confidence interval. The commands to find the confidence interval in R are the following:

```
> a <- 5
> s <- 2
> n <- 20
> error <- qt(0.975,df=n-1)*s/sqrt(n)
> left <- a-error
> right <- a+error
> left
[1] 4.063971
> right
[1] 5.936029
```

Assuming that the original random variable is normally distributed and the samples are independent, we are 95% confident that the true mean lies between 4.06 and 5.94.
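Note that the only change from the previous section is that *qt* replaces *qnorm*. Since the t critical value is larger than the normal one for any finite sample, the t interval is always a little wider:

```r
# The t critical value exceeds the normal critical value, so the
# t interval is wider than the normal interval for the same data.
qnorm(0.975)        # 1.959964
qt(0.975, df = 19)  # 2.093024
```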

We now look at an example where we have a univariate data set and want to find the 95% confidence interval for the mean. In this example we use one of the data sets given in the data input chapter. We use the w1.dat data set:

```
> w1 <- read.csv(file="w1.dat",sep=",",header=TRUE)
> summary(w1)
      vals
 Min.   :0.130
 1st Qu.:0.480
 Median :0.720
 Mean   :0.765
 3rd Qu.:1.008
 Max.   :1.760
> length(w1$vals)
[1] 54
> mean(w1$vals)
[1] 0.765
> sd(w1$vals)
[1] 0.3781222
```

We can now calculate an error for the mean:

```
> error <- qt(0.975,df=length(w1$vals)-1)*sd(w1$vals)/sqrt(length(w1$vals))
> error
[1] 0.1032075
```

The confidence interval is found by adding and subtracting the error from the mean:

```
> left <- mean(w1$vals)-error
> right <- mean(w1$vals)+error
> left
[1] 0.6617925
> right
[1] 0.8682075
```

Assuming that the original random variable is normally distributed and the samples are independent, we are 95% confident that the true mean lies between 0.66 and 0.87.
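As a quick cross-check, the *t.test* command (discussed in the section The Easy Way) reproduces the manual calculation; applied to *w1$vals* it gives the interval 0.66 to 0.87 found above. The short vector below is made-up data used only to illustrate the agreement:

```r
# Cross-check: t.test's default confidence interval matches the manual
# t calculation. The values in x are hypothetical.
x <- c(0.13, 0.48, 0.72, 1.01, 1.76)
error <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
manual <- c(mean(x) - error, mean(x) + error)
auto <- as.numeric(t.test(x)$conf.int)
manual  # the same two numbers as auto
```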

## 9.3. Calculating Many Confidence Intervals From a t Distribution

Suppose that you want to find the confidence intervals for many tests. This is a common task and most software packages will allow you to do this.

We have three different sets of results:

**Comparison 1**

|          | Mean | Std. Dev. | Number (pop.) |
|----------|------|-----------|---------------|
| Group I  | 10   | 3         | 300           |
| Group II | 10.5 | 2.5       | 230           |

**Comparison 2**

|          | Mean | Std. Dev. | Number (pop.) |
|----------|------|-----------|---------------|
| Group I  | 12   | 4         | 210           |
| Group II | 13   | 5.3       | 340           |

**Comparison 3**

|          | Mean | Std. Dev. | Number (pop.) |
|----------|------|-----------|---------------|
| Group I  | 30   | 4.5       | 420           |
| Group II | 28.5 | 3         | 400           |

For each of these comparisons we want to calculate the confidence interval for the difference of the means. In each comparison, group one is the group whose results are in the first row and group two is the group whose results are in the second row. Before we can do that we must first compute a standard error and a t-score. We will derive general formulae, which is necessary in order to do all three calculations at once.

We assume that the means for the first group are defined in a variable
called *m1*. The means for the second group are defined in a variable
called *m2*. The standard deviations for the first group are in a
variable called *sd1*. The standard deviations for the second group
are in a variable called *sd2*. The number of samples for the first
group are in a variable called *num1*. Finally, the number of samples
for the second group are in a variable called *num2*.

With these definitions the standard error is the square root of
*(sd1^2)/num1+(sd2^2)/num2*. The R commands to do this can be found
below:

```
> m1 <- c(10,12,30)
> m2 <- c(10.5,13,28.5)
> sd1 <- c(3,4,4.5)
> sd2 <- c(2.5,5.3,3)
> num1 <- c(300,210,420)
> num2 <- c(230,340,400)
> se <- sqrt(sd1*sd1/num1+sd2*sd2/num2)
> error <- qt(0.975,df=pmin(num1,num2)-1)*se
```

To see the values, just type the variable name on a line by itself:

```
> m1
[1] 10 12 30
> m2
[1] 10.5 13.0 28.5
> sd1
[1] 3.0 4.0 4.5
> sd2
[1] 2.5 5.3 3.0
> num1
[1] 300 210 420
> num2
[1] 230 340 400
> se
[1] 0.2391107 0.3985074 0.2659216
> error
[1] 0.4711382 0.7856092 0.5227825
```

Now we need to define the confidence interval around the assumed differences. Just as when finding the p values in the previous chapter, we have to use the *pmin* command to get a conservative number of degrees of freedom. In this case the null hypotheses are for a difference of zero, and we use a 95% confidence interval:

```
> left <- (m1-m2)-error
> right <- (m1-m2)+error
> left
[1] -0.9711382 -1.7856092 0.9772175
> right
[1] -0.02886177 -0.21439076 2.02278249
```

This gives the confidence intervals for each of the three tests. For example, in the first comparison we are 95% confident that the true difference of the means lies between -0.97 and -0.03, assuming that the random variables are normally distributed and the samples are independent.
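The whole vectorized calculation can be wrapped in a short function. This is a sketch: the name *diff_ci* is our own, and it keeps the conservative *pmin* degrees of freedom used above:

```r
# Sketch: confidence intervals for differences of means, vectorized over
# several comparisons. The name diff_ci is our own; degrees of freedom
# are the conservative pmin(num1, num2) - 1 used in the text.
diff_ci <- function(m1, m2, sd1, sd2, num1, num2, conf = 0.95) {
  se <- sqrt(sd1^2 / num1 + sd2^2 / num2)
  error <- qt(1 - (1 - conf) / 2, df = pmin(num1, num2) - 1) * se
  cbind(lower = (m1 - m2) - error, upper = (m1 - m2) + error)
}
diff_ci(c(10, 12, 30), c(10.5, 13, 28.5),
        c(3, 4, 4.5), c(2.5, 5.3, 3),
        c(300, 210, 420), c(230, 340, 400))
#           lower       upper
# [1,] -0.9711382 -0.02886177
# [2,] -1.7856092 -0.21439076
# [3,]  0.9772175  2.02278249
```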