7. Indexing Into Vectors

Given a vector of data, one common task is to isolate particular entries or to remove (censor) items that meet some criterion. Here we show how to use R’s indexing notation to pick out specific items within a vector.

7.1. Indexing With Logicals

We first give an example of how to select specific items in a vector. The first step is to define a vector of data, and the second step is to define a vector of logical values with the same length. When the vector of logical values is used as the index into the vector of data values, only the items in positions where the logical vector is TRUE are returned:

> a <- c(1,2,3,4,5)
> b <- c(TRUE,FALSE,FALSE,TRUE,FALSE)
> a[b]
[1] 1 4
> max(a[b])
[1] 4
> sum(a[b])
[1] 5
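
As a small extension of the example above, the sketch below shows three related operations on the same two vectors: negating the logical index to get the complementary entries, converting it to positions with which, and counting the selected entries with sum. The names rest, positions, and count are chosen here for illustration only.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# Negating the logical vector selects the entries that a[b] leaves out.
rest <- a[!b]
print(rest)        # 2 3 5

# which() turns the logical vector into the positions that are TRUE.
positions <- which(b)
print(positions)   # 1 4

# TRUE counts as 1 in arithmetic, so sum(b) counts the selected entries.
count <- sum(b)
print(count)       # 2
```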

7.2. Not Available or Missing Values

One common problem is data entries that are missing. R has a predefined value called NA (not available) that is used to mark missing information. The problem with this is that many functions return NA, and some stop with an error, if one of the entries in the data is NA. Some functions allow you to ignore the missing values through special options such as na.rm:

> a <- c(1,2,3,4,NA)
> a
[1]  1  2  3  4 NA
> sum(a)
[1] NA
> sum(a,na.rm=TRUE)
[1] 10
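
The na.rm option is not specific to sum. As a brief sketch, the base R functions mean and max accept it as well, and anyNA gives a quick check for whether a vector contains any missing values at all. The variable names below are illustrative.

```r
a <- c(1, 2, 3, 4, NA)

# Without na.rm the missing value propagates into the result.
print(mean(a))                    # NA

# With na.rm = TRUE the NA is dropped before the calculation.
avg <- mean(a, na.rm = TRUE)
print(avg)                        # 2.5
largest <- max(a, na.rm = TRUE)
print(largest)                    # 4

# anyNA() reports whether at least one entry is missing.
has_missing <- anyNA(a)
print(has_missing)                # TRUE
```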

There are other times, though, when this option is not available, or you simply want to remove the missing values. The is.na function can be used to determine which items are not available. The logical “not” operator in R is the ! symbol. When the two are combined in the indexing notation, the items within a vector that are NA can be easily removed:

> a <- c(1,2,3,4,NA)
> is.na(a)
[1] FALSE FALSE FALSE FALSE  TRUE
> !is.na(a)
[1]  TRUE  TRUE  TRUE  TRUE FALSE
> a[!is.na(a)]
[1] 1 2 3 4
> b <- a[!is.na(a)]
> b
[1] 1 2 3 4
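
A common pitfall is to test for missing values with == instead of is.na. The sketch below, using illustrative variable names, shows that a comparison with NA is itself NA, and also shows how the same is.na index can be used to count or replace the missing entries. Replacing NA with 0 is only an example and is not always an appropriate choice for real data.

```r
a <- c(1, 2, 3, 4, NA)

# Comparing with NA never yields TRUE or FALSE; every result is NA.
cmp <- a == NA
print(cmp)                   # NA NA NA NA NA

# is.na() gives a usable logical vector, so the NAs can be counted.
n_missing <- sum(is.na(a))
print(n_missing)             # 1

# A logical index can also appear on the left side of an assignment.
filled <- a
filled[is.na(filled)] <- 0
print(filled)                # 1 2 3 4 0
```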

7.3. Indices With Logical Expressions

Any logical expression can be used as an index, which opens a wide range of possibilities. You can remove or focus on entries that match specific criteria. For example, you might want to keep only the entries that are below a certain value, removing everything at or above it:

> a = c(6,2,5,3,8,2)
> a
[1] 6 2 5 3 8 2
> b = a[a<6]
> b
[1] 2 5 3 2
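
Conditions can also be combined, and a logical index can be used to change entries as well as select them. The following is a minimal sketch on the same vector; the names middle, positions, and capped are made up for this illustration.

```r
a <- c(6, 2, 5, 3, 8, 2)

# Combine two conditions term by term with a single &.
middle <- a[a > 2 & a < 6]
print(middle)       # 5 3

# which() reports where the condition holds instead of the values.
positions <- which(a < 6)
print(positions)    # 2 3 4 6

# Assigning through a logical index changes only the matching entries.
capped <- a
capped[capped > 5] <- 5
print(capped)       # 5 2 5 3 5 2
```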

For another example, suppose you want to combine the values in one vector that correspond to either of two levels of a factor in another vector:

> d = data.frame(one=as.factor(c('a','a','b','b','c','c')),
                two=c(1,2,3,4,5,6))
> d
  one two
1   a   1
2   a   2
3   b   3
4   b   4
5   c   5
6   c   6
> both = d$two[(d$one=='a') | (d$one=='b')]
> both
[1] 1 2 3 4
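
When more than a couple of levels are involved, chaining comparisons with | becomes long. One alternative, sketched here with the same data frame, is the base R %in% operator, which tests each entry for membership in a set. The name rows is illustrative.

```r
d <- data.frame(one = as.factor(c('a', 'a', 'b', 'b', 'c', 'c')),
                two = c(1, 2, 3, 4, 5, 6))

# %in% returns TRUE for every entry of d$one that is 'a' or 'b'.
both <- d$two[d$one %in% c('a', 'b')]
print(both)    # 1 2 3 4

# The same logical vector can select whole rows of the data frame.
rows <- d[d$one %in% c('a', 'b'), ]
print(rows)
```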

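Note that a single ‘|’ was used in the previous example. There is a difference between ‘||’ and ‘|’, and likewise between ‘&&’ and ‘&’. A single bar or ampersand performs a vector operation, term by term, and is the form to use inside an index. The doubled forms work on single logical values and evaluate to a single TRUE or FALSE result. Older versions of R silently used only the first element of a longer vector; since R 4.3.0, giving ‘||’ or ‘&&’ a vector with more than one element is an error:

> (c(TRUE,TRUE))|(c(FALSE,TRUE))
[1] TRUE TRUE
> (c(TRUE,TRUE))&(c(FALSE,TRUE))
[1] FALSE  TRUE
> TRUE||FALSE
[1] TRUE
> TRUE&&FALSE
[1] FALSE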
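
When a single TRUE or FALSE is needed from whole vectors, for example in an if statement, one option is to reduce the term-by-term result with any or all. The doubled operators fit naturally when each side is already a single value. This is a sketch with illustrative variable names, not the only way to write such tests.

```r
x <- c(TRUE, TRUE)
y <- c(FALSE, TRUE)

# any() is TRUE if at least one term of the vector result is TRUE.
any_true <- any(x | y)
print(any_true)     # TRUE

# all() is TRUE only if every term of the vector result is TRUE.
all_true <- all(x & y)
print(all_true)     # FALSE

# && evaluates its right side only when the left side is TRUE,
# so a[1] is examined only if the vector is not empty.
a <- c(6, 2, 5, 3, 8, 2)
first_big <- length(a) > 0 && a[1] > 5
print(first_big)    # TRUE
```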