15. Introduction to Programming

This chapter shows how to run commands from a source file. It also gives an overview of the control-flow statements that determine which code the interpreter executes.

15.1. Executing a file

The next section shows how to execute the commands in a file using the source command. The remaining sections list the flow control options that are available in the R language. The language has a variety of control statements, and a summary of them can be found using the help command.

> help(Control)
>

15.1.1. Executing the commands in a File

A set of R commands can be saved in a file and then executed as if you had typed them in from the command line. The source command is used to read the file and execute the commands in the same sequence given in the file.

> source('file.R')
> help(source)
>

If you simply source the file the commands are not printed, and the results of commands are not printed. This can be overridden using the echo, print.eval, and verbose options.

Some examples are given assuming that a file, simpleEx.R, is in the current directory. The file is given below:

# Define a variable.
x <- rnorm(10)

# calculate the mean of x and print out the results.
mux = mean(x)
cat("The mean of x is ",mean(x),"\n")

# print out a summary of the results
summary(x)
cat("The summary of x is \n",summary(x),"\n")
print(summary(x))

The file also demonstrates the use of # to specify comments. Anything after the # is ignored. The file also demonstrates the use of cat and print to send results to the standard output. Note that cat has a file option to send its results to a file instead. Use help for more information.
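As a brief sketch of sending results to a file, the lines below use the file and append arguments of cat, and capture.output for results that would normally be printed. The file is a temporary one created with tempfile, and the names outFile and fileLines are chosen only for illustration.

```r
# Hypothetical example: write results to a temporary file instead of the screen.
x <- c(2, 4, 6, 8)
outFile <- tempfile(fileext = ".txt")

# cat writes to the file named in its file argument.
cat("The mean of x is", mean(x), "\n", file = outFile)

# append = TRUE adds to the end of the file instead of overwriting it.
cat("The largest value is", max(x), "\n", file = outFile, append = TRUE)

# capture.output collects what print would display and appends it as well.
capture.output(print(summary(x)), file = outFile, append = TRUE)

# Read the file back to check its contents.
fileLines <- readLines(outFile)
fileLines
```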

The output for the different options can be found below:

> source('simpleEx.R')
The mean of x is  -0.4817475
The summary of x is
-2.24 -0.5342 -0.2862 -0.4817 -0.1973 0.4259
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -2.2400 -0.5342 -0.2862 -0.4817 -0.1973  0.4259
>
>
>
> source('simpleEx.R',echo=TRUE)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-2.32600 -0.69140 -0.06772 -0.13540  0.46820  1.69600
>
>
>
> source('simpleEx.R',print.eval=TRUE)
The mean of x is  0.1230581
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.7020 -0.2833  0.1174  0.1231  0.9103  1.2220
The summary of x is
-1.702 -0.2833 0.1174 0.1231 0.9103 1.222
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.7020 -0.2833  0.1174  0.1231  0.9103  1.2220
>
>
>
> source('simpleEx.R',print.eval=FALSE)
The mean of x is  0.6279428
The summary of x is
-0.7334 -0.164 0.9335 0.6279 1.23 1.604
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.7334 -0.1640  0.9335  0.6279  1.2300  1.6040
>
>
>
>
> source('simpleEx.R',verbose=TRUE)
'envir' chosen:<environment: R_GlobalEnv>
encoding = "native.enc" chosen
--> parsed 6 expressions; now eval(.)ing them:

>>>> eval(expression_nr. 1 )
                 =================

> # Define a variable.
> x <- rnorm(10)
curr.fun: symbol <-
 .. after ‘expression(x <- rnorm(10))’

>>>> eval(expression_nr. 2 )
                 =================

> # calculate the mean of x and print out the results.
> mux = mean(x)
curr.fun: symbol =
 .. after ‘expression(mux = mean(x))’

>>>> eval(expression_nr. 3 )
                 =================

> cat("The mean of x is ",mean(x),"\n")
The mean of x is  -0.1090932
curr.fun: symbol cat
 .. after ‘expression(cat("The mean of x is ",mean(x),"\n"))’

>>>> eval(expression_nr. 4 )
                 =================

> # print out a summary of the results
> summary(x)
curr.fun: symbol summary
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.3820 -1.0550 -0.1995 -0.1091  0.6813  2.1050
 .. after ‘expression(summary(x))’

>>>> eval(expression_nr. 5 )
                 =================

> cat("The summary of x is \n",summary(x),"\n")
The summary of x is
 -1.382 -1.055 -0.1995 -0.1091 0.6813 2.105
curr.fun: symbol cat
 .. after ‘expression(cat("The summary of x is \n",summary(x),"\n"))’

>>>> eval(expression_nr. 6 )
                 =================

> print(summary(x))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-1.3820 -1.0550 -0.1995 -0.1091  0.6813  2.1050
curr.fun: symbol print
 .. after ‘expression(print(summary(x)))’
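The effect of the options can also be checked without typing at the prompt. The sketch below writes a hypothetical two-line script to a temporary file and uses capture.output to collect whatever source prints under each option. The variable names are assumptions made for this example.

```r
# Hypothetical two-line script written to a temporary file.
scriptFile <- tempfile(fileext = ".R")
writeLines(c("x <- c(1, 2, 3)", "mean(x)"), scriptFile)

# Default: neither the commands nor the value of mean(x) is printed.
quiet <- capture.output(source(scriptFile))

# print.eval = TRUE prints the results that would be shown at the prompt.
shown <- capture.output(source(scriptFile, print.eval = TRUE))

# echo = TRUE prints the commands as well as the results.
echoed <- capture.output(source(scriptFile, echo = TRUE))

length(quiet)
shown
echoed
```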

One common problem that occurs is that R may not know where to find a file.

> source('notThere.R')
Error in file(filename, "r", encoding = encoding) :
  cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
  cannot open file 'notThere.R': No such file or directory

R will search the current working directory. You can see what files are in the directory using the dir command, and you can determine the current directory using the getwd command.

> getwd()
[1] "/home/black/public_html/tutorial/R/rst/source/R"
> dir()
[1] "plotting.rData" "power.R"        "shadedRegion.R"

You can change the current directory, and the options available depend on how you are using R. For example on a Windows PC or a Macintosh you can use the menu options to change the working directory. You can choose the directory using a graphical file browser. Otherwise, you can change to the correct directory before running R or use the setwd command.
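A minimal sketch of the setwd approach is given below. It assumes the scripts live in some other directory, and tempdir() stands in for that directory here. The file tiny.R and the variable names are hypothetical.

```r
# Remember the current directory so it can be restored later.
oldDir <- getwd()

# tempdir() stands in for the directory that holds your scripts.
scriptDir <- tempdir()
writeLines("y <- 1 + 1", file.path(scriptDir, "tiny.R"))

# Move to the directory and check that the file is visible
# before trying to source it.
setwd(scriptDir)
found <- file.exists("tiny.R")
if (found) source("tiny.R")

# Go back to where we started.
setwd(oldDir)
y
```

Another option is to leave the working directory alone and give source the full path, for example source(file.path(scriptDir, "tiny.R")).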

15.1.2. if statements

Conditional execution is available using the if statement and the corresponding else statement.

> x = 0.1
> if( x < 0.2)
  {
     x <- x + 1
     cat("increment that number!\n")
  }
increment that number!
> x
[1] 1.1

The else statement can be used to specify an alternate option. In the example below note that the else statement must be on the same line as the ending brace for the previous if block.

> x = 2.0
> if ( x < 0.2)
 {
    x <- x + 1
    cat("increment that number!\n")
 } else
 {
    x <- x - 1
    cat("nah, make it smaller.\n");
 }
nah, make it smaller.
> x
[1] 1

Finally, the if statements can be chained together for multiple options. The if statement is considered a single code block, so more if statements can be added after the else.

> x = 1.0
> if ( x < 0.2)
 {
    x <- x + 1
    cat("increment that number!\n")
 } else if ( x < 2.0)
 {
   x <- 2.0*x
   cat("not big enough!\n")
 } else
 {
    x <- x - 1
    cat("nah, make it smaller.\n");
 }
not big enough!
> x
[1] 2

The argument to the if statement is a logical expression. A full list of logical operators can be found in the types document focusing on logical variables (Logical).
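The condition should evaluate to a single TRUE or FALSE. The sketch below, using made-up values, shows two common ways to get a single value: all (or any) to reduce a vector of comparisons, and the && operator to combine two tests.

```r
x <- c(0.1, 0.5, 0.9)

# x < 1 gives three logical values; all() reduces them to one.
if (all(x < 1)) {
    allSmall <- "every value is below 1"
} else {
    allSmall <- "at least one value is 1 or more"
}
allSmall

# && combines single logical values and stops as soon as the answer
# is known, so log(first) is only evaluated when first is positive.
first <- x[1]
if (first > 0 && log(first) < 0) {
    cat("the first value is between 0 and 1\n")
}
```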

15.1.3. for statements

The for loop can be used to repeat a set of instructions, and it is used when you know in advance the values that the loop variable will have each time it goes through the loop. The basic format for the for loop is for(var in seq) expr

An example is given below:

> for (lupe in seq(0,1,by=0.3))
 {
    cat(lupe,"\n");
 }
0
0.3
0.6
0.9
>
> x <- c(1,2,4,8,16)
> for (loop in x)
 {
    cat("value of loop: ",loop,"\n");
 }
value of loop:  1
value of loop:  2
value of loop:  4
value of loop:  8
value of loop:  16

See the section on break and next statements for more options.
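When the position of each value is needed as well as the value itself, one common pattern is to loop over the positions. The sketch below assumes the results are to be stored in a second vector, and the name squares is only for illustration.

```r
x <- c(1, 2, 4, 8, 16)

# Set aside space for the results before the loop starts.
squares <- numeric(length(x))

# seq_along(x) gives the positions 1, 2, ..., length(x).
for (i in seq_along(x)) {
    squares[i] <- x[i]^2
    cat("position", i, "holds", x[i], "\n")
}
squares
```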

15.1.4. while statements

The while loop can be used to repeat a set of instructions, and it is often used when you do not know in advance how often the instructions will be executed. The basic format for a while loop is while(cond) expr

>
> lupe <- 1;
> x <- 1
> while(x < 4)
 {
    x <- rnorm(1,mean=2,sd=3)
    cat("trying this value: ",x," (",lupe," times in loop)\n");
    lupe <- lupe + 1
 }
trying this value:  -4.163169  ( 1  times in loop)
trying this value:  3.061946  ( 2  times in loop)
trying this value:  2.10693  ( 3  times in loop)
trying this value:  -2.06527  ( 4  times in loop)
trying this value:  0.8873237  ( 5  times in loop)
trying this value:  3.145076  ( 6  times in loop)
trying this value:  4.504809  ( 7  times in loop)

See the section on break and next statements for more options.
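Because the number of passes is not known in advance, it can be useful to add a limit to the condition. The sketch below is a variation on the example above under two assumptions that are not part of the original: a fixed seed so the run can be repeated, and a hypothetical cap called maxTries.

```r
set.seed(42)       # arbitrary seed, used only to make the run repeatable
maxTries <- 1000   # hypothetical safety limit on the number of passes
tries <- 0
x <- 1

# Stop when a value of at least 4 is drawn or the limit is reached.
while (x < 4 && tries < maxTries) {
    x <- rnorm(1, mean = 2, sd = 3)
    tries <- tries + 1
}
cat("stopped after", tries, "tries with x =", x, "\n")
```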

15.1.5. repeat statements

The repeat loop is similar to the while loop. The difference is that it always runs the body at least once, while the while loop only starts if the condition is true the first time it is evaluated. Another difference is that the repeat loop has no condition of its own, so you must execute the break statement to get out of the loop.

>  repeat
{
    x <- rnorm(1)
    if(x < -2.0) break
}
> x
[1] -2.300532

See the section on break and next statements for more options.
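The difference between the two loops can be seen by starting with a value that already fails the test. In the sketch below the counters count and passes are names chosen for this example.

```r
x <- 10

# while: the condition is checked first, so the body never runs.
count <- 0
while (x < 4) {
    count <- count + 1
}

# repeat: the body runs once before the test inside it is reached.
passes <- 0
repeat {
    passes <- passes + 1
    if (x >= 4) break
}

count
passes
```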

15.1.6. break and next statements

The break statement stops the execution of the current loop, and execution continues with the first statement after the loop. The next statement skips the remaining statements in the body and starts the next pass through the current loop. In a for loop the next statement moves the loop variable to its next value.

> x <- rnorm(5)
> x
[1]  1.41699338  2.28086759 -0.01571884  0.56578443  0.60400784
> for(lupe in x)
 {
     if (lupe > 2.0)
         next

     if( (lupe<0.6) && (lupe > 0.5))
        break

    cat("The value of lupe is ",lupe,"\n");
 }
The value of lupe is  1.416993
The value of lupe is  -0.01571884
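Both statements act only on the loop that contains them. The sketch below uses two nested for loops with made-up limits to show that break leaves the inner loop while the outer loop carries on. The vector visited is an assumption added to record which pairs were reached.

```r
visited <- character(0)

for (row in 1:3) {
    for (col in 1:3) {
        if (col == 2)
            next          # skip the rest of this pass of the inner loop

        if (col > row)
            break         # leave the inner loop only

        visited <- c(visited, paste(row, col, sep = "-"))
        cat("row", row, "col", col, "\n")
    }
}
visited
```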

15.1.7. switch statement

The switch statement takes an expression and returns one of the items that follow it, chosen by the value of the expression. How it does this depends on the data type of the expression. The basic syntax is switch(expression,item1,item2,item3,…,itemN).

If the result of the expression is a number then it returns the item in the list with the same index. Note that the expression is cast as an integer if it is not an integer.

> x <- as.integer(2)
> x
[1] 2
> z = switch(x,1,2,3,4,5)
> z
[1] 2
> x <- 3.5
> z = switch(x,1,2,3,4,5)
> z
[1] 3

If the result of the expression is a string, then the list of items should be in the form "valueN"=resultN, and the statement will return the result that matches the value.

> y <- rnorm(5)
> y
[1]  0.4218635 -0.8205637 -1.0191267 -0.6080061 -0.6079133
> x <- "sd"
> z <- switch(x,"mean"=mean(y),"median"=median(y),"variance"=var(y),"sd"=sd(y))
> z
[1] 0.5571847
> x <- "median"
> z <- switch(x,"mean"=mean(y),"median"=median(y),"variance"=var(y),"sd"=sd(y))
> z
[1] -0.6080061
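Two further behaviors are worth a short sketch. With a string, an unnamed last item acts as a default, and a name with no value falls through to the next item. With a number outside the range of the list, switch returns NULL. The function pickStat and its data are hypothetical.

```r
y <- c(1, 2, 3, 10)

pickStat <- function(name, values) {
    switch(name,
           "average" = ,                  # no value: use the next item's value
           "mean"    = mean(values),
           "median"  = median(values),
           NA)                            # unnamed last item is the default
}

pickStat("average", y)
pickStat("median", y)
pickStat("mode", y)

# A number with no matching position gives NULL.
outOfRange <- switch(7, "a", "b")
is.null(outOfRange)
```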

15.1.8. scan statement

The command to read input from the keyboard is the scan statement. It has a wide variety of options and can be fine tuned to your specific needs. We only look at the basics here. The scan statement waits for input from a user, and it returns the value that was typed in.

When using the command with no set number of lines the command will continue to read keyboard input until a blank line is entered.

> help(scan)
> a <- scan(what=double(0))
1: 3.5
2:
Read 1 item
> a
[1] 3.5
> typeof(a)
[1] "double"
>
> a <- scan(what=double(0))
1: yo!
1:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
scan() expected 'a real', got 'yo!'

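If you wish to read only a fixed number of values, the nmax option sets the maximum number of values that will be read, and the nlines option limits the number of lines instead. The multi.line option controls whether a record may be spread over several lines, but it only has an effect when what is a list.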

> a <-  scan(what=double(0),nmax=1,multi.line = FALSE)
1: 6.7
Read 1 item
> a
[1] 6.7
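When testing a script it can help to avoid the keyboard altogether. The sketch below uses the text argument of scan to read from a string instead, and the strings themselves are made up for the example.

```r
# Read every number found in the string.
a <- scan(text = "3.5 6.7 10", what = double(0))

# nmax = 2 stops after two values have been read.
b <- scan(text = "3.5 6.7 10", what = double(0), nmax = 2)

# what = character(0) reads the items as strings.
w <- scan(text = "yes no", what = character(0))

a
b
w
```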

15.2. Functions

A brief overview of defining functions is given here. A few subtleties will be noted, but R can be a little quirky with respect to defining functions. The first bit of oddness is that a function is an object like any other: you define the function and assign it to a variable name.

To define a function you assign it to a name, and the keyword function is used to denote the start of the function and its argument list.

> newDef <- function(a,b)
 {
     x = runif(10,a,b)
     mean(x)
 }
> newDef(-1,1)
[1] 0.06177728
> newDef
function(a,b)
{
   x = runif(10,a,b)
   mean(x)
}

The last expression in the function is what is returned. So in the example above the sample mean of the numbers is returned.

> x <- newDef(0,1)
> x
[1] 0.4800442
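A value can also be returned before the last expression is reached by calling return. The function safeMean below is a hypothetical example that hands back NA for an empty vector and otherwise relies on the last expression.

```r
safeMean <- function(v) {
    if (length(v) == 0) {
        return(NA)    # leave the function early
    }
    mean(v)           # last expression: this is the value returned
}

safeMean(numeric(0))
safeMean(c(1, 2, 3))
```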

The arguments that are passed are matched in order. They can be specified explicitly, though.

> newDef(b=10,a=1)
[1] 4.747509
> newDef(10,1)
[1] NaN
Warning message:
In runif(10, a, b) : NAs produced

You can mix the two approaches: R first matches the named arguments and then matches the rest from left to right. Another bit of weirdness is that R will not evaluate an expression in the argument list until the moment its value is needed in the function. This is different from what most people are used to, so be very careful about it. A good rule of thumb is to avoid expressions with side effects, such as assignments, in an argument list if those effects matter after the function is called.
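Both points can be seen in a short sketch. The functions showArgs and lazy are hypothetical and exist only to make the matching and the delayed evaluation visible.

```r
showArgs <- function(a, b, d) {
    paste("a =", a, "b =", b, "d =", d)
}

# d is matched by name; 10 and 20 then fill a and b from left to right.
mixed <- showArgs(10, d = 30, 20)
mixed

lazy <- function(a, b) {
    if (a > 0) {
        "b was never needed"
    } else {
        b
    }
}

# The call to stop() is never evaluated because b is not used when a > 0.
lazyResult <- lazy(1, stop("this message never appears"))
lazyResult
```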

Another common task is to have a function return multiple items. This can be accomplished by returning a list of items. The objects within a list can be accessed using the same $ notation that is used for data frames.

> vals = c(1,2,3,4,5)
> summarize <- function(a,b)
{
  value = switch(a,"median"=median(b),"mean"=mean(b),"variance"=var(b))
  largeVals = length(b[b>1])
  list(stat=value,number=largeVals)
}
> result <- summarize("median",vals)
> result
$stat
[1] 3

$number
[1] 4

> result$stat
[1] 3
> result$number
[1] 4

There is another potential problem that can occur when using a function in R. When it comes to determining the value of a variable there is a path that R will use to search for its value. Inside a function R first looks among the variables and arguments defined in the function. If the name is not found there R looks in the environment where the function was defined, which for a function typed at the prompt is the current work space. If you are not careful R will find the value some place where you do not expect it, and your function will return a value that is not correct, and no error will be given. Be very careful about the names of variables, especially when using functions.
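The sketch below shows the problem and one way to avoid it. The names shift, addShift, and addShiftSafe are hypothetical. The first function silently picks up shift from the work space, while the second takes it as an argument with a default value.

```r
shift <- 100    # a variable that happens to be in the work space

# shift is neither an argument nor defined inside the function,
# so R finds the value in the work space and no error is given.
addShift <- function(v) {
    v + shift
}
addShift(1)

# Passing the value as an argument makes the dependence explicit.
addShiftSafe <- function(v, shift = 0) {
    v + shift
}
addShiftSafe(1)
addShiftSafe(1, shift = 5)
```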