14. Time Data Types

The time data types are broken out into a separate section from the introductory section on data types (Basic Data Types). The reason is that working with time data can be subtle and must be done carefully, because a time value can be represented and converted in a variety of different ways. It is not an introductory topic, and if not done well it can scare off the normal people.

I will first go over the basic time data types and then explore the different kinds of operations that can be done with them. Please be cautious with time data and read the complete description, including the caveats. Some common mistakes produce results that are very different from the intended values.

14.1. Time and Date Variables

There are a variety of different types specific to time data in R. Here we only look at two of them, the POSIXct and POSIXlt data types:

POSIXct

The POSIXct data type is the number of seconds since the start of January 1, 1970, measured in UTC. Negative numbers represent the number of seconds before this time, and positive numbers represent the number of seconds afterwards.
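One way to confirm this definition is to build a few POSIXct values from strings and look at the numbers underneath. This is a small sketch; the time zone is fixed to "UTC" only so that the results do not depend on your computer's local time zone, and the variable names are my own.

```r
# Variable names are illustrative. tz = "UTC" keeps the results
# independent of the computer's local time zone.
epoch   <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
oneHour <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")
before  <- as.POSIXct("1969-12-31 23:59:00", tz = "UTC")

# as.numeric() strips the class and shows the stored number of seconds
as.numeric(epoch)    # 0
as.numeric(oneHour)  # 3600
as.numeric(before)   # -60, one minute before the origin
```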

POSIXlt

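The POSIXlt data type is a list, and the entries in the list have the following meanings (the name R gives each entry is shown in parentheses):

  1. seconds (sec)
  2. minutes (min)
  3. hours (hour)
  4. day of month (mday, 1-31)
  5. month of the year (mon, 0-11 where 0 represents January)
  6. years since 1900 (year)
  7. day of the week (wday, 0-6 where 0 represents Sunday)
  8. day of the year (yday, 0-365)
  9. Daylight savings indicator (isdst, positive if daylight savings time is in effect, zero if it is not, negative if unknown)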
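Since the entries are stored in a list, they can be pulled out by name, which is clearer than relying on their position. The sketch below uses a fixed time stamp in UTC rather than Sys.time() so the values are reproducible (the name lt is just for illustration). Recent versions of R may store extra entries, such as the time zone, after the nine listed above, so only the first nine names are shown.

```r
# A fixed time stamp (lt is an illustrative name)
lt <- as.POSIXlt("2014-01-23 14:28:21", tz = "UTC")

# The entry names are visible once the class is removed
names(unclass(lt))[1:9]

# Access the entries by name rather than by position
lt$sec    # 21
lt$min    # 28
lt$hour   # 14
lt$mday   # 23
lt$mon    # 0, i.e. January
lt$year   # 114, i.e. years since 1900
lt$wday   # 4, i.e. Thursday
lt$yday   # 22, counted from 0

# Shift the offsets to get the usual calendar values
lt$year + 1900  # 2014
lt$mon + 1      # 1
```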

Part of the difficulty with time data types is that R prints them out in a way that is different from how it stores them internally. This can make type conversions tricky, and you have to be careful and test your operations to ensure that R is doing what you think it is doing.

To get the current time, the Sys.time() function can be used, and you can play around a bit with the basic types to get a feel for what R is doing. The as.POSIXct and as.POSIXlt commands are used to convert a time value between the two formats.

> help(DateTimeClasses)
> t <- Sys.time()
> typeof(t)
[1] "double"
> t
[1] "2014-01-23 14:28:21 EST"
> print(t)
[1] "2014-01-23 14:28:21 EST"
> cat(t,"\n")
1390505301
> c <- as.POSIXct(t)
> typeof(c)
[1] "double"
> print(c)
[1] "2014-01-23 14:28:21 EST"
> cat(c,"\n")
1390505301
>
>
> l <- as.POSIXlt(t)
> l
[1] "2014-01-23 14:28:21 EST"
> typeof(l)
[1] "list"
> print(l)
[1] "2014-01-23 14:28:21 EST"
> cat(l,"\n")
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
> names(l)
NULL
> l[[1]]
[1] 21.01023
> l[[2]]
[1] 28
> l[[3]]
[1] 14
> l[[4]]
[1] 23
> l[[5]]
[1] 0
> l[[6]]
[1] 114
> l[[7]]
[1] 4
> l[[8]]
[1] 22
> l[[9]]
[1] 0
>
> b <- as.POSIXct(l)
> cat(b,"\n")
1390505301
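Because Sys.time() changes every time it is called, the same round trip is easier to check with a fixed time. This sketch uses 19:28:21 UTC, which is the same instant as the 14:28:21 EST time stamp in the session above; the names ct, lt and back are only for illustration.

```r
# Same instant as "2014-01-23 14:28:21 EST", written in UTC
# (ct, lt and back are illustrative names)
ct <- as.POSIXct("2014-01-23 19:28:21", tz = "UTC")
as.numeric(ct)  # 1390505301, the value printed by cat() above

# Convert to the list form and back again
lt <- as.POSIXlt(ct)
lt$hour         # 19, because ct carries the "UTC" time zone
back <- as.POSIXct(lt)
as.numeric(back) == as.numeric(ct)  # TRUE: nothing is lost in the round trip
```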

There are times when you have a time data type and want to convert it into a string so it can be saved into a file to be read by another application. The strftime command takes a time data type and converts it to a string. You must supply an additional format string to let R know what format you want to use. See the help page on strftime to get detailed information about the format string.

> help(strftime)
>
> t <- Sys.time()
> cat(t,"\n")
1390506463
> timeStamp <-  strftime(t,"%Y-%m-%d %H:%M:%S")
> timeStamp
[1] "2014-01-23 14:47:43"
> typeof(timeStamp)
[1] "character"
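A few other format strings are shown below, applied to a fixed time so the output is reproducible. Passing tz to strftime explicitly is a choice made for this sketch, so the result does not depend on the computer's local time zone.

```r
# A fixed time in UTC
t <- as.POSIXct("2014-01-23 14:47:43", tz = "UTC")

strftime(t, "%Y-%m-%d %H:%M:%S", tz = "UTC")  # "2014-01-23 14:47:43"
strftime(t, "%d/%m/%Y", tz = "UTC")           # "23/01/2014"
strftime(t, "%H:%M", tz = "UTC")              # "14:47"
strftime(t, "%j", tz = "UTC")                 # "023", the day of the year
```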

Commonly a time stamp is saved in a data file, and it must be converted into a time data type to allow for calculations. For example, you may be interested in how much time has elapsed between two observations. The strptime command takes a string and converts it into a time data type (a POSIXlt value) that R can use for calculations. Like strftime, it requires a format string in addition to the time stamp.

In the following example a data frame is defined that has the dates stored as strings, which is also how they arrive if you read the data in from a csv file. In the version of R used here (versions before R 4.0.0), data.frame and read.csv convert strings into factors by default, so R treats the field as a set of categories rather than as times; from R 4.0.0 on, the strings are kept as character strings instead. Either way, you have to use the strptime command to convert the field into a time field.

> myData <- data.frame(time=c("2014-01-23 14:28:21","2014-01-23 14:28:55",
                              "2014-01-23 14:29:02","2014-01-23 14:31:18"),
                      speed=c(2.0,2.2,3.4,5.5))
> myData
                 time speed
1 2014-01-23 14:28:21   2.0
2 2014-01-23 14:28:55   2.2
3 2014-01-23 14:29:02   3.4
4 2014-01-23 14:31:18   5.5
> summary(myData)
                 time       speed
2014-01-23 14:28:21:1   Min.   :2.000
2014-01-23 14:28:55:1   1st Qu.:2.150
2014-01-23 14:29:02:1   Median :2.800
2014-01-23 14:31:18:1   Mean   :3.275
                        3rd Qu.:3.925
                        Max.   :5.500
> myData$time[1]
[1] 2014-01-23 14:28:21
4 Levels: 2014-01-23 14:28:21 2014-01-23 14:28:55 ... 2014-01-23 14:31:18
> typeof(myData$time[1])
[1] "integer"
>
>
> myData$time <- strptime(myData$time,"%Y-%m-%d %H:%M:%S")
> myData
                 time speed
1 2014-01-23 14:28:21   2.0
2 2014-01-23 14:28:55   2.2
3 2014-01-23 14:29:02   3.4
4 2014-01-23 14:31:18   5.5
> myData$time[1]
[1] "2014-01-23 14:28:21"
> typeof(myData$time[1])
[1] "list"
> myData$time[1][[2]]
[1] 28
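Because strptime returns a POSIXlt value, which is a list, a column of POSIXct values is often easier to work with inside a data frame. One way to build such a column is sketched below: as.POSIXct with the same format string. Setting stringsAsFactors = FALSE and tz = "UTC" are assumptions for this sketch, so it behaves the same in old and new versions of R and on any computer.

```r
myData <- data.frame(time = c("2014-01-23 14:28:21", "2014-01-23 14:28:55",
                              "2014-01-23 14:29:02", "2014-01-23 14:31:18"),
                     speed = c(2.0, 2.2, 3.4, 5.5),
                     stringsAsFactors = FALSE)

# strptime gives a POSIXlt (list-based) value (first is an illustrative name)
first <- strptime(myData$time[1], "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(first)         # "POSIXlt" "POSIXt"

# as.POSIXct with the same format string gives a numeric column
myData$time <- as.POSIXct(myData$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
class(myData$time)   # "POSIXct" "POSIXt"
typeof(myData$time)  # "double"
```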

Now you can perform operations on the fields. For example, you can determine the time between observations. (Please see the notes below on time operations. This example is a bit misleading, because R chooses the units of the result on its own!)

> N = length(myData$time)
> myData$time[2:N] - myData$time[1:(N-1)]
Time differences in secs
[1]  34   7 136
attr(,"tzone")
[1] ""
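The diff function gives the same gaps between neighboring observations with less indexing. Because R picks the units automatically, this sketch also asks difftime for seconds explicitly and then drops the units with as.numeric. The times are written in UTC only to keep the example reproducible, and the names times, N and gaps are illustrative.

```r
# Illustrative names: times, N, gaps
times <- as.POSIXct(c("2014-01-23 14:28:21", "2014-01-23 14:28:55",
                      "2014-01-23 14:29:02", "2014-01-23 14:31:18"), tz = "UTC")

# diff() subtracts each time from the next one; R picks the units
diff(times)

# Fix the units to seconds and keep only the numbers
N <- length(times)
gaps <- as.numeric(difftime(times[2:N], times[1:(N - 1)], units = "secs"))
gaps  # 34 7 136
```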

In addition to the time data types, R also has a date data type. The difference is that the date data type keeps track of the number of days (since January 1, 1970) rather than seconds. You can cast a string into a date type using the as.Date function. Like strptime, as.Date takes a format string in addition to the string to convert.

> theDates <- c("1 jan 2012","1 jan 2013","1 jan 2014")
> dateFields <- as.Date(theDates,"%d %b %Y")
> typeof(dateFields)
[1] "double"
> dateFields
[1] "2012-01-01" "2013-01-01" "2014-01-01"
> N <- length(dateFields)
> diff <- dateFields[1:(N-1)] - dateFields[2:N]
> diff
Time differences in days
[1] -366 -365
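Underneath, a Date is stored as a number of days since January 1, 1970, which as.numeric reveals. This sketch writes the dates in year-month-day form so that it does not depend on your locale (the %b format reads month abbreviations in the current language setting). It also uses diff, which subtracts each date from the one after it, so the differences come out positive; gaps is an illustrative name.

```r
dateFields <- as.Date(c("2012-01-01", "2013-01-01", "2014-01-01"))

# Days since 1970-01-01
as.numeric(dateFields)  # 15340 15706 16071

# Later date minus earlier date, so the results are positive
gaps <- diff(dateFields)  # gaps is an illustrative name
gaps                      # Time differences in days: 366 365
as.numeric(gaps)          # 366 365
```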

You can also define a date in terms of the number of days after another date using the origin option.

> infamy <- as.Date(-179,origin="1942-06-04")
> infamy
[1] "1941-12-07"
>
> today <- Sys.Date()
> today
[1] "2014-01-23"
> today-infamy
Time difference of 26345 days
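Adding or subtracting a plain number from a Date shifts it by that many days, so the origin example above can be checked directly. A small sketch (midway is an illustrative name for the origin date):

```r
# midway is an illustrative name
midway <- as.Date("1942-06-04")
infamy <- as.Date(-179, origin = "1942-06-04")

# Subtracting 179 days from the origin gives the same date
midway - 179 == infamy       # TRUE
format(infamy)               # "1941-12-07"

# The difference between the two dates, as a number of days
as.numeric(midway - infamy)  # 179
```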

Finally, a nice function to know about and use is the format command. It can be used in a wide variety of situations, and not just for dates. It is helpful for dates, though, because you can use it in cat and print statements to make sure that your output is in exactly the form that you want.

> theTime <- Sys.time()
> theTime
[1] "2014-01-23 16:15:05 EST"
> a <- rexp(1,0.1)
> a
[1] 7.432072
> cat("At about",format(theTime,"%H:%M"),"the time between occurrences was around",format(a,digits=3),"seconds\n")
At about 16:15 the time between occurrences was around 7.43 seconds
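Because format returns character strings, its results can also be stored or combined with paste instead of being printed straight away. This sketch fixes the time and the value of a (the session above used Sys.time() and a random draw) so the output is reproducible; msg is an illustrative name.

```r
# Fixed values for reproducibility; msg is an illustrative name
theTime <- as.POSIXct("2014-01-23 16:15:05", tz = "UTC")
a <- 7.432072

# format() returns strings, so the pieces can be joined with paste()
msg <- paste("At about", format(theTime, "%H:%M"),
             "the time between occurrences was around",
             format(a, digits = 3), "seconds")
msg
```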

14.2. Time Operations

The most difficult part of dealing with time data can be converting it into the right format. Once a time or date is stored in R’s internal format, a number of basic operations are available. The thing to keep in mind, though, is that the units you get after an operation can vary depending on the magnitude of the time values. Be very careful when dealing with time operations and rigorously test your code.

> now <- Sys.time()
> now
[1] "2014-01-23 16:31:00 EST"
> now-60
[1] "2014-01-23 16:30:00 EST"
>
> earlier <- strptime("2000-01-01 00:00:00","%Y-%m-%d %H:%M:%S")
> later <- strptime("2000-01-01 00:00:20","%Y-%m-%d %H:%M:%S")
> later-earlier
Time difference of 20 secs
> as.double(later-earlier)
[1] 20
>
> earlier <- strptime("2000-01-01 00:00:00","%Y-%m-%d %H:%M:%S")
> later <- strptime("2000-01-01 01:00:00","%Y-%m-%d %H:%M:%S")
> later-earlier
Time difference of 1 hours
> as.double(later-earlier)
[1] 1
>
> up <- as.Date("1961-08-13")
> down <- as.Date("1989-11-9")
> down-up
Time difference of 10315 days

The two examples involving the variables earlier and later in the previous code sample should cause you a little concern. The number R reports depends on the units it chooses, and it chooses them based on the size of the difference: a 20 second difference is reported as 20, but a one hour difference is reported as 1 rather than 3600! The issue is that when you subtract times R uses the equivalent of the difftime command. We need to know how this operates to reduce the ambiguity when comparing times.

> help(difftime)
>
> earlier <- strptime("2000-01-01 00:00:00","%Y-%m-%d %H:%M:%S")
> later <- strptime("2000-01-01 01:00:00","%Y-%m-%d %H:%M:%S")
> difftime(later,earlier)
Time difference of 1 hours
> difftime(later,earlier,units="secs")
Time difference of 3600 secs
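The automatic choice of units follows the size of the difference (the smallest one, when there are several): under a minute gives seconds, under an hour gives minutes, under a day gives hours, and anything longer gives days. The units function reports which one R picked. This sketch uses as.POSIXct with tz = "UTC" instead of strptime, only to keep the results reproducible.

```r
earlier <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")

# Each subtraction lets R choose the units
units(as.POSIXct("2000-01-01 00:00:20", tz = "UTC") - earlier)  # "secs"
units(as.POSIXct("2000-01-01 00:05:00", tz = "UTC") - earlier)  # "mins"
units(as.POSIXct("2000-01-01 01:00:00", tz = "UTC") - earlier)  # "hours"
units(as.POSIXct("2000-01-03 00:00:00", tz = "UTC") - earlier)  # "days"

# Passing units to difftime removes the guesswork
difftime(as.POSIXct("2000-01-03 00:00:00", tz = "UTC"), earlier, units = "secs")
```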

One thing to be careful about with difftime is that its result is a double precision number with units attached to it. This can be tricky: converting it with as.double keeps the number but silently drops the units. I personally always try to specify the units to avoid this ambiguity.

> earlier <- strptime("2000-01-01 00:00:00","%Y-%m-%d %H:%M:%S")
> later <- strptime("2000-01-01 00:00:20","%Y-%m-%d %H:%M:%S")
> d <- difftime(later,earlier)
> d
Time difference of 20 secs
> typeof(d)
[1] "double"
> as.double(d)
[1] 20
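For a difftime, as.numeric also accepts a units argument (handled by the difftime method of as.double), which converts the value before dropping the units. The units stored in the object can also be changed in place with units<-. A short sketch, using as.POSIXct in UTC for reproducibility:

```r
earlier <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
later   <- as.POSIXct("2000-01-01 00:00:20", tz = "UTC")
d <- difftime(later, earlier)

# Convert to a chosen unit before stripping the units
as.numeric(d, units = "secs")  # 20
as.numeric(d, units = "mins")  # 0.3333333

# Or change the units stored in the object itself
units(d) <- "mins"
d                              # Time difference of 0.3333333 mins
as.double(d)                   # 0.3333333
```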

Another way to define a time difference is to use the as.difftime command. Rather than taking two times, it takes a single time, its format, and the units to use, and it returns the length of time that the string represents. Note that in the following example R is able to account for the units when the time difference is added to the current time.

> diff <- as.difftime("00:30:00","%H:%M:%S",units="hours")
> diff
Time difference of 0.5 hours
> Sys.time()
[1] "2014-01-23 16:45:39 EST"
> Sys.time()+diff
[1] "2014-01-23 17:15:41 EST"
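as.difftime also accepts a plain number together with its units, which can be simpler than writing a format string. This sketch adds the same half hour, written three ways, to a fixed time in UTC; when a difftime is added to a POSIXct value, R converts it to seconds first, so all three give the same answer. The names start, h1, h2 and h3 are illustrative.

```r
# Illustrative names: start, h1, h2, h3
start <- as.POSIXct("2014-01-23 16:45:39", tz = "UTC")

# The same half hour written three ways
h1 <- as.difftime(30, units = "mins")
h2 <- as.difftime(0.5, units = "hours")
h3 <- as.difftime(1800, units = "secs")

# Each is converted to seconds before being added
format(start + h1, "%H:%M:%S")  # "17:15:39"
format(start + h2, "%H:%M:%S")  # "17:15:39"
format(start + h3, "%H:%M:%S")  # "17:15:39"
```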

The last thing to mention is that once a time stamp is cast into one of R’s internal formats, comparisons can be made in a natural way.

> diff <- as.difftime("00:30:00","%H:%M:%S",units="hours")
> now <- Sys.time()
> later <- now + diff
> now
[1] "2014-01-23 16:47:48 EST"
> later
[1] "2014-01-23 17:17:48 EST"
>
> if(now < later)
  {
     cat("there you go\n")
  }
there you go
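Comparisons also work element by element on vectors of times, so they can be used to filter or sort observations. A sketch with fixed times in UTC (times and cutoff are illustrative names):

```r
# Illustrative names: times, cutoff
times <- as.POSIXct(c("2014-01-23 17:17:48", "2014-01-23 16:47:48",
                      "2014-01-23 17:02:10"), tz = "UTC")
cutoff <- as.POSIXct("2014-01-23 17:00:00", tz = "UTC")

# Element-by-element comparison gives a logical vector
times > cutoff                             # TRUE FALSE TRUE

# Keep only the observations after the cutoff
format(times[times > cutoff], "%H:%M:%S")  # "17:17:48" "17:02:10"

# Sorting also respects the time ordering
format(sort(times), "%H:%M:%S")            # "16:47:48" "17:02:10" "17:17:48"
```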