Dates, date-times, and times (readr-parse_datetime)

library(tidyverse)
library(readr)

You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date-time (the number of seconds since midnight 1970-01-01), or a time (the number of seconds since midnight). When called without any additional arguments:

  • parse_datetime() expects an ISO8601 date-time. ISO8601 is an international standard in which the components of a date are organised from biggest to smallest: year, month, day, hour, minute, second.

    parse_datetime("2010-10-01T2010")
    #> [1] "2010-10-01 20:10:00 UTC"
    # If time is omitted, it will be set to midnight
    parse_datetime("20101010")
    #> [1] "2010-10-10 UTC"
  • parse_date() expects a four digit year, a - or /, the month, a - or /, then the day:

    parse_date("2010-10-01")
    #> [1] "2010-10-01"
    parse_date("2010/10/01")
    #> [1] "2010-10-01"
    parse_date("10/10/01")
    #> Warning: 1 parsing failure.
    #> row col   expected   actual
    #>   1  -- date like  10/10/01
    #> [1] NA
    parse_date("20101001")
    #> Warning: 1 parsing failure.
    #> row col   expected   actual
    #>   1  -- date like  20101001
    #> [1] NA
  • parse_time() expects the hour, :, minutes, optionally : and seconds, and an optional am/pm specifier:

    library(hms)
    parse_time("01:10 am")
    #> 01:10:00
    parse_time("20:10:01")
    #> 20:10:01
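
To make the three underlying representations concrete, the sketch below strips each parsed value down to its number. It uses only the readr parsers described above; the input strings and variable names are illustrative.

```r
library(readr)

# A date is stored as the number of days since 1970-01-01.
days_since_epoch <- as.numeric(parse_date("1970-01-02"))
days_since_epoch
#> [1] 1

# A date-time is stored as the number of seconds since midnight 1970-01-01 (UTC).
secs_since_epoch <- as.numeric(parse_datetime("1970-01-01T00:01:00"))
secs_since_epoch
#> [1] 60

# A time is stored as the number of seconds since midnight.
secs_since_midnight <- as.numeric(parse_time("00:01:00"))
secs_since_midnight
#> [1] 60
```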

If these defaults don’t work for your data you can supply your own date-time format, built up of the following pieces:

Year
%Y (4 digits).
%y (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
Month
%m (2 digits).
%b (abbreviated name, like “Jan”).
%B (full name, “January”).
Day
%d (2 digits or 1 digit).
%e (optional leading space).
Time
%H 0-23 hour.
%I 0-12, must be used with %p.
%p AM/PM indicator.
%M minutes.
%S integer seconds.
%OS real seconds.
%Z Time zone (as name, e.g. America/Chicago, Asia/Shanghai). Beware of abbreviations: if you’re American, note that “EST” is a Canadian time zone that does not have daylight saving time. It is not Eastern Standard Time! We’ll come back to this when we discuss time zones.
%z (as offset from UTC, e.g. +0800).
Non-digits
%. skips one non-digit character.
%* skips any number of non-digits.

The best way to figure out the correct format is to create a few examples in a character vector, and test with one of the parsing functions. For example:

parse_date("01/02/15", "%m/%d/%y")
#> [1] "2015-01-02"
parse_date("01/02/15", "%d/%m/%y")
#> [1] "2015-02-01"
parse_date("2015年2月1日", "%Y年%m月%d日")
#> [1] "2015-02-01"
parse_date("01/02/15", "%y/%m/%d")
#> [1] "2001-02-15"
parse_date("01/02/15", "%y%.%m%.%d")
#> [1] "2001-02-15"
parse_date("01//02/15", "%y%*%m%*%d")
#> [1] "2001-02-15"
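
The testing workflow described above can be sketched with a small character vector. The example strings are made up for illustration; the last one is deliberately not a valid month/day/year date, so it shows what a mismatch looks like.

```r
library(readr)

# Illustrative strings; "13/01/15" cannot be month/day/year because 13 is not a month.
examples <- c("01/02/15", "12/31/99", "13/01/15")

# Parsing failures come back as NA (with a warning, suppressed here for brevity).
parsed <- suppressWarnings(parse_date(examples, "%m/%d/%y"))
parsed
#> [1] "2015-01-02" "1999-12-31" NA

# Any NA flags an example that the candidate format does not cover.
examples[is.na(parsed)]
#> [1] "13/01/15"
```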

If you’re using %b or %B with non-English month names, you’ll need to set the lang argument to locale(). See the list of built-in languages in date_names_langs(), or if your language is not already included, create your own with date_names().

date_names_langs()
#>   [1] "af"  "agq" "ak"  "am"  "ar"  "as"  "asa" "az"  "bas" "be"  "bem" "bez"
#>  [13] "bg"  "bm"  "bn"  "bo"  "br"  "brx" "bs"  "ca"  "cgg" "chr" "cs"  "cy" 
#>  [25] "da"  "dav" "de"  "dje" "dsb" "dua" "dyo" "dz"  "ebu" "ee"  "el"  "en" 
#>  [37] "eo"  "es"  "et"  "eu"  "ewo" "fa"  "ff"  "fi"  "fil" "fo"  "fr"  "fur"
#>  [49] "fy"  "ga"  "gd"  "gl"  "gsw" "gu"  "guz" "gv"  "ha"  "haw" "he"  "hi" 
#>  [61] "hr"  "hsb" "hu"  "hy"  "id"  "ig"  "ii"  "is"  "it"  "ja"  "jgo" "jmc"
#>  [73] "ka"  "kab" "kam" "kde" "kea" "khq" "ki"  "kk"  "kkj" "kl"  "kln" "km" 
#>  [85] "kn"  "ko"  "kok" "ks"  "ksb" "ksf" "ksh" "kw"  "ky"  "lag" "lb"  "lg" 
#>  [97] "lkt" "ln"  "lo"  "lt"  "lu"  "luo" "luy" "lv"  "mas" "mer" "mfe" "mg" 
#> [109] "mgh" "mgo" "mk"  "ml"  "mn"  "mr"  "ms"  "mt"  "mua" "my"  "naq" "nb" 
#> [121] "nd"  "ne"  "nl"  "nmg" "nn"  "nnh" "nus" "nyn" "om"  "or"  "os"  "pa" 
#> [133] "pl"  "ps"  "pt"  "qu"  "rm"  "rn"  "ro"  "rof" "ru"  "rw"  "rwk" "sah"
#> [145] "saq" "sbp" "se"  "seh" "ses" "sg"  "shi" "si"  "sk"  "sl"  "smn" "sn" 
#> [157] "so"  "sq"  "sr"  "sv"  "sw"  "ta"  "te"  "teo" "th"  "ti"  "to"  "tr" 
#> [169] "twq" "tzm" "ug"  "uk"  "ur"  "uz"  "vai" "vi"  "vun" "wae" "xog" "yav"
#> [181] "yi"  "yo"  "zgh" "zh"  "zu"
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
#> [1] "2015-01-01"
parse_datetime("1 enero 2015", "%d %B %Y", locale = locale("es"))
#> [1] "2015-01-01 UTC"
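
If a language is missing from date_names_langs(), a custom set of names can be passed to locale() through its date_names argument. The sketch below is only an illustration: the Latin month and day names, the three-letter abbreviations, and the variable names are all assumptions made up for this example, not data shipped with readr.

```r
library(readr)

# Hypothetical custom language: Latin month and day names (Sunday first).
latin_months <- c(
  "Ianuarius", "Februarius", "Martius", "Aprilis", "Maius", "Iunius",
  "Iulius", "Augustus", "September", "October", "November", "December"
)
latin_days <- c("Solis", "Lunae", "Martis", "Mercurii", "Iovis", "Veneris", "Saturni")

# Abbreviations are simply the first three letters, chosen for this sketch.
latin <- date_names(
  mon = latin_months,
  mon_ab = substr(latin_months, 1, 3),
  day = latin_days,
  day_ab = substr(latin_days, 1, 3)
)

# %B is matched against the custom full month names.
parsed_latin <- parse_date("1 Martius 2015", "%d %B %Y", locale = locale(date_names = latin))
parsed_latin
#> [1] "2015-03-01"
```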

# Time zones
Sys.time()
#> [1] "2021-10-25 20:52:42 PDT"
Sys.timezone()
#> [1] "America/Los_Angeles"
a <- parse_datetime(as.character(Sys.time()), locale = locale(tz = "America/Los_Angeles"))
a
#> [1] "2021-10-25 20:52:42 PDT"
as.numeric(a)
#> [1] 1635220362

b <- parse_datetime(as.character(Sys.time()), locale = locale(tz = "Asia/Shanghai"))
b
#> [1] "2021-10-25 20:52:42 CST"
as.numeric(b)
#> [1] 1635166362

# The current instant, displayed in another time zone
d <- Sys.time()
attr(d, "tzone") <- "Asia/Shanghai"
d
#> [1] "2021-10-26 11:52:42 CST"
as.numeric(d)
#> [1] 1635220362

Exercises

  1. Generate the correct format string to parse each of the following dates and times:

    d1 <- "January 1, 2010"
    d2 <- "2015-Mar-07"
    d3 <- "06-Jun-2017"
    d4 <- c("August 19 (2015)", "July 1 (2015)")
    d5 <- "12/30/14" # Dec 30, 2014
    t1 <- "1705"
    t2 <- "11:15:10.12 PM"

Dates, date-times, and times (lubridate-datetimes)

Prerequisites

  • This chapter will focus on the lubridate package, which makes it easier to work with dates and times in R. lubridate is not part of core tidyverse because you only need it when you’re working with dates/times. We will also need nycflights13 for practice data.

    library(tidyverse)
    library(lubridate)
    library(nycflights13)

Creating date/times

  • There are three types of date/time data that refer to an instant in time:

    • A date. Tibbles print this as <date>.

    • A time within a day. Tibbles print this as <time>.

    • A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as <dttm>. Elsewhere in R these are called POSIXct, but I don’t think that’s a very useful name. In this chapter we are only going to focus on dates and date-times as R doesn’t have a native class for storing times. If you need one, you can use the hms package.

    To get the current date or date-time you can use today() or now():

    today()
    #> [1] "2021-10-25"
    now()
    #> [1] "2021-10-25 20:52:42 PDT"

    Otherwise, there are three ways you’re likely to create a date/time:

    • From a string.
    • From individual date-time components.
    • From an existing date/time object.

From strings

  • Date/time data often comes as strings. You’ve seen one approach to parsing strings into date-times in the readr section above. Another approach is to use the helpers provided by lubridate. They automatically work out the format once you specify the order of the components. To use them, identify the order in which year, month, and day appear in your dates, then arrange “y”, “m”, and “d” in the same order. That gives you the name of the lubridate function that will parse your date. For example:

    ymd("2017-01-31")
    #> [1] "2017-01-31"
    mdy("January 31st, 2017")
    #> [1] "2017-01-31"
    dmy("31-Jan-2017")
    #> [1] "2017-01-31"
    
    ## m also matches b and B; y also matches Y   
    guess_formats(
      c(
        "jan 3   10",
        "Feb 20th 73",
        "January 5 1999 at 7pm",
        "January 5 1999 at pm",
        "03:23:22 pm",
        "DOB:12/11/00"
      ),
      orders = "mdy",
      print_matches = TRUE
    )
    #>                              Omdy              mdy             
    #> [1,] "jan 3   10"            "%Om %d   %y"     "%b %d   %y"    
    #> [2,] "Feb 20th 73"           "%Om %dth %y"     "%b %dth %y"    
    #> [3,] "January 5 1999 at 7pm" ""                ""              
    #> [4,] "January 5 1999 at pm"  "%Om %d %Y at pm" "%B %d %Y at pm"
    #> [5,] "03:23:22 pm"           "%Om:%d:%y pm"    "%m:%d:%y pm"   
    #> [6,] "DOB:12/11/00"          "DOB:%Om/%d/%y"   "DOB:%m/%d/%y"
    #>              Omdy              Omdy              Omdy              Omdy 
    #>     "%Om %d   %y"     "%Om %dth %y" "%Om %d %Y at pm"    "%Om:%d:%y pm" 
    #>              Omdy               mdy               mdy               mdy 
    #>   "DOB:%Om/%d/%y"      "%b %d   %y"      "%b %dth %y"  "%B %d %Y at pm" 
    #>               mdy               mdy 
    #>     "%m:%d:%y pm"    "DOB:%m/%d/%y"
    
    parse_datetime("jan 3   10", "%b %d   %y")
    #> [1] "2010-01-03 UTC"
    parse_datetime("Feb 20th 73", "%b %dth %y")
    #> [1] "1973-02-20 UTC"
    parse_datetime("January 5 1999 at 7pm", "%B %d %Y at %I%p")
    #> [1] "1999-01-05 19:00:00 UTC"
    parse_datetime("03:23:22 pm", "%m:%d:%y pm")
    #> [1] "2022-03-23 UTC"
    parse_time("03:23:22 pm", "%I:%M:%S %p")
    #> 15:23:22
    parse_datetime("DOB:12/11/00", "DOB:%m/%d/%y")
    #> [1] "2000-12-11 UTC"
    
    guess_formats("21 Aug 2011, 11:15:34 pm", "dmyHMS", print_matches = TRUE)
    #>                                 dOmyHMS                 
    #> [1,] "21 Aug 2011, 11:15:34 pm" "%d %Om %Y, %H:%M:%S pm"
    #>      dmyHMS                 
    #> [1,] "%d %b %Y, %H:%M:%S pm"
    #>                  dOmyHMS                   dmyHMS 
    #> "%d %Om %Y, %H:%M:%S pm"  "%d %b %Y, %H:%M:%S pm"
    parse_datetime("21 Aug 2011, 11:15:34 pm", "%d %b %Y, %H:%M:%S pm")
    #> [1] "2011-08-21 11:15:34 UTC"
    parse_datetime("21 Aug 2011, 11:15:34 pm", "%d %b %Y, %I:%M:%S %p")
    #> [1] "2011-08-21 23:15:34 UTC"
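
When a single vector mixes several layouts, lubridate’s parse_date_time() accepts more than one candidate order and tries them in turn. The following is a minimal sketch under the assumption that every string fits one of the listed orders; the input strings are illustrative.

```r
library(lubridate)

# Two layouts of the same calendar day in one vector.
mixed <- c("2017-01-31", "01/31/2017")

# Each element is parsed with whichever of the supplied orders fits it.
parsed <- parse_date_time(mixed, orders = c("ymd", "mdy"))

# Both elements should resolve to the same date.
as_date(parsed)
```

Listing only the orders you actually expect keeps ambiguous strings such as "01/02/03" from being silently interpreted in an unintended way.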

    These functions also take unquoted numbers. This is the most concise way to create a single date/time object, as you might need when filtering date/time data. ymd() is short and unambiguous:

    ymd(20170131)
    #> [1] "2017-01-31"
    ymd(170131)
    #> [1] "2017-01-31"
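
As a sketch of the filtering use case mentioned above, the example below compares a date column against a date created inline with ymd(). The tibble and its column names are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Illustrative data: three dated observations.
events <- tibble(
  when = ymd(c("2017-01-30", "2017-01-31", "2017-02-01")),
  value = 1:3
)

# ymd(20170131) builds the comparison date without any quoting or format string.
recent <- events %>% filter(when >= ymd(20170131))
recent$value
#> [1] 2 3
```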

    ymd() and friends create dates. To create a date-time, add an underscore and one or more of “h”, “m”, and “s” to the name of the parsing function:

    ymd_hms("2017-01-31 20:11:59")
    #> [1] "2017-01-31 20:11:59 UTC"
    mdy_hm("01/31/2017 08:01")
    #> [1] "2017-01-31 08:01:00 UTC"

    You can also force the creation of a date-time from a date by supplying a timezone:

    ymd(20170131, tz = "UTC")
    #> [1] "2017-01-31 UTC"

From individual components

  • Instead of a single string, sometimes you’ll have the individual components of the date-time spread across multiple columns. To create a date/time from this sort of input, use make_date() for dates, or make_datetime() for date-times:

    flights %>% 
      select(year, month, day, hour, minute) %>% 
      mutate(departure = make_datetime(year, month, day, hour, minute))
    #> # A tibble: 336,776 x 6
    #>    year month   day  hour minute departure          
    #>   <int> <int> <int> <dbl>  <dbl> <dttm>             
    #> 1  2013     1     1     5     15 2013-01-01 05:15:00
    #> 2  2013     1     1     5     29 2013-01-01 05:29:00
    #> 3  2013     1     1     5     40 2013-01-01 05:40:00
    #> 4  2013     1     1     5     45 2013-01-01 05:45:00
    #> 5  2013     1     1     6      0 2013-01-01 06:00:00
    #> # … with 336,771 more rows

    Let’s do the same thing for each of the four time columns in flights. The times are represented in a slightly odd format, so we use modulus arithmetic to pull out the hour and minute components. Once I’ve created the date-time variables, I focus in on the variables we’ll explore in the rest of the chapter.

    make_datetime_100 <- function(year, month, day, time) {
      make_datetime(year, month, day, time %/% 100, time %% 100)
    }
    
    flights_dt <- flights %>% 
      filter(!is.na(dep_time), !is.na(arr_time)) %>% 
      mutate(
        dep_time = make_datetime_100(year, month, day, dep_time),
        arr_time = make_datetime_100(year, month, day, arr_time),
        sched_dep_time = make_datetime_100(year, month, day, sched_dep_time),
        sched_arr_time = make_datetime_100(year, month, day, sched_arr_time)
      ) %>% 
      select(origin, dest, ends_with("delay"), ends_with("time"))
    
    flights_dt
    #> # A tibble: 328,063 x 9
    #>   origin dest  dep_delay arr_delay dep_time            sched_dep_time     
    #>   <chr>  <chr>     <dbl>     <dbl> <dttm>              <dttm>             
    #> 1 EWR    IAH           2        11 2013-01-01 05:17:00 2013-01-01 05:15:00
    #> 2 LGA    IAH           4        20 2013-01-01 05:33:00 2013-01-01 05:29:00
    #> 3 JFK    MIA           2        33 2013-01-01 05:42:00 2013-01-01 05:40:00
    #> 4 JFK    BQN          -1       -18 2013-01-01 05:44:00 2013-01-01 05:45:00
    #> 5 LGA    ATL          -6       -25 2013-01-01 05:54:00 2013-01-01 06:00:00
    #> # … with 328,058 more rows, and 3 more variables: arr_time <dttm>,
    #> #   sched_arr_time <dttm>, air_time <dbl>
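
The modulus arithmetic inside make_datetime_100() can be checked by hand on a single value. This sketch uses 517, the departure time of the first flight shown above, which encodes 05:17; the variable name is illustrative.

```r
library(lubridate)

dep_time <- 517

# Integer division by 100 drops the last two digits, leaving the hour.
dep_hour <- dep_time %/% 100
dep_hour
#> [1] 5

# The remainder after dividing by 100 is the last two digits, the minute.
dep_minute <- dep_time %% 100
dep_minute
#> [1] 17

# Combined with the date columns, this gives the full date-time.
dep_dt <- make_datetime(2013, 1, 1, dep_hour, dep_minute)
dep_dt
#> [1] "2013-01-01 05:17:00 UTC"
```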

From other types

  • You may want to switch between a date-time and a date. That’s the job of as_datetime() and as_date():

    as_datetime(today())
    #> [1] "2021-10-25 UTC"
    as_date(now())
    #> [1] "2021-10-25"

    Sometimes you’ll get date/times as numeric offsets from the “Unix Epoch”, 1970-01-01. If the offset is in seconds, use as_datetime(); if it’s in days, use as_date().

    as_datetime(60 * 60 * 10)
    #> [1] "1970-01-01 10:00:00 UTC"
    as_date(365 * 10 + 2)
    #> [1] "1980-01-01"

Date-time components

  • Now that you know how to get date-time data into R’s date-time data structures, let’s explore what you can do with them. This section will focus on the accessor functions that let you get and set individual components. The next section will look at how arithmetic works with date-times.

Getting components

  • You can pull out individual parts of the date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second().

    datetime <- ymd_hms("2016-07-08 12:34:56")
    
    year(datetime)
    #> [1] 2016
    month(datetime)
    #> [1] 7
    mday(datetime)
    #> [1] 8
    
    yday(datetime)
    #> [1] 190
    wday(datetime)
    #> [1] 6

    For month() and wday() you can set label = TRUE to return the abbreviated name of the month or day of the week. Set abbr = FALSE to return the full name.

    month(datetime, label = TRUE)
    #> [1] Jul
    #> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
    wday(datetime, label = TRUE, abbr = FALSE)
    #> [1] Friday
    #> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
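
The number returned by wday() depends on which day is treated as the start of the week. The sketch below assumes lubridate’s default setting (weeks start on Sunday, so Friday is day 6) and uses the week_start argument to switch to Monday-based numbering.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")  # a Friday

# Sunday-based numbering (7 is lubridate's default unless the lubridate.week.start option is set): Friday is day 6.
sunday_based <- wday(datetime, week_start = 7)
sunday_based
#> [1] 6

# Monday-based numbering: Friday is day 5.
monday_based <- wday(datetime, week_start = 1)
monday_based
#> [1] 5
```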

Rounding

  • An alternative approach to plotting individual components is to round the date to a nearby unit of time, with floor_date(), round_date(), and ceiling_date(). Each function takes a vector of dates to adjust and then the name of the unit to round to: floor_date() rounds down, ceiling_date() rounds up, and round_date() rounds to the nearest.

    Sys.time()
    #> [1] "2021-10-25 20:52:44 PDT"
    round_date(Sys.time(), '2 hours')
    #> [1] "2021-10-25 20:00:00 PDT"
    ceiling_date(Sys.time(), '2 hours')
    #> [1] "2021-10-25 22:00:00 PDT"
    floor_date(Sys.time(), '2 hours')
    #> [1] "2021-10-25 20:00:00 PDT"
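
Because Sys.time() changes on every call, a fixed timestamp makes the three rounding functions easier to compare. This sketch reuses the time printed above, in UTC, and also shows rounding to a larger unit.

```r
library(lubridate)

x <- ymd_hms("2021-10-25 20:52:44")

floor_date(x, "hour")     # rounds down
#> [1] "2021-10-25 20:00:00 UTC"
round_date(x, "hour")     # 52 minutes is past the half hour, so this rounds up
#> [1] "2021-10-25 21:00:00 UTC"
ceiling_date(x, "hour")   # rounds up
#> [1] "2021-10-25 21:00:00 UTC"

floor_date(x, "month")    # first instant of the current month
#> [1] "2021-10-01 UTC"
ceiling_date(x, "month")  # first instant of the next month
#> [1] "2021-11-01 UTC"
```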

Setting components

  • You can also use each accessor function to set the components of a date/time:

    datetime <- ymd_hms("2016-07-08 12:34:56")
    
    year(datetime) <- 2020
    datetime
    #> [1] "2020-07-08 12:34:56 UTC"
    month(datetime) <- 01
    datetime
    #> [1] "2020-01-08 12:34:56 UTC"
    hour(datetime) <- hour(datetime) + 1
    datetime
    #> [1] "2020-01-08 13:34:56 UTC"

    Alternatively, rather than modifying in place, you can create a new date-time with update(). This also allows you to set multiple values at once.

    update(datetime, year = 2020, month = 2, mday = 2, hour = 2)
    #> [1] "2020-02-02 02:34:56 UTC"

    If values are too big, they will roll over:

    ymd("2015-02-01") %>% update(mday = 30)
    #> [1] "2015-03-02"
    ymd("2015-02-01") %>% update(hour = 400)
    #> [1] "2015-02-17 16:00:00 UTC"

Time spans

  • Next you’ll learn about how arithmetic with dates works, including subtraction, addition, and division. Along the way, you’ll learn about three important classes that represent time spans:

    • durations, which represent an exact number of seconds.
    • periods, which represent human units like weeks and months.
    • intervals, which represent a starting and ending point.

Durations

  • In R, when you subtract two dates, you get a difftime object:

    h_age <- today() - ymd(19791014)
    h_age
    #> Time difference of 15352 days

    A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the duration.

    as.duration(h_age)
    #> [1] "1326412800s (~42.03 years)"

    Durations come with a bunch of convenient constructors:

    dseconds(15)
    #> [1] "15s"
    dminutes(10)
    #> [1] "600s (~10 minutes)"
    dhours(c(12, 24))
    #> [1] "43200s (~12 hours)" "86400s (~1 days)"
    ddays(0:5)
    #> [1] "0s"                "86400s (~1 days)"  "172800s (~2 days)"
    #> [4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
    dweeks(3)
    #> [1] "1814400s (~3 weeks)"
    dyears(1)
    #> [1] "31557600s (~1 years)"

    Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds at the standard rate (60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 7 days in a week, and 365.25 days in a year, as the dyears(1) output above shows).
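
The conversion rates can be verified by stripping each constructor’s result down to its number of seconds. The sketch assumes a lubridate version that, like the one producing the dyears(1) output above, counts a year as 365.25 days.

```r
library(lubridate)

secs <- c(
  minute = as.numeric(dminutes(1)),  # 60
  hour   = as.numeric(dhours(1)),    # 60 * 60
  day    = as.numeric(ddays(1)),     # 24 * 3600
  week   = as.numeric(dweeks(1)),    # 7 * 86400
  year   = as.numeric(dyears(1))     # 365.25 * 86400
)
secs
```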

    You can add and multiply durations:

    2 * dyears(1)
    #> [1] "63115200s (~2 years)"
    dyears(1) + dweeks(12) + dhours(15)
    #> [1] "38869200s (~1.23 years)"

    You can add and subtract durations to and from days:

    tomorrow <- today() + ddays(1)
    last_year <- today() - dyears(1)

    However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:

    one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")
    
    one_pm
    #> [1] "2016-03-12 13:00:00 EST"
    one_pm + ddays(1)
    #> [1] "2016-03-13 14:00:00 EDT"
    
    one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "Asia/Shanghai")
    
    one_pm
    #> [1] "2016-03-12 13:00:00 CST"
    one_pm + ddays(1)
    #> [1] "2016-03-13 13:00:00 CST"
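
The New York result is off by an hour because 13 March 2016 was only 23 hours long there: clocks jumped forward for daylight saving time. A minimal sketch that measures the length of that calendar day in both time zones (variable names are illustrative):

```r
library(lubridate)

# Hours between two consecutive local midnights in a given time zone.
day_length <- function(tz) {
  start <- ymd_hms("2016-03-13 00:00:00", tz = tz)
  end <- ymd_hms("2016-03-14 00:00:00", tz = tz)
  as.numeric(end - start, units = "hours")
}

day_length("America/New_York")  # DST starts on this day
#> [1] 23
day_length("Asia/Shanghai")     # no DST
#> [1] 24
```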

Periods

  • To solve this problem, lubridate provides periods. Periods are time spans that don’t have a fixed length in seconds; instead they work with “human” times, like days and months. That allows them to work in a more intuitive way:

    one_pm
    #> [1] "2016-03-12 13:00:00 CST"
    one_pm + days(1)
    #> [1] "2016-03-13 13:00:00 CST"
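
Since one_pm was last assigned in Asia/Shanghai, which has no daylight saving time, the output above looks the same for durations and periods. The difference shows up in the New York case; the sketch below uses a separate, illustratively named variable so that one_pm is left untouched.

```r
library(lubridate)

ny_one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

# Duration: exactly 86400 seconds later, which lands at 2 pm after the clock change.
plus_duration <- ny_one_pm + ddays(1)
hour(plus_duration)
#> [1] 14

# Period: the same clock time on the next calendar day.
plus_period <- ny_one_pm + days(1)
hour(plus_period)
#> [1] 13
```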

    Like durations, periods can be created with a number of friendly constructor functions.

    seconds(15)
    #> [1] "15S"
    minutes(10)
    #> [1] "10M 0S"
    hours(c(12, 24))
    #> [1] "12H 0M 0S" "24H 0M 0S"
    days(7)
    #> [1] "7d 0H 0M 0S"
    months(1:6)
    #> [1] "1m 0d 0H 0M 0S" "2m 0d 0H 0M 0S" "3m 0d 0H 0M 0S" "4m 0d 0H 0M 0S"
    #> [5] "5m 0d 0H 0M 0S" "6m 0d 0H 0M 0S"
    weeks(3)
    #> [1] "21d 0H 0M 0S"
    years(1)
    #> [1] "1y 0m 0d 0H 0M 0S"

    You can add and multiply periods:

    10 * (months(6) + days(1))
    #> [1] "60m 10d 0H 0M 0S"
    days(50) + hours(25) + minutes(2)
    #> [1] "50d 25H 2M 0S"

    And of course, add them to dates. Compared to durations, periods are more likely to do what you expect:

    # A leap year
    leap_year(2020)
    #> [1] TRUE
    
    ymd("2020-01-01") + dyears(1)
    #> [1] "2020-12-31 06:00:00 UTC"
    ymd("2020-01-01") + years(1)
    #> [1] "2021-01-01"
    ymd("2020-01-01") + weeks(0:5)
    #> [1] "2020-01-01" "2020-01-08" "2020-01-15" "2020-01-22" "2020-01-29"
    #> [6] "2020-02-05"
    
    # The last date of the month    
    ymd("2020-01-31") + months(0:11)
    #>  [1] "2020-01-31" NA           "2020-03-31" NA           "2020-05-31"
    #>  [6] NA           "2020-07-31" "2020-08-31" NA           "2020-10-31"
    #> [11] NA           "2020-12-31"
    ymd("2020-01-31") %m+% months(0:11)
    #>  [1] "2020-01-31" "2020-02-29" "2020-03-31" "2020-04-30" "2020-05-31"
    #>  [6] "2020-06-30" "2020-07-31" "2020-08-31" "2020-09-30" "2020-10-31"
    #> [11] "2020-11-30" "2020-12-31"
    
    # A duration month (30.4375 days) versus a period month (same day next month)
    one_pm + dmonths(1)  # duration
    #> [1] "2016-04-11 23:30:00 CST"
    one_pm + months(1)   # period
    #> [1] "2016-04-12 13:00:00 CST"

    Let’s use periods to fix an oddity related to our flight dates. Some planes appear to have arrived at their destination before they departed from New York City.

    flights_dt %>% 
      filter(arr_time < dep_time) 
    #> # A tibble: 10,633 x 9
    #>   origin dest  dep_delay arr_delay dep_time            sched_dep_time     
    #>   <chr>  <chr>     <dbl>     <dbl> <dttm>              <dttm>             
    #> 1 EWR    BQN           9        -4 2013-01-01 19:29:00 2013-01-01 19:20:00
    #> 2 JFK    DFW          59        NA 2013-01-01 19:39:00 2013-01-01 18:40:00
    #> 3 EWR    TPA          -2         9 2013-01-01 20:58:00 2013-01-01 21:00:00
    #> 4 EWR    SJU          -6       -12 2013-01-01 21:02:00 2013-01-01 21:08:00
    #> 5 EWR    SFO          11       -14 2013-01-01 21:08:00 2013-01-01 20:57:00
    #> # … with 10,628 more rows, and 3 more variables: arr_time <dttm>,
    #> #   sched_arr_time <dttm>, air_time <dbl>

    These are overnight flights. We used the same date information for both the departure and the arrival times, but these flights arrived on the following day. We can fix this by adding days(1) to the arrival time of each overnight flight.

    flights_dt <- flights_dt %>% 
      mutate(
        overnight = arr_time < dep_time,
        arr_time = arr_time + days(overnight * 1),
        sched_arr_time = sched_arr_time + days(overnight * 1)
      )

    Now all of our flights obey the laws of physics.

    flights_dt %>% 
      filter(overnight, arr_time < dep_time) 
    #> # A tibble: 0 x 10
    #> # … with 10 variables: origin <chr>, dest <chr>, dep_delay <dbl>,
    #> #   arr_delay <dbl>, dep_time <dttm>, sched_dep_time <dttm>, arr_time <dttm>,
    #> #   sched_arr_time <dttm>, air_time <dbl>, overnight <lgl>
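
The overnight correction can be followed on a two-row example without loading the full flights data. The tibble below is made up for illustration: one trip crosses midnight, the other does not.

```r
library(dplyr)
library(lubridate)

trips <- tibble(
  dep_time = ymd_hm(c("2013-01-01 23:00", "2013-01-01 09:00")),
  arr_time = ymd_hm(c("2013-01-01 01:30", "2013-01-01 11:00"))
)

fixed <- trips %>%
  mutate(
    # TRUE where the recorded arrival precedes the departure.
    overnight = arr_time < dep_time,
    # TRUE * 1 is 1 and FALSE * 1 is 0, so only overnight rows gain a day.
    arr_time = arr_time + days(overnight * 1)
  )

fixed$arr_time
#> [1] "2013-01-02 01:30:00 UTC" "2013-01-01 11:00:00 UTC"
```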

Intervals

  • If you want a more accurate measurement, you’ll have to use an interval. An interval is a duration with a starting point: that makes it precise, so you can determine exactly how long it is:

    dyears(1) / ddays(1)
    #> [1] 365.25
    years(1) / days(1)
    #> [1] 365.25
    date <- ymd("2020-01-01")
    next_year <- today() + years(1)
    (today() %--% next_year) / ddays(1)
    #> [1] 365
    (today() %--% next_year) %/% days(1)
    #> [1] 365
    
    ymd('2020-02-29') + dyears(1)
    #> [1] "2021-02-28 06:00:00 UTC"
    ymd('2020-02-29') + years(1)
    #> [1] NA
    ymd('2020-02-29') %m+% years(1)
    #> [1] "2021-02-28"
    
    # Check whether each pair of intervals overlaps
    a1 <- ymd(c('2021-01-01', '2021-01-02', '2021-03-02'))
    a2 <- ymd(c('2021-02-01', '2021-02-02', '2021-03-02'))
    b1 <- ymd(c('2021-01-02', '2021-03-02', '2021-03-02'))
    b2 <- ymd(c('2021-02-02', '2021-03-02', '2021-04-02'))
    
    int_overlaps(interval(a1, a2), interval(b1, b2))
    #> [1]  TRUE FALSE  TRUE
    age <- function(dob){
      floor((today() - ymd(dob)) / dyears(1))
    }
    age(c("1997-1-1", "2000-12-31"))
    #> [1] 24 20
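
The age() function above divides by dyears(1), an average year of 365.25 days, so it can be off by one right around a birthday. A sketch of an interval-based alternative follows; age_exact() and its on argument are hypothetical names introduced here, and the dates are chosen to land exactly on a 21st birthday.

```r
library(lubridate)

# Hypothetical helper: completed calendar years between dob and a reference date.
age_exact <- function(dob, on = today()) {
  interval(ymd(dob), on) %/% years(1)
}

birthday <- ymd("2021-10-26")

# Duration-based: 7670 days / 365.25 is just under 21, so floor() gives 20.
by_duration <- floor(as.duration(birthday - ymd("2000-10-26")) / dyears(1))
by_duration
#> [1] 20

# Interval-based: exactly 21 calendar years have elapsed.
by_interval <- age_exact("2000-10-26", on = birthday)
by_interval
#> [1] 21
```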

Time zones

  • You can find out what R thinks your current time zone is with Sys.timezone():

    Sys.timezone()
    #> [1] "America/Los_Angeles"

    (If R doesn’t know, you’ll get an NA.)

    And see the complete list of all time zone names with OlsonNames():

    length(OlsonNames())
    #> [1] 593
    head(OlsonNames())
    #> [1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa"
    #> [4] "Africa/Algiers"     "Africa/Asmara"      "Africa/Asmera"

    In R, the time zone is an attribute of the date-time that only controls printing. For example, these four objects represent the same instant in time:

    x1 <- ymd_hms("2015-06-01 12:01:00", tz = "America/New_York")
    x2 <- ymd_hms("2015-06-01 18:01:00", tz = "Europe/Paris")
    x3 <- ymd_hms("2015-06-02 01:01:00", tz = "Asia/Tokyo")
    x4 <- ymd_hms("2015-06-02 00:01:00", tz = "Asia/Shanghai")

    You can verify that they’re the same time using subtraction:

    x1 - x2
    #> Time difference of 0 secs
    x1 - x3
    #> Time difference of 0 secs
    x1 - x4
    #> Time difference of 0 secs
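
Another way to see that the time zone only affects printing is to compare the stored number of seconds with the time zone attribute. A minimal sketch using two of the objects defined above, recreated here so the block runs on its own:

```r
library(lubridate)

x1 <- ymd_hms("2015-06-01 12:01:00", tz = "America/New_York")
x2 <- ymd_hms("2015-06-01 18:01:00", tz = "Europe/Paris")

# The stored values (seconds since the Unix epoch) are identical.
same_instant <- as.numeric(x1) == as.numeric(x2)
same_instant
#> [1] TRUE

# Only the time zone attribute used for display differs.
tz(x1)
#> [1] "America/New_York"
tz(x2)
#> [1] "Europe/Paris"
```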

    Unless otherwise specified, lubridate always uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and is roughly equivalent to its predecessor GMT (Greenwich Mean Time). It does not have DST, which makes it a convenient representation for computation. Operations that combine date-times, like c(), will often drop the time zone. In the output below, all three date-times display in the time zone of the first element, America/New_York:

    x4 <- c(x1, x2, x3)
    x4
    #> [1] "2015-06-01 12:01:00 EDT" "2015-06-01 12:01:00 EDT"
    #> [3] "2015-06-01 12:01:00 EDT"

    You can change the time zone in two ways:

    • Keep the instant in time the same, and change how it’s displayed. Use this when the instant is correct, but you want a more natural display.

      x4a <- with_tz(x4, tzone = "Asia/Shanghai")
      x4a
      #> [1] "2015-06-02 00:01:00 CST" "2015-06-02 00:01:00 CST"
      #> [3] "2015-06-02 00:01:00 CST"
      x4a - x4
      #> Time differences in secs
      #> [1] 0 0 0

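      (A further challenge of time zones is that they’re not all integer-hour offsets from UTC. Asia/Shanghai happens to be a whole number of hours ahead, but Australia/Lord_Howe, for example, has a standard time of UTC+10:30.)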

    • Change the underlying instant in time. Use this when you have an instant that has been labelled with the incorrect time zone, and you need to fix it.

      x4b <- force_tz(x4, tzone = "Asia/Shanghai")
      x4b
      #> [1] "2015-06-01 12:01:00 CST" "2015-06-01 12:01:00 CST"
      #> [3] "2015-06-01 12:01:00 CST"
      x4b - x4
      #> Time differences in hours
      #> [1] -12 -12 -12

Dates, date (stringr-date)

library(tidyverse)
library(lubridate)

a <- c(
  "09 Sep 2018 12:00",
  "19 Sep 2018",
  "9 unk 2018",
  "un Sep 2019",
  "un-Sep-2019",
  "un unk 2019",
  "un-unk-19 12:00",
  "un-unk-2019",
  "un/unk/99",
  "un/unk/1999"
)

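make_date_10 <- function(indate) {
  outdate <- indate %>% 
    # Normalise case so months look like "Sep" and unknowns like "Un" / "Unk"
    str_to_title() %>% 
    # Keep only the day-month-year part, dropping any trailing time
    str_extract("\\w{1,2}\\W*\\D{3}\\W*\\d{2,4}") %>% 
    # Remove separators (spaces, dashes, slashes)
    str_remove_all(pattern = "\\W") %>% 
    # Left-pad a single-digit day with a zero
    str_replace(., "^(\\d{1}\\D)(\\w+)", "0\\1\\2") %>% 
    # Reorder day-month-year into year-month-day
    str_replace(., "(\\w{2})(\\w{3})(\\d{2,4})", "\\3\\2\\1") %>% 
    # Expand a two-digit year: 19xx if 20xx would fall after the current year, otherwise 20xx
    if_else(str_length(.) == 7,
      (if_else(
        str_replace(., "^(\\d{2})(\\w+)", "20\\1") > str_trim(as.character(year(today()))),
        str_replace(., "^(\\d{2})(\\w+)", "19\\1\\2"),
        str_replace(., "^(\\d{2})(\\w+)", "20\\1\\2")
      )),
      .
    ) %>% 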
    str_replace_all(c(
      "Jan" = "-01-",
      "Feb" = "-02-",
      "Mar" = "-03-",
      "Apr" = "-04-",
      "May" = "-05-",
      "Jun" = "-06-",
      "Jul" = "-07-",
      "Aug" = "-08-",
      "Sep" = "-09-",
      "Oct" = "-10-",
      "Nov" = "-11-",
      "Dec" = "-12-",
      "UnkUn" = "",
      "Unk" = "-UN-",
      "-Un" = ""
    ))
  
  return(outdate)
}

b <- make_date_10(a)
b
#>  [1] "2018-09-09" "2018-09-19" "2018-UN-09" "2019-09"    "2019-09"   
#>  [6] "2019"       "2019"       "2019"       "1999"       "1999"