|
| 1 | +--- |
| 2 | +title: "Dates and times" |
| 3 | +output: rmarkdown::html_vignette |
| 4 | +vignette: > |
| 5 | + %\VignetteIndexEntry{Dates and times} |
| 6 | + %\VignetteEngine{knitr::rmarkdown} |
| 7 | + %\VignetteEncoding{UTF-8} |
| 8 | +--- |
| 9 | + |
| 10 | +<!-- dates-and-times.Rmd is generated from dates-and-times.Rmd.orig.Rmd. Please edit that file --> |
| 11 | + |
| 12 | +```{r setup, include = FALSE} |
| 13 | +can_decrypt <- gargle:::secret_can_decrypt("googlesheets4") |
| 14 | +knitr::opts_chunk$set( |
| 15 | + collapse = TRUE, |
| 16 | + comment = "#>", |
| 17 | + error = TRUE, |
| 18 | + purl = can_decrypt, |
| 19 | + eval = can_decrypt |
| 20 | +) |
| 21 | +``` |
| 22 | + |
| 23 | +```{r eval = !can_decrypt, echo = FALSE, comment = NA} |
| 24 | +message("No token available. Code chunks will not be evaluated.") |
| 25 | +``` |
| 26 | + |
| 27 | +This article provides advice on reading and writing datetimes with Google Sheets, specifically around the matter of time zones. |
| 28 | + |
| 29 | +A related issue is how datetimes are formatted for presentation in the Sheet itself. You can read more about these formats in the [Date and time format patterns section of the Sheets API docs](https://developers.google.com/sheets/api/guides/formats#date_and_time_format_patterns). At the time of writing, googlesheets4 provides no user-friendly way to address these formats, although it may do so in the future. |
| 30 | + |
| 31 | +## Attach packages and do auth |
| 32 | + |
| 33 | +Attach googlesheets4. |
| 34 | + |
| 35 | +```{r} |
| 36 | +library(googlesheets4) |
| 37 | +``` |
| 38 | + |
| 39 | +Since we eventually create and edit Sheets, we also auth here in a hidden chunk. If you run this code, you should expect auth to happen. |
| 40 | + |
| 41 | +```{r include = FALSE} |
| 42 | +# happens in .onLoad() when IN_PKGDOWN, but need this for local, intentional |
| 43 | +# precompilation |
| 44 | +googlesheets4:::gs4_auth_docs(drive = TRUE) |
| 45 | +``` |
| 46 | + |
| 47 | +The lubridate package ([lubridate.tidyverse.org](https://lubridate.tidyverse.org)) is useful for this exploration, so we attach it now. |
| 48 | + |
| 49 | +```{r} |
| 50 | +library(lubridate, warn.conflicts = FALSE) |
| 51 | +``` |
| 52 | + |
| 53 | +## How to work with time zones in a Google Sheet |
| 54 | + |
| 55 | +You don't. |
| 56 | + |
| 57 | +Literally, you can't. |
| 58 | + |
| 59 | +I know this sounds very harsh, but it is the truth. Google Sheets offer essentially no support for time zones and your life will be simpler if you just make peace with this and accept that you will be looking at UTC times in Sheets. |
| 60 | + |
| 61 | +A short demo: in R, capture the current time as `tt` and reveal the current time zone. Note that `tt` is displayed in R according to this time zone. |
| 62 | + |
| 63 | +```{r} |
| 64 | +(tt <- Sys.time()) |
| 65 | +
|
| 66 | +Sys.timezone() |
| 67 | +``` |
| 68 | + |
| 69 | +Write the `tt` datetime to a Sheet, configured with the same time zone as the local R session, and create another cell that captures the exact text presented in the browser UI for `tt`. Read this back into R. |
| 70 | + |
| 71 | +```{r} |
| 72 | +dat <- tibble::tibble( |
| 73 | + datetime = tt, |
| 74 | + as_displayed = gs4_formula("=TO_TEXT(A2)") |
| 75 | +) |
| 76 | +
|
| 77 | +(ss <- gs4_create( |
| 78 | + "no-time-zone-effect", |
| 79 | + sheets = dat, |
| 80 | + timeZone = Sys.timezone() |
| 81 | +)) |
| 82 | +
|
| 83 | +read_sheet(ss) |
| 84 | +``` |
| 85 | + |
| 86 | +Note that the `tt` datetime is displayed differently in Sheets than it is locally in R. Sheets presents datetimes in Coordinated Universal Time (time zone Etc/UTC), even if the Sheet's metadata specifies a different time zone, such as America/Vancouver. |
| 87 | + |
| 88 | +```{r} |
| 89 | +with_tz(tt, "Etc/UTC") |
| 90 | +``` |
| 91 | + |
| 92 | +If you want to understand more about datetimes in R, in Sheets, and how you can sort of hack around this time zone problem, keep reading. |
| 93 | + |
| 94 | +```{r include = FALSE} |
| 95 | +gs4_find("no-time-zone-effect") %>% googledrive::drive_trash() |
| 96 | +``` |
| 97 | + |
| 98 | +## Need-to-know basics of datetimes |
| 99 | + |
| 100 | +Datetimes are a complicated topic. Here we dramatically oversimplify things, in the name of making a reader who is new to all of this at least minimally functional. |
| 101 | + |
| 102 | +The main system used to represent times in the computing world is Unix epoch time: |
| 103 | + |
| 104 | +> A moment in time is represented by the number of seconds that have elapsed since 1 January 1970 00:00:00 UTC. |
| 105 | +
|
| 106 | +The "UTC" part stands for "Coordinated Universal Time". Yes, the order of the letters is strangely different from the words! It's a great metaphor for this entire subject, because nothing is as simple as you'd like. Just accept it and move on. UTC is what you may already think of as "Greenwich Mean Time", if you've ever encountered that term. |
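
To make the definition concrete, here is a minimal base R sketch. It assumes nothing beyond base R; the example moment is arbitrary and chosen only for illustration.

```r
# An arbitrary example moment, stated explicitly in UTC.
moment <- as.POSIXct("2020-03-01 12:00:00", tz = "UTC")

# Unix epoch time: seconds elapsed since 1970-01-01 00:00:00 UTC.
secs <- as.numeric(moment)
secs

# Rebuild the same moment from the raw count of seconds.
rebuilt <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
format(rebuilt, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The number is the moment; everything else is formatting for humans.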
| 107 | + |
| 108 | +There are three wrinkles we must acknowledge, even when oversimplifying: |
| 109 | + |
| 110 | +1. Time zones. We don't all live in Greenwich, England, so local times are |
| 111 | + described by an offset from UTC. |
| 112 | + - I live in Vancouver, British Columbia, which is 8 hours behind UTC |
| 113 | + (UTC−08:00). Or, at least, it is part of the year ... |
| 114 | +2. Daylight savings time. Lots of places change their clocks twice a year. UTC |
| 115 | + does not! UTC "just is". This means that your UTC offset is one number part |
| 116 | + of the year and another number part of the year. |
| 117 | + - My offset is -08:00 during standard time, but is -07:00 during daylight |
| 118 | + savings (roughly March to early November). |
| 119 | +3. Spreadsheets are special. Spreadsheets use a different form of epoch time. |
| 120 | + Their epoch or origin is usually around 1900 and they keep track of how many |
| 121 | + *days* have elapsed since the epoch (not seconds, like Unix epoch time). |
| 122 | + These are sometimes called "serial dates" or "serial numbers". |
| 123 | + - Google Sheets use an epoch of 30 December 1899 00:00:00 UTC. |
| 124 | + - Horrors I will spare you: different epochs for different spreadsheet |
| 125 | + applications (or versions thereof) and the Lotus 1-2-3 leap year bug. |
| 126 | + |
| 127 | +You can read more in the [Sheets API docs about Date/Time serial numbers](https://developers.google.com/sheets/api/guides/concepts#datetime_serial_numbers). |
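
The first two wrinkles can be seen directly in base R. This is a small sketch, assuming your system's time zone database knows America/Vancouver (most do); the two dates are arbitrary picks from winter and summer.

```r
# Two moments, both at 12:00 UTC, one in winter and one in summer.
winter <- as.POSIXct("2020-01-15 12:00:00", tz = "UTC")
summer <- as.POSIXct("2020-07-15 12:00:00", tz = "UTC")

# Same UTC clock time, but Vancouver's offset from UTC differs by season.
winter_local <- format(winter, "%H:%M %z", tz = "America/Vancouver")
summer_local <- format(summer, "%H:%M %z", tz = "America/Vancouver")

winter_local
summer_local
```

The `%z` field shows the UTC offset in effect at that moment, which is why hard-coding "subtract 8 hours" goes wrong for part of the year.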
| 128 | + |
| 129 | +## Datetimes in R |
| 130 | + |
| 131 | +R uses Unix epoch time. |
| 132 | + |
| 133 | +R uses the [POSIXct](https://rdrr.io/r/base/DateTimeClasses.html) class to represent datetimes. (Yes, there's also `POSIXlt`, but I recommend and will focus on `POSIXct`.) |
| 134 | + |
| 135 | +If you ask for the current time, R prints it formatted for your time zone (or, at least, it tries). You can also ask R to reveal what it thinks your time zone is. |
| 136 | + |
| 137 | +```{r} |
| 138 | +Sys.time() |
| 139 | +
|
| 140 | +Sys.timezone() |
| 141 | +``` |
| 142 | + |
| 143 | +The time zone is **purely a matter of display**, but it's a really nice touch! It is comforting to get a time printed by R that matches your experience of what time it is, based on looking at the clock on your wall ("clock time"). |
| 144 | + |
| 145 | +lubridate's `with_tz()` function lets you explicitly associate a datetime with a time zone, e.g. your own or any other time zone recognized by your system. And this, in turn, affects how the time is formatted for human eyeballs. |
| 146 | + |
| 147 | +```{r} |
| 148 | +tt <- Sys.time() |
| 149 | +
|
| 150 | +with_tz(tt, tzone = "America/Vancouver") |
| 151 | +
|
| 152 | +with_tz(tt, tzone = "America/Denver") |
| 153 | +
|
| 154 | +with_tz(tt, tzone = "Etc/UTC") |
| 155 | +``` |
| 156 | + |
| 157 | +**This ability to display a moment in time according to a specified time zone is not present in Sheets.** Yes, each Sheet has an associated time zone, but it does not have this effect, even though you might expect or hope for that. |
| 158 | + |
| 159 | +When we read datetimes out of a Google Sheet, we must: |
| 160 | + |
| 161 | + * Convert from days to seconds. |
| 162 | + * Adjust for Unix epoch versus spreadsheet epoch. |
| 163 | + * Use the resulting number to construct an instance of `POSIXct`. |
| 164 | + |
| 165 | +When we write a datetime to a Google Sheet, we must: |
| 166 | + |
| 167 | + * Convert from seconds to days. |
| 168 | + * Adjust for the spreadsheet epoch versus Unix epoch. |
| 169 | + * Use the resulting number to create an instance of the |
| 170 | + [`CellData`](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#celldata) |
| 171 | + schema. |
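
The two lists above can be sketched in base R. The helper names `serial_to_posixct()` and `posixct_to_serial()` are hypothetical, introduced here only for illustration; they are not googlesheets4 functions, and the package's internal code may well differ.

```r
seconds_per_day <- 24 * 60 * 60

# Days between the Sheets epoch (1899-12-30) and the Unix epoch (1970-01-01).
epoch_offset_days <- as.numeric(as.Date("1970-01-01") - as.Date("1899-12-30"))

# Reading: serial number (days since Sheets epoch) -> POSIXct.
# Hypothetical helper, not part of googlesheets4.
serial_to_posixct <- function(serial) {
  unix_seconds <- (serial - epoch_offset_days) * seconds_per_day
  as.POSIXct(unix_seconds, origin = "1970-01-01", tz = "UTC")
}

# Writing: POSIXct -> serial number (days since Sheets epoch).
# Hypothetical helper, not part of googlesheets4.
posixct_to_serial <- function(x) {
  as.numeric(x) / seconds_per_day + epoch_offset_days
}

epoch_offset_days
posixct_to_serial(as.POSIXct("1970-01-01 00:00:00", tz = "UTC"))
serial_to_posixct(0.5)
```

Note that no time zone appears anywhere in the arithmetic: both representations count elapsed time from a fixed origin, so the conversion is just a change of unit and a shift.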
| 172 | + |
| 173 | +## Datetimes in Google Sheets |
| 174 | + |
| 175 | +Google Sheets use a spreadsheet-y version of epoch time. A datetime cell holds a so-called serial number, which is the elapsed days since the epoch of 30 December 1899 00:00:00 UTC. This number is then displayed in a more human-friendly way, according to a special token string. Currently googlesheets4 doesn't offer any explicit support for dealing with these [format strings](https://developers.google.com/sheets/api/guides/formats#date_and_time_format_patterns), although one day it probably will. |
| 176 | + |
| 177 | +Let's gain some intuition by looking at datetimes shortly after the epoch and inspecting the underlying serial numbers. In a hidden chunk, we create a Sheet and read it into R. |
| 178 | + |
| 179 | +```{r include = FALSE} |
| 180 | +# The constructed formulas are very special because of a very weird thing! |
| 181 | +# The =DATE() function can only be used with dates on or after 1 Jan 1900, |
| 182 | +# even though the Sheets epoch is 30 Dec 1899. |
| 183 | +# https://support.google.com/docs/answer/3092969?hl=en&ref_topic=3105385 |
| 184 | +# "Google Sheets uses the 1900 date system. The first date is 1/1/1900. |
| 185 | +# Between 0 and 1899, Google Sheets adds that value to 1900 to calculate |
| 186 | +# the year. For example, DATE(119,2,1) will create a date of 2/1/2019." |
| 187 | +# Therefore, to construct a moment in 1899, I choose to construct the date part |
| 188 | +# from parsing a string, then I can use =TIME() to apply a time to that. |
| 189 | +``` |
| 190 | + |
| 191 | +```{r echo = FALSE} |
| 192 | +dat <- tibble::tibble( |
| 193 | + datetime = gs4_formula(c( |
| 194 | + '=(TO_DATE(DATEVALUE("1899-12-30"))+TIME(12,0,0))', |
| 195 | + '=(TO_DATE(DATEVALUE("1899-12-31"))+TIME(18,0,0))' |
| 196 | + )), |
| 197 | + serial_number = gs4_formula(c("=ARRAYFORMULA(TO_PURE_NUMBER(A2:A))", NA)) |
| 198 | +) |
| 199 | +ss <- gs4_create("near-the-epoch", sheets = dat) |
| 200 | +read_sheet(ss) |
| 201 | +``` |
| 202 | + |
| 203 | +1899-12-30 12:00:00 is noon on the day that is the Google Sheets epoch. Its underlying serial number is 0.5, because one half-day has elapsed since the epoch. 1899-12-31 18:00:00 is 6 pm on the day after the epoch. Its underlying serial number is 1.75, because one and three-quarter days have elapsed since the epoch. |
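
You can check these two serial numbers without a Sheet. This is a base R sketch that simply measures elapsed days from the Sheets epoch; it is not how googlesheets4 computes anything.

```r
# The Google Sheets epoch, as a moment in UTC.
sheets_epoch <- as.POSIXct("1899-12-30 00:00:00", tz = "UTC")

# The two datetimes from the table above.
near_epoch <- as.POSIXct(
  c("1899-12-30 12:00:00", "1899-12-31 18:00:00"),
  tz = "UTC"
)

# Elapsed days since the epoch, i.e. the serial numbers.
serial_numbers <- as.numeric(difftime(near_epoch, sheets_epoch, units = "days"))
serial_numbers
```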
| 204 | + |
| 205 | +```{r include = FALSE} |
| 206 | +gs4_find("near-the-epoch") %>% googledrive::drive_trash() |
| 207 | +``` |
| 208 | + |
| 209 | +Every Google Sheet has an associated time zone. It is included in the metadata returned by `gs4_get()`, like the locale, and is revealed by default when we print a Sheets ID. |
| 210 | + |
| 211 | +```{r} |
| 212 | +(meta <- gs4_example("gapminder") %>% |
| 213 | + gs4_get()) |
| 214 | +
|
| 215 | +meta$time_zone |
| 216 | +``` |
| 217 | + |
| 218 | +However, this time zone has a very different impact -- much *less* impact -- on the user experience than the time zone in R. |
| 219 | + |
| 220 | +A Sheet's time zone does **not** influence the display of datetimes. There is no way to request that a datetime be displayed according to a specific time zone -- not via the Sheet's time zone, not via the format string, and not via a Sheets function. |
| 221 | + |
| 222 | +Datetimes in Google Sheets are fundamentally UTC-based and always display as such. |
| 223 | + |
| 224 | +If you want to see "9:14 am" in your Sheet, you must make sure the serial number in that cell represents 9:14 in the morning, UTC time. |
| 225 | + |
| 226 | +As far as I can tell, here is the only effect of a Sheet's time zone: The formulas `=NOW()` and `=TODAY()` take the local clock time or date, according to the Sheet's time zone, and construct the UTC moment or date that will display as that time or date. Therefore `=NOW()`, especially, is almost misleading! It does not capture the current moment, in UTC, but instead fabricates a UTC moment that matches current local clock time. |
| 227 | + |
| 228 | +This suggests various hacks if you truly, deeply want to see specific clock times in your Sheet, for non-UTC time zones. |
| 229 | + |
| 230 | +Starting with the UTC moments, you must determine and apply the offset yourself. At a very crude level, this can be done from first principles with datetime arithmetic in the Sheet ("Vancouver is −08:00, so subtract 8 hours"). But then there's daylight savings time and other complexities ("Except, during DST, subtract 7 hours."). In reality, no mere mortal will ever get this right, in general. |
| 231 | + |
| 232 | +You need to use external, authoritative offset information, either within R or in the Sheet. Below, we show how to do this in R. In Sheets, people tend to use Google Apps Script and solutions based on [moment.js](https://momentjs.com). |
| 233 | + |
| 234 | +## Worked example |
| 235 | + |
| 236 | +Let's make all of this concrete. We construct a data frame in R with a datetime and various versions of it that explore time zone issues. We also include a couple of Google Sheet formulas, to trigger some datetime work once the data is written into a Sheet. We sketch the construction of this data frame here, with considerable abuse of notation (mixing R code and Sheets formulas): |
| 237 | + |
| 238 | +| what | datetime | serial_number | |
| 239 | +|---------------------|-------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------| |
| 240 | +| moment | `Sys.time()` | `=ARRAYFORMULA(TO_PURE_NUMBER(B2:B))` | |
| 241 | +| moment_ny | `with_tz(moment, tzone = "America/New_York")` | | |
| 242 | +| moment_utc | `with_tz(moment, tzone = "Etc/UTC")` | | |
| 243 | +| moment_ny_utc_force | `force_tz(moment_ny, tzone = "Etc/UTC")` | | |
| 244 | +| =NOW() | `=NOW()` | | |
| 245 | +| =(DATE(moment_utc) + TIME(moment_utc)) | `=(DATE({year},{month},{day})+time({hour},{minute},{round(second, 1)}))` where `year`, `month`, etc. are computed from `moment_utc` | | |
| 246 | + |
| 247 | +Capture the current `moment` in time, with `Sys.time()`, which has no explicit time zone. Store versions of `moment` with explicit time zones: America/New_York and Etc/UTC. Use `lubridate::force_tz()` to create a new moment in time: the moment in UTC that has the same clock time as the original `moment` in New York. |
| 248 | + |
| 249 | +The first Sheets formula we use is `=NOW()`, which you might expect to be the equivalent of R's `Sys.time()`. But it's more like `force_tz(Sys.time(), tzone = "Etc/UTC")`. The second formula we construct is more elaborate. It uses datetime functions in Sheets to explicitly construct `moment_utc` in the Sheet. The last column uses `=TO_PURE_NUMBER()` to reveal the underlying serial numbers for all of the datetimes. |
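
The difference between `with_tz()` and `force_tz()` is easier to see with a fixed moment than with `Sys.time()`. This is a small sketch, assuming lubridate is installed; the example moment is arbitrary.

```r
library(lubridate, warn.conflicts = FALSE)

# An arbitrary fixed moment: noon UTC on a summer day.
moment <- as.POSIXct("2020-07-15 12:00:00", tz = "UTC")

# with_tz() changes only the display: same moment, shown as New York clock time.
moment_ny <- with_tz(moment, tzone = "America/New_York")

# force_tz() keeps the clock time and changes the moment:
# 08:00 in New York becomes 08:00 UTC.
moment_ny_utc_force <- force_tz(moment_ny, tzone = "Etc/UTC")

# Difference from the original moment, in seconds.
diff_with_tz <- as.numeric(moment_ny) - as.numeric(moment)
diff_force_tz <- as.numeric(moment_ny_utc_force) - as.numeric(moment)

diff_with_tz   # 0: same underlying moment
diff_force_tz  # -14400: four hours earlier, New York's offset in July
```

Only the forced version would produce a different serial number once written to a Sheet.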
| 250 | + |
| 251 | +```{r define-populate-sheets, include = FALSE} |
| 252 | +populate_sheets <- function(moment, ss) { |
| 253 | + force(moment) |
| 254 | + moment_ny <- with_tz(moment, tzone = "America/New_York") |
| 255 | + moment_utc <- with_tz(moment, tzone = "Etc/UTC") |
| 256 | + moment_ny_utc_force <- force_tz(moment_ny, tzone = "Etc/UTC") |
| 257 | +
|
| 258 | + # https://github.com/tidyverse/lubridate/issues/882 |
| 259 | + # use moment_utc, so I know for sure that time zone is UTC |
| 260 | + fixed_formula <- gs4_formula(glue::glue( |
| 261 | + "=(DATE({year},{month},{day})+time({hour},{minute},{round(second, 1)}))", |
| 262 | + year = year(moment_utc), month = month(moment_utc), day = day(moment_utc), |
| 263 | + hour = hour(moment_utc), minute = minute(moment_utc), |
| 264 | + second = second(moment_utc) |
| 265 | + )) |
| 266 | + now_formula <- gs4_formula("=NOW()") |
| 267 | + serial_number <- "=ARRAYFORMULA(TO_PURE_NUMBER(B2:B))" |
| 268 | + length(serial_number) <- 6 |
| 269 | +
|
| 270 | + dat <- tibble::tibble( |
| 271 | + what = c( |
| 272 | + "moment", "moment_ny", "moment_utc", "moment_ny_utc_force", |
| 273 | + "=NOW()", "=DATE(moment_utc)" |
| 274 | + ), |
| 275 | + datetime = list( |
| 276 | + moment, moment_ny, moment_utc, moment_ny_utc_force, |
| 277 | + now_formula, fixed_formula |
| 278 | + ), |
| 279 | + serial_number = gs4_formula(serial_number) |
| 280 | + ) |
| 281 | + |
| 282 | + range_iso <- function(ss, range) { |
| 283 | + fmt <- googlesheets4:::CellData( |
| 284 | + userEnteredFormat = googlesheets4:::new( |
| 285 | + "CellFormat", |
| 286 | + numberFormat = list(type = "DATE_TIME", pattern = "yyyy-mm-dd hh:mm:ss") |
| 287 | + ) |
| 288 | + ) |
| 289 | + range_flood(ss, sheet = 1, range = range, cell = fmt) |
| 290 | + } |
| 291 | +
|
| 292 | + for (i in seq_along(ss)) { |
| 293 | + sheet_write(dat, ss = ss[i], sheet = 1) |
| 294 | + range_autofit(ss[i], sheet = 1) |
| 295 | + range_iso(ss[i], range = "B2:B") |
| 296 | + } |
| 297 | + dat |
| 298 | +} |
| 299 | +``` |
| 300 | + |
| 301 | +Create 3 Sheets, with different approaches to the time zone: |
| 302 | + |
| 303 | +* No explicit specification of time zone. It's hard to say what you'll get here! |
| 304 | +* America/New_York |
| 305 | +* Etc/UTC |
| 306 | + |
| 307 | +```{r} |
| 308 | +ss_xx <- gs4_create("tz-default") |
| 309 | +ss_ny <- gs4_create("tz-america-new-york", timeZone = "America/New_York") |
| 310 | +ss_utc <- gs4_create("tz-etc-utc", timeZone = "Etc/UTC") |
| 311 | +
|
| 312 | +show_timezone <- function(ss) gs4_get(ss)$time_zone |
| 313 | +
|
| 314 | +show_timezone(ss_xx) |
| 315 | +show_timezone(ss_ny) |
| 316 | +show_timezone(ss_utc) |
| 317 | +``` |
| 318 | + |
| 319 | +```{r eval = FALSE, include = FALSE} |
| 320 | +# useful when developing and the sheets already exist |
| 321 | +(ss_xx <- gs4_find("tz-default") %>% as_sheets_id()) |
| 322 | +(ss_ny <- gs4_find("tz-america-new-york") %>% as_sheets_id()) |
| 323 | +(ss_utc <- gs4_find("tz-etc-utc") %>% as_sheets_id()) |
| 324 | +``` |
| 325 | + |
| 326 | +```{r eval = FALSE, include = FALSE} |
| 327 | +# useful during development, in order to see these sheets in the browser |
| 328 | +gs4_share(ss_xx) |
| 329 | +gs4_share(ss_ny) |
| 330 | +gs4_share(ss_utc) |
| 331 | +``` |
| 332 | + |
| 333 | +```{r eval = FALSE, include = FALSE} |
| 334 | +# useful during development, e.g. for capturing screenshots |
| 335 | +gs4_browse(ss_xx) |
| 336 | +gs4_browse(ss_ny) |
| 337 | +gs4_browse(ss_utc) |
| 338 | +``` |
| 339 | + |
| 340 | +Capture the current `moment` with `Sys.time()`, construct the data frame described above, and write it into each of the prepared Google Sheets. |
| 341 | + |
| 342 | +```{r} |
| 343 | +dat <- populate_sheets(Sys.time(), c(ss_xx, ss_ny, ss_utc)) |
| 344 | +``` |
| 345 | + |
| 346 | +First, let's look at `dat`, the data frame we sent. |
| 347 | + |
| 348 | +```{r} |
| 349 | +dat |
| 350 | +``` |
| 351 | + |
| 352 | +That's hard to parse since the `datetime` column is a list-column. Here's a different look, with the most natural character representation of that column. |
| 353 | + |
| 354 | +```{r echo = FALSE} |
| 355 | +dat2 <- dat |
| 356 | +dat2$datetime <- vapply( |
| 357 | + dat$datetime, function(x) format(x, usetz = TRUE), character(1) |
| 358 | +) |
| 359 | +dat2 |
| 360 | +``` |
| 361 | + |
| 362 | +Read the Sheets back into R, starting with the Sheet with no explicit time zone set. |
| 363 | + |
| 364 | +```{r} |
| 365 | +read_sheet(ss_xx) %>% as.data.frame() |
| 366 | +read_sheet(ss_ny) %>% as.data.frame() |
| 367 | +read_sheet(ss_utc) %>% as.data.frame() |
| 368 | +``` |
| 369 | + |
| 370 | +Main conclusions: |
| 371 | + |
| 372 | +* All 3 versions of `moment` result in the same serial number in all 3 Sheets. Lesson: in R, time zone is merely a matter of display and, in Sheets, there is only UTC. |
| 373 | +* `moment_ny_utc_force` (forcing `moment`'s NY clock time into UTC) results in the same serial number in all 3 Sheets. Lesson: If you want to see a specific clock time in the Sheet, force this on the R side, before writing to Sheets. But realize that you have fudged the datetime data in order to get the desired display. |
| 374 | +* `=NOW()` is one of the few things affected by a Sheet's time zone (along with `=TODAY()`). It allows you to force the Sheet's clock time into UTC. |
| 375 | + |
| 376 | +Clean up. |
| 377 | + |
| 378 | +```{r} |
| 379 | +gs4_find("tz-") %>% |
| 380 | + googledrive::drive_trash() |
| 381 | +``` |