Overview

googledrive allows you to interact with files on Google Drive from R.

Installation

Install from CRAN:

install.packages("googledrive")

Usage

Load googledrive

library("googledrive")

Package conventions

  • Most functions begin with the prefix drive_. Auto-completion is your friend.
  • The goal is to allow Drive access that feels similar to Unix file system utilities, e.g., find, ls, mv, cp, mkdir, and rm.
  • The metadata for one or more Drive files is held in a dribble, a “Drive tibble”. This is a data frame with one row per file. A dribble is returned (and accepted) by almost every function in googledrive. It is designed to give people what they want (file name), to track what the API wants (file id), and to hold the metadata needed for general file operations.
  • googledrive is “pipe-friendly” and, in fact, re-exports %>%, but does not require its use.
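As a small offline check of the pipe-friendly convention, the sketch below calls the same function with and without %>%. It uses drive_mime_type(), a googledrive helper that looks up MIME types locally, so no Drive authentication is needed; the choice of "folder" as input is just for illustration.

```r
library(googledrive)

## %>% is re-exported by googledrive, so no library(magrittr) call is needed
piped  <- "folder" %>% drive_mime_type()
nested <- drive_mime_type("folder")

## both styles produce the same result
print(identical(piped, nested))
print(nested)
```

The same choice applies to every drive_*() function: pipe the dribble in, or pass it as the first argument.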

Quick demo

Here’s how to list the first 50 files you see in My Drive. At this point you can expect to be sent to your browser to authenticate yourself and authorize the googledrive package to act on your behalf with Google Drive.

drive_find(n_max = 50)
#> Auto-refreshing stale OAuth token.
#> # A tibble: 50 x 3
#>                       name                                           id
#>  *                   <chr>                                        <chr>
#>  1         chicken-xyz.csv                 0B0Gh-SuuA2nTVUZGclZiSzZ0bkE
#>  2          chicken-rm.txt                 0B0Gh-SuuA2nTT3dBbXd1ZWtvSkE
#>  3             chicken.jpg                 0B0Gh-SuuA2nTbEhtYnIzcFNfX3M
#>  4      README-mirrors.csv 1LJlt-1emr662GV8WdEzddzsfqrt-VgQeZWoBrXh1Dic
#>  5      README-mirrors.csv 1PLXfempSnjpXbKVEXwMG5vBEnd-FwmC26fw5lZ55FXg
#>  6                     def                 0B0Gh-SuuA2nTRG5YWFVGaV8zbU0
#>  7                     abc                 0B0Gh-SuuA2nTT2NqTGdLVWFkcjA
#>  8          folder1-level4                 0B0Gh-SuuA2nTaTR6elE0TjZUUHM
#>  9          folder1-level3                 0B0Gh-SuuA2nTWktWeTB0ajVoQjQ
#> 10 cranberry-TEST-drive-ls 1PM--xCb5axy5Uu9f6fDNjPAN2psRbQ2_UOeU1v2zK0E
#> # ... with 40 more rows, and 1 more variables: drive_resource <list>

You can narrow the query by specifying a pattern you’d like to match names against, or by specifying a file type: the type argument understands MIME types, file extensions, and a few human-friendly keywords.

drive_find(pattern = "chicken")
drive_find(type = "spreadsheet")     ## Google Sheets!
drive_find(type = "csv")             ## MIME type = "text/csv"
drive_find(type = "application/pdf") ## MIME type = "application/pdf"
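To preview how a type value will be interpreted, one option is drive_mime_type(), which maps a keyword, a file extension, or a full MIME type to a MIME type without contacting Drive. This is a sketch that assumes drive_find() applies the same translation to its type argument; the three inputs mirror the drive_find() calls above.

```r
library(googledrive)

## a human-friendly keyword, a file extension, and a full MIME type
keyword_mime <- drive_mime_type("spreadsheet")
ext_mime     <- drive_mime_type("csv")
full_mime    <- drive_mime_type("application/pdf")

print(c(keyword_mime, ext_mime, full_mime))

## the full lookup table behind these translations
print(head(drive_mime_type(expose())))
```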

Alternatively, you can refine the search using the q query parameter. Accepted search clauses can be found in the Google Drive API documentation. For example, to get all files with 'horsebean' somewhere in their full text (such as files based on the chickwts dataset!), do this:

(files <- drive_find(q = "fullText contains 'horsebean'"))
#> # A tibble: 8 x 3
#>                               name
#> *                            <chr>
#> 1                         chickwts
#> 2 chickwts_gdoc-TEST-drive-publish
#> 3                           foobar
#> 4                           foobar
#> 5      chickwts-TEST-drive-publish
#> 6    chickwts_gdoc-TEST-drive-list
#> 7  chickwts_txt-TEST-drive-publish
#> 8          hadley-googledrive-tour
#> # ... with 2 more variables: id <chr>, drive_resource <list>

You generally want to store the result of a googledrive call, as we do with files above. files is a dribble with info on several files and can be used as the input for downstream calls. It can also be manipulated like a regular data frame at any point.

Identify files

drive_find() searches by file properties, but you can also identify files by name (path, really) or by Drive file id using drive_get().

(x <- drive_get("~/abc/def"))
#> # A tibble: 1 x 4
#>    name       path                           id drive_resource
#>   <chr>      <chr>                        <chr>         <list>
#> 1   def ~/abc/def/ 0B0Gh-SuuA2nTRG5YWFVGaV8zbU0    <list [31]>

as_id() can be used to coerce various inputs into a marked vector of file ids. It works on file ids (for obvious reasons!), various forms of Drive URLs, and dribbles.

## retrieve the same file by id (also a great way to force-refresh metadata)
x$id
#> [1] "0B0Gh-SuuA2nTRG5YWFVGaV8zbU0"
drive_get(as_id(x$id))
#> # A tibble: 1 x 3
#>    name                           id drive_resource
#> * <chr>                        <chr>         <list>
#> 1   def 0B0Gh-SuuA2nTRG5YWFVGaV8zbU0    <list [31]>
drive_get(as_id(x))
#> # A tibble: 1 x 3
#>    name                           id drive_resource
#> * <chr>                        <chr>         <list>
#> 1   def 0B0Gh-SuuA2nTRG5YWFVGaV8zbU0    <list [31]>

In general, googledrive functions that operate on files allow you to specify the file(s) by name/path, file id, or in a dribble. If it’s ambiguous, use as_id() to flag a character vector as holding Drive file ids as opposed to file paths. This function can also extract file ids from various URLs.
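Here is a minimal sketch of as_id() extracting an id from a URL. The URL shape is an assumption (a typical Google Sheets link), and the id is reused from the spreadsheet upload shown later in this README. The extraction happens locally, so no API call is made until the marked id is passed to a function such as drive_get().

```r
library(googledrive)

## a typical Sheets URL; the id is the one from this README's spreadsheet upload
url <- "https://docs.google.com/spreadsheets/d/1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048/edit#gid=0"

## as_id() pulls the file id out of the URL and marks it as a Drive id
id <- as_id(url)
print(id)

## with authentication, the marked id could then be used as drive_get(id)
```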

Upload files

We can upload any file type.

(chicken <- drive_upload(
  drive_example("chicken.csv"),
  "README-chicken.csv"
))
#> Local file:
#>   * /Users/jenny/resources/R/library/googledrive/extdata/chicken.csv
#> uploaded into Drive file:
#>   * README-chicken.csv: 0B0Gh-SuuA2nTcDcwYXJXQS1mT00
#> with MIME type:
#>   * text/csv
#> # A tibble: 1 x 3
#>                 name                           id drive_resource
#> *              <chr>                        <chr>         <list>
#> 1 README-chicken.csv 0B0Gh-SuuA2nTcDcwYXJXQS1mT00    <list [37]>

Notice that the file was uploaded as text/csv. Since this was a .csv document and we didn’t specify the type, googledrive guessed the MIME type. We can override this by using the type parameter to upload it as a Google Sheet. Let’s delete this file first.

drive_rm(chicken)
#> Files deleted:
#>   * README-chicken.csv: 0B0Gh-SuuA2nTcDcwYXJXQS1mT00

## upload the same local file again, converting it to a Google Sheet
chicken_sheet <- drive_upload(
  drive_example("chicken.csv"),
  "README-chicken.csv",
  type = "spreadsheet"
)
#> Local file:
#>   * /Users/jenny/resources/R/library/googledrive/extdata/chicken.csv
#> uploaded into Drive file:
#>   * README-chicken.csv: 1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048
#> with MIME type:
#>   * application/vnd.google-apps.spreadsheet

Much better!

Share files

To allow other people to access your file, you need to change the sharing permissions. You can check the sharing status by running drive_reveal(..., "permissions"), which adds a logical column shared and parks more detailed metadata in a permissions_resource variable.

chicken_sheet %>% 
  drive_reveal("permissions")
#> # A tibble: 1 x 5
#>                 name shared                                           id
#>                <chr>  <lgl>                                        <chr>
#> 1 README-chicken.csv  FALSE 1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048
#> # ... with 2 more variables: drive_resource <list>,
#> #   permissions_resource <list>

Here’s how to grant anyone with the link permission to view this dataset.

(chicken_sheet <- chicken_sheet %>%
   drive_share(role = "reader", type = "anyone"))
#> Permissions updated
#>   * role = reader
#>   * type = anyone
#> For files:
#>   * README-chicken.csv: 1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048
#> # A tibble: 1 x 5
#>                 name shared                                           id
#>                <chr>  <lgl>                                        <chr>
#> 1 README-chicken.csv   TRUE 1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048
#> # ... with 2 more variables: drive_resource <list>,
#> #   permissions_resource <list>

Publish files

Versions of Google Documents, Sheets, and Presentations can be published online. You can check your publication status by running drive_reveal(..., "published"), which adds a logical column published and parks more detailed metadata in a revision_resource variable.

chicken_sheet %>% 
  drive_reveal("published")
#> # A tibble: 1 x 7
#>                 name published shared
#>                <chr>     <lgl>  <lgl>
#> 1 README-chicken.csv     FALSE   TRUE
#> # ... with 4 more variables: id <chr>, drive_resource <list>,
#> #   permissions_resource <list>, revision_resource <list>

By default, drive_publish() will publish your most recent version.

(chicken_sheet <- drive_publish(chicken_sheet))
#> Files now published:
#>   * README-chicken.csv: 1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048
#> # A tibble: 1 x 7
#>                 name published shared
#>                <chr>     <lgl>  <lgl>
#> 1 README-chicken.csv      TRUE   TRUE
#> # ... with 4 more variables: id <chr>, drive_resource <list>,
#> #   permissions_resource <list>, revision_resource <list>

Download files

Google files

We can download files from Google Drive. Native Google file types (such as Google Documents, Google Sheets, Google Slides, etc.) need to be exported to some conventional file type. There are reasonable defaults, or you can specify the export type explicitly via type or implicitly via the file extension in path. For example, if I would like to download the “538-star-wars-survey” Google Sheet as a .csv, I could run the following.

drive_download("538-star-wars-survey", type = "csv")
#> File downloaded:
#>   * 538-star-wars-survey
#> Saved locally as:
#>   * 538-star-wars-survey.csv

Alternatively, I could specify the type implicitly via the file extension in the path parameter.

drive_download(
  "538-star-wars-survey",
  path = "538-star-wars-survey.csv",
  overwrite = TRUE
)
#> File downloaded:
#>   * 538-star-wars-survey
#> Saved locally as:
#>   * 538-star-wars-survey.csv

Notice that in the example above I specified overwrite = TRUE in order to overwrite the local file saved previously.

Finally, you could just allow export to the default type. In the case of Google Sheets, this is an Excel workbook:

drive_download("538-star-wars-survey")
#> File downloaded:
#>   * 538-star-wars-survey
#> Saved locally as:
#>   * 538-star-wars-survey.xlsx

All other files

Downloading files that are not native Google files is even simpler: no conversion or type info is required.

## upload something we can download
text_file <- drive_upload(drive_example("chicken.txt"), name = "text-file.txt")
#> Local file:
#>   * /Users/jenny/resources/R/library/googledrive/extdata/chicken.txt
#> uploaded into Drive file:
#>   * text-file.txt: 0B0Gh-SuuA2nTLXIwSVpNVnY3NlE
#> with MIME type:
#>   * text/plain

## download it and prove we got it
drive_download("text-file.txt")
#> File downloaded:
#>   * text-file.txt
#> Saved locally as:
#>   * text-file.txt
readLines("text-file.txt") %>% head()
#> [1] "A chicken whose name was Chantecler"      
#> [2] "Clucked in iambic pentameter"             
#> [3] "It sat on a shelf, reading Song of Myself"
#> [4] "And laid eggs with a perfect diameter."   
#> [5] ""                                         
#> [6] "—Richard Maxson"

Clean up

drive_rm(chicken_sheet, text_file)
#> Files deleted:
#>   * README-chicken.csv: 1ybhWRj2pbPiFZCDG-4vsfxrPLZK6exEOmV5ts7ZC048
#>   * text-file.txt: 0B0Gh-SuuA2nTLXIwSVpNVnY3NlE