This workflow takes advatage of the future
integration to run your local R-functions within a cluster of GCE machines.
You can do this to throw up expensive computations by spinning up a cluster and tearing it down again once you are done.
In summary, this workflow:
The example below uses a default r-base
template, but you can use the steps above to create a dynamic_template
pulled from the Container Registry if required.
Instead of the more generic gce_vm()
that is used for more interactive use, we create the instances directly using gce_vm_container()
so it doesn’t wait for the job to complete before starting the next (not useful if you have a lot of VMs). You can then use gce_get_zone_op()
to get the job status.
library(future)
library(googleComputeEngineR)
## names for your cluster
vm_names <- c("vm1","vm2","vm3")
## create the cluster using default template for r-base
## creates jobs that are creating VMs in background
jobs <- lapply(vm_names, function(x) {
gce_vm_container(file = get_template_file("r-base"),
predefined_type = "n1-highmem-2",
name = x)
})
jobs
# [[1]]
# ==Operation insert : PENDING
# Started: 2016-11-16 06:52:58
# [[2]]
# ==Operation insert : PENDING
# Started: 2016-11-16 06:53:04
# [[3]]
# ==Operation insert : PENDING
# Started: 2016-11-16 06:53:09
## check status of jobs
lapply(jobs, gce_get_zone_op)
# [[1]]
# ==Operation insert : DONE
# Started: 2016-11-16 06:52:58
# Ended: 2016-11-16 06:53:14
# Operation complete in 16 secs
# [[2]]
# ==Operation insert : DONE
# Started: 2016-11-16 06:53:04
# Ended: 2016-11-16 06:53:20
# Operation complete in 16 secs
# [[3]]
# ==Operation insert : DONE
# Started: 2016-11-16 06:53:09
# Ended: 2016-11-16 06:53:30
# Operation complete in 21 secs
## get the VM objects
vms <- lapply(vm_names, gce_vm)
It is safest to setup the SSH keys seperately for multiple instances, using gce_ssh_setup()
- this is normally called for you when you first connect to a VM.
## set up SSH for the VMs
vms <- lapply(vms, gce_ssh_setup)
We now make the VM cluster as per details given in the future README
## make a future cluster
plan(cluster, workers = vms)
The cluster is now ready to recieve jobs. You can send them by simply using %<-%
instead of <-
.
## use %<-% to send functions to work on cluster
## See future README for details: https://github.com/HenrikBengtsson/future
a %<-% Sys.getpid()
## make a big function to run asynchronously
f <- function(my_data, args){
## ....expensive...computations
result
}
## send to cluster
result %<-% f(my_data)
For long running jobs you can use future::resolved
to check on its progress.
## check if resolved
resolved(result)
[1] TRUE