Proposal for BOSH Errands

Background

We've had a number of requests to be able to run scripts against a deployment. The challenge has been defining these scripts and their execution environment without compromising the predictable, descriptive and deterministic nature of BOSH.

Use cases

Bind and unbind services when deploying Runtime and Services.
Run automated tests against a recent deployment.
Add seed data, such as buildpacks, to a deployment.
Push applications to a freshly deployed Runtime.
Run backup processes.

Out of scope (for now...)

Triggered execution (e.g., run this errand after every deployment).
Cron support (e.g., run this errand every 10 minutes).
Running multiple instances of a particular errand at the same time. BOSH will return an error stating that the specified errand is already running.
Running an errand on the same VM as another job.
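The single-instance rule above could be enforced with a per-errand lock. The following is a minimal sketch that assumes a hypothetical in-process registry. The names ErrandRegistry and ErrandAlreadyRunning are illustrative and are not part of BOSH:

```python
import threading


class ErrandAlreadyRunning(Exception):
    """Raised when an errand with the same name is already in progress."""


class ErrandRegistry:
    """Hypothetical registry allowing at most one run per errand name."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def start(self, name):
        # Check-and-mark atomically so two concurrent callers cannot both start.
        with self._lock:
            if name in self._running:
                raise ErrandAlreadyRunning(
                    f"errand '{name}' is already running")
            self._running.add(name)

    def finish(self, name):
        # Release the name so the errand can be run again later.
        with self._lock:
            self._running.discard(name)
```

With this design, different errands (for example smoke_test and a backup errand) can still run side by side. Only a second run of the same name is rejected.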

Proposal

Errands are defined in a release and configured in the deployment manifest. They are similar to jobs, but they are neither long-running nor deployed as part of a deployment.

When an errand is run, BOSH will provision a new VM using the specified stemcell, install that errand and all required packages and templates, and run the given command.
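The three steps above can be sketched as an ordered pipeline. This is only a sketch. The callables provision_vm, install_errand and run_command are hypothetical stand-ins for the director's real operations, and they are injected so the ordering can be shown without a real IaaS. The proposal does not say what happens to the VM afterwards, so teardown is deliberately left out:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ErrandRun:
    """Hypothetical description of one errand run."""
    errand: str
    stemcell: str
    packages: List[str] = field(default_factory=list)
    templates: List[str] = field(default_factory=list)


def run_errand_lifecycle(run: ErrandRun,
                         provision_vm: Callable[[str], str],
                         install_errand: Callable[[str, ErrandRun], None],
                         run_command: Callable[[str, str], int]) -> int:
    # 1. Provision a fresh VM from the specified stemcell.
    vm_id = provision_vm(run.stemcell)
    # 2. Install the errand with all of its packages and templates.
    install_errand(vm_id, run)
    # 3. Run the errand's command and hand back its exit code.
    return run_command(vm_id, run.errand)
```

Because every run starts from a freshly provisioned VM, no state from a previous run can leak into the next one. This is what the determinism argument below relies on.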

By making use of some of the same constructs as jobs, BOSH can ensure that errands are run in a completely predictable, descriptive, and deterministic environment each and every time.

Defining an Errand

Errands are defined in the release in a similar fashion to jobs. Errands depend on packages and render templates to configure those packages. This allows an errand that requires a specific runtime (such as ruby) to include it explicitly.

The only differences are:

Errands are first-class objects: they are defined as errands rather than as jobs.
Errands are not run by monit; they are run on demand with the bosh run errand command.
Errands are installed into /var/vcap/errands/foo (for an errand named foo).
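To illustrate how an errand's definition maps onto the install directory, here is a sketch of a spec modelled loosely on a job spec. Its shape is an assumption: ErrandSpec, the template file execute.erb and the package test_deps are hypothetical. It also assumes templates are rendered relative to /var/vcap/errands/<name>:

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Dict, List

ERRANDS_ROOT = PurePosixPath("/var/vcap/errands")


@dataclass
class ErrandSpec:
    """Hypothetical errand spec, modelled on a job spec."""
    name: str
    # Template source file -> destination path relative to the install dir.
    templates: Dict[str, str] = field(default_factory=dict)
    # Release packages the errand depends on (e.g. a ruby runtime).
    packages: List[str] = field(default_factory=list)

    @property
    def install_dir(self) -> PurePosixPath:
        # Errands are installed into /var/vcap/errands/<name>.
        return ERRANDS_ROOT / self.name

    def rendered_paths(self) -> List[PurePosixPath]:
        # Where each rendered template would land on the errand VM.
        return [self.install_dir / dest for dest in self.templates.values()]

    def missing_packages(self, release_packages: List[str]) -> List[str]:
        # Packages the errand needs that the release does not provide.
        return [p for p in self.packages if p not in release_packages]


spec = ErrandSpec("smoke_test", {"execute.erb": "execute"}, ["ruby", "test_deps"])
```

Listing the runtime as an explicit package means a missing dependency can be caught when the release is built, rather than when the errand is run.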

Configuring an Errand

Deployment Manifest

---
...
networks:
- name: default
  ...
resource_pools:
- name: default
  network: default
  ...
jobs:
  ...
errands:
- name: smoke_test
  release: release_name
  templates:
  - name: smoke_test
  resource_pool: default
  networks:
  - name: default
    default:
    - dns
    - gateway
  # instances: 1
  # persistent_disk: 20480
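To make the manifest semantics concrete, here is a sketch of how a director might look up an errand by name. It also checks that the errand's resource_pool and networks refer to entries declared elsewhere in the manifest. The manifest is written as the dict a YAML parser would produce for the example above, with the elided parts left out. The function find_errand is hypothetical, and defaulting instances to 1 is an assumption based on the commented-out line:

```python
from typing import Any, Dict

Manifest = Dict[str, Any]


def find_errand(manifest: Manifest, name: str) -> Dict[str, Any]:
    """Return a copy of the named errand, raising ValueError on bad references."""
    errands = {e["name"]: e for e in manifest.get("errands", [])}
    if name not in errands:
        raise ValueError(f"errand '{name}' is not defined in the manifest")
    errand = dict(errands[name])

    pools = {p["name"] for p in manifest.get("resource_pools", [])}
    if errand.get("resource_pool") not in pools:
        raise ValueError(f"errand '{name}' uses unknown resource pool "
                         f"'{errand.get('resource_pool')}'")

    networks = {n["name"] for n in manifest.get("networks", [])}
    for net in errand.get("networks", []):
        if net["name"] not in networks:
            raise ValueError(f"errand '{name}' uses unknown network "
                             f"'{net['name']}'")

    # Assumption: a single instance unless the manifest says otherwise.
    errand.setdefault("instances", 1)
    return errand


manifest = {
    "networks": [{"name": "default"}],
    "resource_pools": [{"name": "default", "network": "default"}],
    "jobs": [],
    "errands": [{
        "name": "smoke_test",
        "release": "release_name",
        "templates": [{"name": "smoke_test"}],
        "resource_pool": "default",
        "networks": [{"name": "default", "default": ["dns", "gateway"]}],
    }],
}
```

Validating these references at lookup time means a typo in the manifest fails fast, before any VM is provisioned.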

Running an Errand

Once an errand is defined and configured, running it is as simple as bosh run errand smoke_test. BOSH will then provision a new VM, install the errand on it, and execute the errands/smoke_test/execute script.

Any additional arguments given to bosh run errand will be passed through to the remote script as command-line arguments.
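Here is a sketch of what could happen on the errand VM. The extra arguments are appended to the execute script's command line. The script's output and exit code are captured so they can be reported separately from the CLI's own status. The path /var/vcap/errands/<name>/execute is an assumption that combines the install directory and script name mentioned above. The helpers build_command and execute are hypothetical:

```python
import subprocess
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple

ERRANDS_ROOT = PurePosixPath("/var/vcap/errands")


def build_command(errand: str, extra_args: Sequence[str]) -> List[str]:
    # Assumed script location: /var/vcap/errands/<errand>/execute.
    script = ERRANDS_ROOT / errand / "execute"
    # Arguments given to `bosh run errand` are passed through unchanged.
    return [str(script), *extra_args]


def execute(command: Sequence[str]) -> Tuple[int, str]:
    # Run the script, merging stderr into stdout, and keep its exit code.
    result = subprocess.run(list(command), stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    return result.returncode, result.stdout
```

For example, bosh run errand smoke_test --verbose app1 would, under these assumptions, run /var/vcap/errands/smoke_test/execute --verbose app1 on the errand VM.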

Open Questions

Will errands be run asynchronously (like tasks)? If so, that opens up a whole can of worms around bosh errands, bosh errand 123, etc. If not, then we'll have to ensure that long-running errands don't run into timeout issues.
How will we report the return code of the script being run, as opposed to the return code of the bosh run errand command? If errands are async, the bosh run errand foo command could return before the script finishes executing. If synchronous, bosh run errand foo could be killed locally while the script continues to a successful exit.
How do we deal with script output? Do we need to stream that back to the bosh run errand command?

Looks cool! I had a go at answering the open questions below.

Will errands be run asynchronously (like tasks)? If so, that opens up a whole can of worms around bosh errands, bosh errand 123, etc. If not, then we'll have to ensure that long-running errands don't run into timeout issues.

How do we deal with script output? Do we need to stream that back to the bosh run errand command?

How will we report the return code of the script being run, as opposed to the return code of the bosh run errand command? If errands are async, the bosh run errand foo command could return before the script finishes executing. If synchronous, bosh run errand foo could be killed locally while the script continues to a successful exit.

Could errands be a kind of task? They could show up in the bosh tasks list and be attached to with bosh task <id>. The user interface for this could be:

$ bosh run errand smoke_test --detach
The errand `smoke_test' has been started.
You can attach to this errand and follow its output by running `bosh task 1234'.

$ bosh task 1234
... output from smoke_test ...

$ bosh run errand smoke_test --attach
... output from smoke_test ...

Whether --attach or --detach should be the default is up for discussion. By piggybacking on the existing task semantics, you get most of your open questions answered for free.
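Here is a rough sketch of the task-based model suggested above. Each errand run becomes a task with an id. Detaching just returns that id, and attaching (the equivalent of bosh task <id>) streams the buffered output and then reports the errand's own exit code. The classes ErrandTask and TaskList are hypothetical, and the errand body is a plain Python callable standing in for the remote script:

```python
import itertools
import threading
from typing import Callable, Dict, Iterator, List, Optional

Emit = Callable[[str], None]
ErrandBody = Callable[[Emit], int]


class ErrandTask:
    """Hypothetical task wrapping one errand run on a background thread."""

    def __init__(self, task_id: int, body: ErrandBody):
        self.id = task_id
        self.exit_code: Optional[int] = None
        self._output: List[str] = []
        self._finished = False
        self._cond = threading.Condition()
        self._thread = threading.Thread(target=self._run, args=(body,),
                                        daemon=True)
        self._thread.start()

    def _emit(self, line: str) -> None:
        with self._cond:
            self._output.append(line)
            self._cond.notify_all()

    def _run(self, body: ErrandBody) -> None:
        code = None
        try:
            code = body(self._emit)  # the errand's own exit code
        finally:
            with self._cond:
                self.exit_code = code
                self._finished = True
                self._cond.notify_all()

    def follow(self) -> Iterator[str]:
        # Replay output from the start, then wait for new lines until done,
        # so any number of clients can attach at any time.
        seen = 0
        while True:
            with self._cond:
                self._cond.wait_for(
                    lambda: seen < len(self._output) or self._finished)
                new_lines = self._output[seen:]
                done = self._finished
            yield from new_lines
            seen += len(new_lines)
            if done:
                return


class TaskList:
    """Hypothetical stand-in for the director's task list."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._tasks: Dict[int, ErrandTask] = {}

    def start(self, body: ErrandBody) -> int:
        # `bosh run errand NAME --detach`: start it and return the task id.
        task = ErrandTask(next(self._ids), body)
        self._tasks[task.id] = task
        return task.id

    def attach(self, task_id: int, out: Emit = print) -> Optional[int]:
        # `bosh task ID`: stream output, then report the errand's exit code.
        task = self._tasks[task_id]
        for line in task.follow():
            out(line)
        return task.exit_code
```

In this model, --attach is simply start() followed immediately by attach(). Because the output is buffered on the task, killing a local client does not affect the errand: a later attach replays everything and still yields the script's real exit code.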

After this change, BOSH will have "tasks", "errands", and about five different kinds of "jobs", all of which have very similar meanings outside of BOSH. Would there be a benefit in unifying or eliminating some of them? The Google paper on Omega describes a similar overloading of "jobs" and "tasks" (although it uses them to mean slightly different things than their equivalents in BOSH).

tammersaleh/gist:8546241

xoebus commented Jan 25, 2014

tammersaleh commented Jan 27, 2014

Let's keep all comments on the email thread.