chef - [chef] dry-run / no-op mode

General discussion about Chef

From: Adam Jacob < >
To:
Subject: [chef] dry-run / no-op mode
Date: Mon, 30 Nov 2009 16:52:30 -0800

On Mon, Nov 30, 2009 at 2:28 AM, jon stuart
< >
wrote:
> I had a great weekend getting Chef up and running on some lab kit,
> looking at it as an alternative to homegrown scripts and Puppet. It's an
> impressive system so firstly thanks for writing and sharing it :-)
>
> I was wondering about the no-op stuff hinted at in CHEF-13. For me the
> ability to eyeball proposed changes on (a sample of) nodes is pretty
> important, both for avoiding silly mistakes whilst learning the ropes
> and as a checkpoint before rolling out scarily large changes.
>
> If Chef has this ability I can't find it, and if it doesn't then I'm
> wondering if the problem is something a keen Rubyist could work on. Only
> 48 hours in, I'm admittedly naive about Chef's internals; it might be a hard
> one that needs serious attention from core developers rather than me
> blundering in!
>
> Sorry if this is a FAQ or similar, I couldn't find much mention of it
> other than the ticket.

As Bryan pointed out, Chef doesn't have a --noop mode right now.

I want to take a minute to talk about how this might work, and why
a dry-run mode in Chef might produce less satisfactory results than
you would hope for.

First off, Chef values repeatability and consistency over resiliency
when applying recipes.  What that means is that we do things in the
order you tell us to, and we do the same thing every time you run
Chef.  This buys us a couple of neat things: the first is that it's
easy to reason about what happens when Chef gets run, and the second
is that, given the same set of inputs and the same original system
state, Chef will always fail in the same way (assuming it's a bug in
your recipe that is at fault).

This decision causes an interesting condition to exist when talking
about things like --noop.  Because Chef is built of idempotent
resources that expect to be run in order, there is no way for us to
tell whether a particular resource later in the resource collection
would succeed if a resource that came before it did not succeed - we
have to assume that everything works.

As an example, let's take a recipe that adds apt.opscode.com to apt's
sources, runs apt-get update, and then installs the latest version of
ohai.

template "/etc/apt/sources.list.d/opscode.list" do
  # .. some stuff ..
end

execute "apt-get update" do
  # Do nothing by default; only run when the template above changes.
  action :nothing
  subscribes :run, resources(:template => "/etc/apt/sources.list.d/opscode.list")
end

package "ohai" do
  action :upgrade
end

In a normal Chef run, if the template fails, or the apt-get update
fails, Chef won't even attempt to install the package.  In a dry-run
world, you would have to assume that every resource would either take
no action, or be successful.  So if the template did not need to be
rendered, we would know that we didn't need to run apt-get update; but
what about the package?  Since the dry run never really updated the
package list, we'd likely fail to find the package in it at all, at
least on the first pass, causing a failure that may cascade through
the rest of the resource collection.
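
To make the difference concrete, here is a minimal sketch in plain Ruby.
It is not Chef's real code: the Resource struct, real_run, and dry_run
are hypothetical names invented for illustration, and whether a resource
"will succeed" is simply hard-coded. It models the three resources above
and shows a dry run reporting a change that the real run never reaches.

```ruby
# Plain-Ruby model of an ordered resource collection (hypothetical,
# not Chef's real classes).
Resource = Struct.new(:name, :needs_change, :will_succeed)

# Real run: converge in order and stop at the first failure.
def real_run(resources)
  log = []
  resources.each do |r|
    if !r.needs_change
      log << "#{r.name}: up to date"
    elsif r.will_succeed
      log << "#{r.name}: changed"
    else
      log << "#{r.name}: FAILED, run aborted"
      break
    end
  end
  log
end

# Dry run: nothing is executed, so every pending change is assumed to work.
def dry_run(resources)
  resources.map do |r|
    r.needs_change ? "#{r.name}: would change" : "#{r.name}: up to date"
  end
end

collection = [
  Resource.new("template[opscode apt source]", true, true),
  Resource.new("execute[apt-get update]", true, false), # assumed to fail
  Resource.new("package[ohai]", true, true)
]

puts "dry run:"
puts dry_run(collection)
puts "real run:"
puts real_run(collection)
```

The dry run lists all three resources as pending changes, while the real
run stops at apt-get update and never touches the package.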

This problem gets exacerbated when you start thinking about the
dynamism that is present in Chef - you can alter the resource
collection at run time, you can search across the entire
infrastructure, you can query data bags, etc.  Each of these can
potentially alter the resource collection, or alter the way a resource
might be rendered.  Which means that, between the output of your dry
run and the actual run, the actions taken might change.
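
A rough illustration of that drift, again in plain Ruby rather than
Chef itself. The search_web_nodes helper, the inventory array, and the
haproxy-style resource names are all assumptions made up for this
sketch; the point is only that the resource list is computed from data
that can change between the dry run and the real run.

```ruby
# Hypothetical stand-in for a search against the server: returns the
# names of the web nodes known at the moment it is called.
def search_web_nodes(inventory)
  inventory.select { |node| node[:role] == "web" }.map { |node| node[:name] }
end

# Build the resource list from whatever the search returns right now.
def compile_resources(inventory)
  search_web_nodes(inventory).map { |name| "template[haproxy backend #{name}]" }
end

inventory = [
  { name: "web1", role: "web" },
  { name: "db1",  role: "db" }
]

planned = compile_resources(inventory)      # what a dry run would report

inventory << { name: "web2", role: "web" }  # a node registers in between

applied = compile_resources(inventory)      # what the real run would do

puts "only in real run: #{(applied - planned).inspect}"
```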

All of this means that, while a dry-run mode is possible, it is also
likely to tell you lies about what might really happen.

The use-case you specify above is the ability to eyeball proposed
changes on a sample of nodes, and only apply them to the entire world
once you are comfortable with them.  That problem sounds like it could
be solved by our adding Infrastructure support (the ability to have
more than one environment, say dev->test->staging->production) with
the ability to propagate a cookbook version from one environment to
another, along with some great reporting about what each chef run has
done to your system after the fact.  Would that satisfy your use case
for a dry-run mode?
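
As a sketch of what such propagation could look like, here is a small
plain-Ruby model. Nothing here is an existing Chef feature or API: the
PIPELINE constant, the promote function, the pins hash, and the version
numbers are all hypothetical, chosen only to show a cookbook version
moving one environment at a time.

```ruby
# Hypothetical model: each environment pins one version per cookbook.
PIPELINE = %w[dev test staging production].freeze

# Copy a cookbook's pinned version from one environment to the next in line.
# Returns a new hash and leaves the original pins untouched.
def promote(pins, cookbook, from)
  index = PIPELINE.index(from)
  raise ArgumentError, "unknown environment #{from}" if index.nil?
  target = PIPELINE[index + 1]
  raise ArgumentError, "#{from} is the last environment" if target.nil?
  version = pins.fetch(from).fetch(cookbook)
  pins.merge(target => pins.fetch(target, {}).merge(cookbook => version))
end

# Made-up version numbers for illustration.
pins = {
  "dev"        => { "apt" => "0.8.1" },
  "test"       => { "apt" => "0.8.0" },
  "staging"    => { "apt" => "0.8.0" },
  "production" => { "apt" => "0.8.0" }
}

pins = promote(pins, "apt", "dev")
puts "test now runs apt #{pins['test']['apt']}"
puts "production still runs apt #{pins['production']['apt']}"
```

Promoting one step at a time means the new version is only ever running
on the nodes of environments you have already looked at.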

The above is true of most other configuration management systems'
dry-run modes - Puppet at the very least (although the resource-level
dependency tracking gives Puppet some interesting options about
chopping off limbs of the tree as failure happens).  Bcfg2 and
Cfengine2 actually have some potential to be valuable here, since they
are basically policy engines that are order-agnostic - Bcfg2 can tell
you that N packages are out of policy, and Y services are out of
policy, etc.
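
The order-agnostic style can be sketched as a plain comparison of
desired state against observed state. This is not how Bcfg2 or Cfengine
are implemented; the policy and observed hashes and the out_of_policy
function are hypothetical, with made-up entries. It only shows why such
a report stays truthful: no entry depends on another having run first.

```ruby
# Desired and observed state as plain hashes; all values are made up.
policy = {
  packages: { "ohai" => "installed", "ntp" => "installed" },
  services: { "ntp" => "running" }
}
observed = {
  packages: { "ohai" => "installed" },
  services: { "ntp" => "stopped" }
}

# Compare each entry on its own; order never matters.
def out_of_policy(policy, observed)
  policy.each_with_object({}) do |(kind, entries), report|
    drifted = entries.reject { |name, want| observed.fetch(kind, {})[name] == want }
    report[kind] = drifted.keys unless drifted.empty?
  end
end

report = out_of_policy(policy, observed)
report.each do |kind, names|
  puts "#{names.size} #{kind} out of policy: #{names.join(', ')}"
end
```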

What other use-cases are there here that aren't undercut by the very
real potential for lies?  Would it be enough to give you visibility
into how the system is really behaving, so that you can gain the level
of trust you need, rather than a full-on dry-run mode?

Adam
--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E:
