[chef] Re: Re: Re: Fwd: Resource ordering in chef v. puppet


  • From: Arjuna Christensen <aj@opscode.com>
  • To: chef@lists.opscode.com
  • Subject: [chef] Re: Re: Re: Fwd: Resource ordering in chef v. puppet
  • Date: Thu, 23 Jul 2009 14:44:38 +1200

On 23/07/2009, at 2:25 PM, Andrew Willis wrote:

In addition to the complications with accurately modeling dependencies
(which I found added notable development time for my configurations),
I like how the machines themselves are not part of the "code" in chef.
In puppet, the nodes are in the configuration files, so adding a new
node means updating a configuration file. This by itself isn't a big
deal, but when you then check this into your SCM, and then check it
out on the puppetmaster server, it seems like a lot of overhead just
to add a single machine. I wouldn't be surprised if I was doing it
wrong, but I found it created extra steps for little value. It seems
like there are some straight forward ways around this (like including
node information outside of source control), but I didn't want to have
to work around a problem that I didn't want to have in the first
place. With chef, this isn't a big deal. I just validate the node in
the chef server and assign it a role. In practice, this is proving to
be much lighter weight.

Absolutely - this is the model that Puppet + iClassify worked on as well (a "puppet external node terminus"). iClassify is another HJK Solutions/Opscode product, designed specifically to make this less of a problem and to ease the interactions between external services and Puppet. Systems were initially 'untagged', and you could tag and untag them with arbitrary classes (à la the Node and Registration views in the chef-server).

We've learnt a lot from using Puppet for a very long time, and I for one am glad that we have.

Chef advocates also note that it uses Ruby, not a custom configuration
language. I didn't think this was a huge deal at first, but it's great
not to have to learn a syntax that's only used in one place. I'm writing
cookbooks much faster than puppet class files. Most of the savings are
from not having to track down missed dependencies, but using Ruby is
comfortable for me and allows me to refer to the documentation less
frequently.

Using Ruby everywhere, all the time, also allows you to perform inline Ruby 'control' checks right on your systems (not your server), as everything required is evaluated on the edge. It also gives us the benefit of being able to TDD or BDD cookbooks, and to verify the syntax of both recipes and templates easily (as the chef-repo does).

One of our users wrote up a page on our wiki, oddly enough, entitled 'Just enough Ruby for Chef', which touches on some of the abilities using Ruby as a 3GL DSL gives us: http://wiki.opscode.com/display/chef/Just+Enough+Ruby+for+Chef

Getting the hang of looping will help you immensely when walking through the 'node' data structure, especially for values populated by Ohai, and also when iterating over search results from the Chef Server.

%w{some array of new packages to be installed}.each { |p| package(p) }
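
For the nested case, here is a rough sketch of the same idea in plain Ruby. The hash below is only a hypothetical stand-in for a fragment of 'node' data (the keys and addresses are invented for illustration, not real Ohai output), so it runs outside of Chef:

```ruby
# Hypothetical stand-in for part of the 'node' data structure.
# The keys and values are invented for illustration only.
node = {
  "network" => {
    "interfaces" => {
      "lo"   => { "addresses" => ["127.0.0.1"] },
      "eth0" => { "addresses" => ["192.0.2.10", "192.0.2.11"] }
    }
  }
}

# Walk the nested structure: one loop per level of nesting.
addresses = []
node["network"]["interfaces"].each do |name, iface|
  iface["addresses"].each do |addr|
    addresses << "#{name}: #{addr}"
  end
end

puts addresses
```

Inside a recipe the inner block would typically declare a resource rather than build a list, in the same way the one-liner above calls package(p).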

I'm in the process of migrating my puppet setup to chef, and I'm
really enjoying chef. Getting complex configurations right is straight
forward, primarily because things in chef execute in the order you
expect them to.

I have always personally referred to this as the list of tasks that you (as an imaginary Senior or higher-qualified Systems Architect/Engineer/Administrator) pass on to a Junior.

It's straight forward (downward), with little to no magic.

I'm glad that you're enjoying the migration process from Puppet to Chef. If there is anything else we can assist you with, feel free to post to the mailing list or visit our IRC channel (http://wiki.opscode.com/display/chef/IRC).

Regards!

On Tue, Jul 21, 2009 at 5:05 PM, Adam Jacob<adam@opscode.com> wrote:
On Tue, Jul 21, 2009 at 12:49 PM, Peter Burkholder<pburkholder@gmail.com> wrote:
Anyhow, the gist is that Puppet gets it right if one has been thorough
in modelling the dependencies.

That is correct.  The sneaky bit here is that you may not realize that
you got it wrong until it fails unpredictably.  So where is the bug -
in your manifest, or in Puppet?  Clearly it's in your manifest - but
that doesn't make you feel any better when you don't have the system
you expect working the way you expected it (in its entirety - half a
webserver is a broken webserver, not half-correct.)

Would anyone on this list care to offer an alternative view on
resource ordering vs. recipe ordering (which is my understanding of
how Chef handles this)?

Chef handles ordering inside a recipe by applying the resources you
specify in the order you specify them.  So:

package "apache2"

remote_file "/etc/apache2/httpd.conf" do
 source "httpd.conf.erb"
 mode "0644"
 owner "root"
 group "root"
end

Will install Apache first, then render the config file.  In puppet:

package { "apache2":
 ensure => "installed"
}

remotefile { "/etc/apache2/httpd.conf":
 path => "/etc/apache2/httpd.conf",
 source => "httpd.conf",
 module => apache2,
 mode => 0644,
 owner => root,
 group => root,
}

That will fail 50%ish of the time.  You would have a two element
directed graph, where each node is not attached to any other.  So when
you topologically sort it (to get the Array you need for runtime),
you'll wind up with the config file being applied first some of the
time, and the package being installed the rest.  You fix this with:

remotefile { "/etc/apache2/httpd.conf":
 path => "/etc/apache2/httpd.conf",
 source => "httpd.conf",
 module => apache2,
 mode => 0644,
 owner => root,
 group => root,
 require => Package["apache2"]
}

Which now gives you a two-element directed graph, where the elements
are connected to each other, and the package always comes up first.
This straw-man example is the simplest case - now imagine you have
somewhere on the order of a thousand resources under management on a
given system... and if you miss any of those require calls, you will
have bugs lurking in your configuration somewhere that won't be
surfaced until that dice roll comes up.  That is why I say I hate
that feature - I know it's my fault, and I would love to fix it, but I
can only see it when it happens.. and when it happens is transient,
and its irritation factor increases the more I rely on the tool.
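
To make the two-element graph concrete, here is a small sketch in plain Ruby. It is not how Puppet sorts its graph; valid_orders is a hypothetical helper that simply lists every ordering that respects the declared requirements, and the resource names are just labels:

```ruby
# deps maps each resource to the resources it requires.
# Returns every ordering in which each requirement comes
# before the resource that requires it.
def valid_orders(deps)
  deps.keys.permutation.select do |order|
    deps.all? do |resource, requires|
      requires.all? { |r| order.index(r) < order.index(resource) }
    end
  end
end

pkg  = "package[apache2]"
conf = "remotefile[/etc/apache2/httpd.conf]"

# No require: the two nodes are not connected, so both orders are valid.
without_require = valid_orders(pkg => [], conf => [])

# With require => Package["apache2"]: only one order is left.
with_require = valid_orders(pkg => [], conf => [pkg])

puts without_require.length  # 2
puts with_require.inspect    # [["package[apache2]", "remotefile[/etc/apache2/httpd.conf]"]]
```

With a thousand resources, every missing require leaves more of those valid orderings open, and only some of them actually work.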

The reason for that feature in puppet is that it can look at the tree
built from the sorted graph and cut branches off when a resource fails
- it won't even attempt to apply the dependent ones.  That means you
can have a system that gets 3/4 configured - perhaps the
authentication portion works, but the application portion does not.
Often the response to this sort of failure is simply to run puppet
again - to get the system to converge.  In my straw-man above, the
system will always work if you run puppet twice.

Chef assumes that any time it cannot run the entire set of
recipes/roles, it has had a critical, human surface-able failure.  You
wanted me to be a webserver, and I did not complete that task.  Chef
is idempotent at the resource level, which means that if you run it
again, we're smart enough not to re-do any work that does not need to
be done.. which means that some failure scenarios can also be solved
by convergence (in particular, transient failures.)  But in the
situation where you reverse the two resources:

remote_file "/etc/apache2/httpd.conf" do
 source "httpd.conf.erb"
 mode "0644"
 owner "root"
 group "root"
end

package "apache2"

You will always fail if the apache2 package creates the /etc/apache2
directory.  That means you can always debug - did the recipe work or
not?  And you can rely on it - if it works once, it'll work again, as
long as no transient changes creep into the system (entropy is a
bitch!).  You also apply the logic you are trained to use already - we
all know how to debug a program by looking at the order you wrote it
in.
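
As a rough model of that behaviour (not Chef's actual code), the sketch below runs an array of resources in order, skips any whose work is already done, and aborts on the first failure. Resource, converge, and the Set standing in for the filesystem are all assumptions made for illustration:

```ruby
require 'set'

# A hypothetical resource: a name, a check for "already done",
# and an action that raises if it cannot be applied.
Resource = Struct.new(:name, :done, :apply)

# Apply resources strictly in the order given. The system is a Set
# of paths that stands in for the real filesystem.
def converge(resources, system)
  applied = []
  resources.each do |res|
    next if res.done.call(system)  # idempotent: skip work already done
    res.apply.call(system)         # an exception here aborts the whole run
    applied << res.name
  end
  applied
end

package = Resource.new(
  "package[apache2]",
  ->(sys) { sys.include?("/usr/sbin/apache2") },
  ->(sys) { sys << "/usr/sbin/apache2" << "/etc/apache2" }
)

config = Resource.new(
  "remote_file[/etc/apache2/httpd.conf]",
  ->(sys) { sys.include?("/etc/apache2/httpd.conf") },
  lambda do |sys|
    raise "/etc/apache2 does not exist" unless sys.include?("/etc/apache2")
    sys << "/etc/apache2/httpd.conf"
  end
)

system_state = Set.new
first_run  = converge([package, config], system_state)  # both applied
second_run = converge([package, config], system_state)  # nothing to do

# Reversed order on a fresh system fails every time, not half the time.
reversed_error = begin
  converge([config, package], Set.new)
  nil
rescue RuntimeError => e
  e.message
end
```

The second run returning an empty list is the resource-level idempotence, and the reversed run raising every time is the failure you can always debug.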

Chef does allow one recipe to include another, and we only apply
them once.  But even in this case, we rely on the order we process
things to determine what order you wanted it done in - if you stick an
include_recipe statement in the middle of a recipe, we will go and run
that recipe and return.  This is the prime mechanism of real
'ordering' in Chef - make sure Apache2 is ready before you install
Passenger, or that Postfix is ready before you start your e-mail
application.
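
Here is a toy model of that expansion in plain Ruby, under the assumption that a recipe is just an ordered list of steps. The recipe names, resource labels, and the expand helper are all hypothetical; this is not Chef's implementation:

```ruby
# Each hypothetical recipe is an ordered list of steps. A step is either
# a resource label (String) or [:include_recipe, "name"].
RECIPES = {
  "apache2"   => ["package[apache2]", "service[apache2]"],
  "passenger" => ["package[build-essential]",
                  [:include_recipe, "apache2"],
                  "gem_package[passenger]"],
  "myapp"     => [[:include_recipe, "apache2"],
                  [:include_recipe, "passenger"],
                  "template[myapp.conf]"]
}

# Flatten a recipe into the ordered array of resources. An included
# recipe is expanded in place the first time it is seen and skipped
# on every later include.
def expand(recipe, recipes, seen = [], out = [])
  return out if seen.include?(recipe)
  seen << recipe
  recipes.fetch(recipe).each do |step|
    if step.is_a?(Array) && step.first == :include_recipe
      expand(step.last, recipes, seen, out)
    else
      out << step
    end
  end
  out
end

run_list = expand("myapp", RECIPES)
puts run_list
```

Even though "passenger" also includes "apache2", the Apache resources appear once, at the point where they were first included, and everything else follows in the order it was written.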

With both Puppet and Chef, the end result is an Array of resources
that have actions they are going to perform on the system (should they
need to be taken.)  The difference between Puppet and Chef is that
Chef makes the Array explicit - it is put together by you when you
write the recipes and roles for your infrastructure.  Puppet makes it
implicit - it attempts to determine the right order on your behalf,
based on the data you gave it.

I strongly prefer the way Chef does it, because I can hold the idea of
a multi-thousand element array in my head (at the very least, I can
mentally zoom in on a portion of it) - but I can't hold a
multi-thousand element directed graph, and all the permutations of a
valid topological sort in my head.

We've also never had a question from a newcomer to Chef about what
order things are done in, if that's any indication. :)

Adam

--
Opscode, Inc.
Adam Jacob, CTO
T: (206) 508-7449 E: adam@opscode.com


-- 
AJ Christensen, Software Engineer
Opscode, Inc.
E: aj@opscode.com



