[chef-dev] Concurrency in Chef (was: How do I know if my application has really been "provisioned"? a suggestion)


  • From: Daniel DeLeo < >
  • To: Peter Donald < >
  • Cc:
  • Subject: [chef-dev] Concurrency in Chef (was: How do I know if my application has really been "provisioned"? a suggestion)
  • Date: Mon, 10 Dec 2012 09:16:25 -0800

This thread has drifted pretty far from the original post's topic, so I'm forking it.


On Sunday, December 9, 2012 at 12:59 PM, Peter Donald wrote:

Hi,

It would be nice if chef allowed you to converge recipes/resources in parallel where possible, and even offered futures to join against when you wanted to wait between resources. However, that would significantly increase the complexity of chef.

A few points here:

First of all, we've started talking about solutions without stating the problem. I'll go ahead and guess that the problem being solved is optimizing the convergence time of the initial chef-client run on a bare node.

If that's the case, then the next step is to understand the problem: why do your runs take so long? You could use something like the elapsed time handler[1] to see how long each resource type takes. From there you can dig deeper: what resource are you constrained on? Network IO? Disk IO? CPU? Depending on the answer to that question, there are probably a variety of solutions, such as creating local package mirrors, moving compilation of source code from Chef to CI (then packaging the result), etc.
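To make the "measure first" step concrete, here is a minimal sketch in plain Ruby of totalling elapsed time per resource type. It does not use Chef's handler API; the Timing struct, the total_by_type helper, and the sample numbers are all made up for illustration, and real figures would come from a handler like [1] or from timing the run yourself.

```ruby
# Hypothetical record of one converged resource: its type and how long it took.
Timing = Struct.new(:resource_type, :seconds)

# Sum elapsed seconds per resource type, slowest type first.
def total_by_type(timings)
  totals = Hash.new(0.0)
  timings.each { |t| totals[t.resource_type] += t.seconds }
  totals.sort_by { |_type, secs| -secs }
end

# Illustrative sample data only, not measurements from a real run.
samples = [
  Timing.new("package", 42.0),
  Timing.new("remote_file", 18.5),
  Timing.new("package", 30.0),
  Timing.new("template", 0.5),
]

total_by_type(samples).each do |type, secs|
  puts format("%-12s %6.1fs", type, secs)
end
```

A breakdown like this tells you which resource type to look at; it does not tell you whether that type is bound by network, disk, or CPU, which still needs separate investigation.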

Once you have a set of options, you can look at the relative value of each of them. Without data from the above step, I can't list the "pros" for making chef concurrent, but I can list some cons:

* Complication of the Chef model: you need a way to specify which parts of your run_list can run concurrently, which other parts they depend on, etc.
* Opportunity for user error: if you miss a dependency in the above, you'll see intermittent errors due to a race condition
* Need to build concurrency primitives and make users use them instead of core ruby classes: unlike languages like Erlang, Haskell, etc., ruby's core data structures are mutable, so wherever users have the opportunity to modify data, concurrency needs to be taken into account. Adding mutexes to an existing program is a game of whack-a-mole, and bugs could remain hidden for a long time, so the better approach would be to rearchitect Chef using some sort of actor library. Then users would need to understand that library to use Chef's code.
* Despite the above, you still have opportunity for concurrency bugs through side effects: does your package manager take a global lock? Do different parts of your chef run implicitly rely on shared system state (files, etc.)?
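As a rough illustration of the mutable-data point, here is a small plain-Ruby sketch (not Chef code). The shared hash is a hypothetical stand-in for any data several recipes might modify; each update is a read-modify-write, so every place that touches it has to go through the same lock.

```ruby
# Hypothetical shared state standing in for data touched by several recipes.
shared = { "installed" => [] }
lock = Mutex.new

threads = 8.times.map do |i|
  Thread.new do
    100.times do |j|
      # Read-modify-write: without the lock, two threads could read the same
      # array and one update would be lost.
      lock.synchronize do
        shared["installed"] = shared["installed"] + ["pkg-#{i}-#{j}"]
      end
    end
  end
end
threads.each(&:join)

puts shared["installed"].size  # prints 800
```

The lock only helps if every writer remembers to use it, which is the whack-a-mole problem: one unguarded update anywhere in user code reintroduces the race, and it may not show up for a long time.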

I think generally if a problem is identified where concurrency is the most effective solution, you're better off solving a small, focused issue and keeping the concurrent part of the code as contained as possible. For an off-the-top-of-my-head example, if you need to download a bunch of files and you absolutely can't cache them (or there's no value in adding more caching), then a parallel version of the remote_file resource would be preferable to adding all the stuff necessary to run a handful of remote_file resources in parallel.
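A minimal sketch of what "contained" could look like, in plain Ruby. This is not how remote_file is implemented; fetch_all and the stub fetcher are hypothetical names, and the stub sleeps instead of touching the network so the example runs anywhere. A real fetcher could wrap something like Net::HTTP.get(URI(src)).

```ruby
# Fetch every source concurrently and return { source => result }.
# `fetcher` is any callable, so all the threading lives in this one method.
def fetch_all(sources, fetcher)
  threads = sources.map do |src|
    Thread.new { [src, fetcher.call(src)] }
  end
  # Thread#value waits for the thread and re-raises any exception it hit.
  threads.map(&:value).to_h
end

# Stub fetcher standing in for a real download.
fake_fetch = ->(src) { sleep 0.05; "contents of #{src}" }

results = fetch_all(%w[a.tar.gz b.tar.gz c.tar.gz], fake_fetch)
results.each { |src, body| puts "#{src}: #{body}" }
```

Callers see an ordinary method that takes a list and returns a hash; nothing outside fetch_all needs to know threads are involved, and no shared data is mutated from more than one thread.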

 
-- 
Daniel DeLeo

1. https://github.com/jamesc/chef-handler-elapsed-time


