[chef-dev] Re: [chef] Re: Re: Re: Mixlib::Versioning 1.0.0 released

  • From: Jay Feldblum
  • To: Warren Bain
  • Cc: Mike, Seth Chisamore
  • Subject: [chef-dev] Re: [chef] Re: Re: Re: Mixlib::Versioning 1.0.0 released
  • Date: Sat, 30 Mar 2013 20:52:35 -0400


Currently, that would require re-resolving every node's run-list on every cookbook or role push. There are ways to optimize that process for the common case, but that would be the requirement. It would also require re-resolving the node's run-list on every node push (e.g. changing the normal attributes, tags, environment, run-list).

If environments were isolated tenants, cookbooks were per-environment, and chef-server permitted only one version of a cookbook at a time per environment, then neither the chef-server nor the chef-client would need to do any dependency resolution that takes exponential worst-case time. Dependency resolution that doesn't consider versions takes linear worst-case time because it's a graph walk, not a constraint problem. Dependencies with their versions would be resolved on the workstation, and the resolved set could be synced to the server atomically and transactionally.
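
To make the graph-walk point concrete, here is a minimal sketch in Python, assuming exactly one version of each cookbook exists in the environment. The function name `resolve_without_versions` and the cookbook names are made up for illustration and are not part of any Chef API. Each cookbook and each dependency edge is visited at most once, so the work is linear in the size of the graph.

```python
from collections import deque


def resolve_without_versions(run_list, depends_on):
    """Return every cookbook reachable from run_list, in visit order.

    depends_on maps a cookbook name to the names it depends on.
    With one version per cookbook there is nothing to choose between,
    so this is a plain breadth-first walk with no backtracking.
    """
    seen = set()
    order = []
    queue = deque(run_list)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already resolved; also makes cycles harmless
        if name not in depends_on:
            raise KeyError(f"cookbook not found: {name}")
        seen.add(name)
        order.append(name)
        queue.extend(depends_on[name])
    return order


# Hypothetical cookbooks, for illustration only.
cookbooks = {
    "app": ["nginx", "postgresql"],
    "nginx": ["apt"],
    "postgresql": ["apt"],
    "apt": [],
}
resolved = resolve_without_versions(["app"], cookbooks)
print(resolved)
```

Once several versions per cookbook are allowed, each step has to pick a version that satisfies every constraint seen so far and may have to backtrack, which is where the exponential worst case comes from.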


On Sat, Mar 30, 2013 at 8:30 PM, Warren Bain wrote:

I'm a bit new to this, but surely the dependency possibilities for the cookbooks actually stored on the server are fixed and can be computed in advance, such that any node run-list can be satisfied very quickly.

But then perhaps I totally misunderstand the order of magnitude of the problem.

In any case, I don't see how offloading that problem to every node on every run makes sense.



Warren Bain
Australia's cloud
direct: +61 2 8221 7729
mobile: +61 414 867 559
follow: http://twitter.com/thoughtcroft

Daniel DeLeo wrote:

On Saturday, March 30, 2013 at 1:55 PM, Jay Feldblum wrote:


The chef-client could also do the dependency resolution itself rather than asking the chef-server to do it. The only new API the chef-server would need to provide is a batch API to fetch the full metadata of all versions of all cookbooks uploaded to the chef-server, or at least as much of the metadata as is necessary for the chef-client to perform the dependency resolution. The chef-client can then perform its own dependency resolution on that data and the chef-server wouldn't need to be involved.
I dislike this approach because it requires an ever-growing amount of data to be shipped to the client on every run, while not solving the problem of version clobbering. With hosted chef, we see that the cookbook version API call is slower than all the others by quite a wide margin, but with the gecode-based solver (I have less personal experience with the pure-erlang replacement) the constraint solution usually only takes a few milliseconds--the call time of the request is dominated by the large amount of disk and network IO required to get the necessary information into the constraint solver. If you were to automate patch-version bumps to avoid clobbering, then you will automatically exacerbate this issue.

Adding another (potentially much slower) network link in this chain feels like a move in the wrong direction to me.

In fact, perhaps this should be done anyway. Dependency-resolution can take exponential time. Nothing on the chef-server should ever be permitted to take exponential time. While it is a problem for a given chef-client if the dependency-resolution takes too long on the chef-client, it's a problem for all clients in an infrastructure if that knocks out the whole chef-server rather than just the one chef-client.
Part of the point of my proposal is that dependency resolution is moved to the workstation: compile a list of compatible cookbooks by hand or automatically, then upload (if necessary) and use environments to lock some set of nodes to the pre-computed solution. This feels more elegant because it requires the least computation overall and (more importantly) moves it off of production systems--no worries about all your hosts suddenly chewing up CPU due to a gnarly dependency graph.
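
As a rough sketch of the "lock an environment to a pre-computed solution" step, assuming the workstation has already resolved one version per cookbook, the pinning itself is just a matter of writing exact-match constraints into the environment's `cookbook_versions`. The helper name `pin_environment`, the environment name, and the versions below are hypothetical; a real environment file carries more fields than shown here.

```python
import json


def pin_environment(env_name, resolved_versions):
    """Build a minimal environment document that pins each cookbook
    to exactly the version chosen on the workstation."""
    return {
        "name": env_name,
        "cookbook_versions": {
            name: f"= {version}"
            for name, version in sorted(resolved_versions.items())
        },
    }


# Hypothetical output of a workstation-side resolver.
solution = {"nginx": "2.1.0", "apt": "1.9.2"}
env = pin_environment("production", solution)
print(json.dumps(env, indent=2))
```

Because every constraint is an exact match, nodes in that environment have nothing left to solve at run time.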

Regarding environments, I've added my thoughts here:
If roles become environment-version-able then the only thing that's left are data bags (and clients, but in practice these are tied pretty closely to nodes, so I'm not clear about the use-case). I've heard many people say that they'd like data bags to be environment-version-able as well; I've always used environment as the id for data bag items where the contents differ per-environment, so the need for this is a bit foreign to me (not to say it's invalid, I just don't understand the use case).


Regarding Mixlib::Versioning - cool! I've added my thoughts here:


Daniel DeLeo

