[chef] Re: Changes to run_list (sometimes) won't stick!

  • From: Paul Paradise
  • Subject: [chef] Re: Changes to run_list (sometimes) won't stick!
  • Date: Wed, 3 Nov 2010 22:14:35 -0700

I'm the original reporter (though that isn't reflected in JIRA; Nuo filed it on my behalf), and yeah, it's a pretty ugly bug. I've had similar trouble with data bags and roles edited using knife. I'll often open a data bag for editing just to view it in vim (I know, bad habit) and then forget about it, make a change to the same data bag in another terminal, and eventually quit the first session, overwriting my changes. I believe the client needs some form of locking or revision control to really make this problem go away. Optimistic locking wouldn't be too difficult to retrofit, but something with vector clocks and an archived history would allow more intelligent merges of conflicting data than "last one wins."
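
To make the optimistic locking idea concrete, here is a minimal sketch in plain Ruby. It assumes a revision counter stored alongside each item; `VersionedStore` and `StaleRevisionError` are hypothetical names for illustration, not part of Chef or knife.

```ruby
# Hypothetical in-memory store; illustrates optimistic locking only.
class StaleRevisionError < StandardError; end

class VersionedStore
  def initialize
    @items = {} # name => { rev: Integer, data: Hash }
  end

  # Returns [revision, deep copy of data]; revision 0 means "not stored yet".
  def fetch(name)
    item = @items.fetch(name) { { rev: 0, data: {} } }
    [item[:rev], Marshal.load(Marshal.dump(item[:data]))]
  end

  # Saves only if the caller's revision still matches the stored one.
  def save(name, expected_rev, data)
    current_rev = @items.key?(name) ? @items[name][:rev] : 0
    unless current_rev == expected_rev
      raise StaleRevisionError,
            "#{name}: expected rev #{expected_rev}, store has rev #{current_rev}"
    end
    @items[name] = { rev: current_rev + 1, data: Marshal.load(Marshal.dump(data)) }
    current_rev + 1
  end
end

store = VersionedStore.new
store.save("users", 0, { "admins" => ["paul"] })

# Two editor sessions open the same item at the same revision.
rev_a, data_a = store.fetch("users")
rev_b, data_b = store.fetch("users")

# The second terminal makes a change and saves first.
data_b["admins"] << "mike"
store.save("users", rev_b, data_b)

# The forgotten first session quits later with a stale revision.
begin
  store.save("users", rev_a, data_a)
  stale_rejected = false
rescue StaleRevisionError => e
  stale_rejected = true
  puts "rejected: #{e.message}"
end
```

With this check the stale save is refused instead of silently winning. Deciding what to do after the rejection (re-fetch, merge, or ask the user) is a separate question, which is where vector clocks and history would come in.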

I've been working around it for now by extending the client run interval (which reduces the likelihood of hitting the race condition) or by eliminating the chef-client daemon process entirely. On certain critical nodes I only run Chef in an attended fashion, which ensures that I get a consistent state. Unfortunately, it's not the greatest workaround. :-(


On Wed, Nov 3, 2010 at 7:43 PM, Mike Williams wrote:
I'm being bitten by a race-condition when altering the run-list for a node (using either the Web UI, or "knife").  Sometimes, the changes just don't stick!

In particular, they don't stick if I make the change while "chef-client" is running on the node.  The problem appears to be that chef-client saves the Node object at the end of its run, reverting the run-list to its previous state.
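
The sequence described above can be reproduced with a small simulation. This is only a sketch of the suspected lost update: the `server` hash is a hypothetical stand-in for the server's node storage and does not use Chef's real API.

```ruby
# Hypothetical stand-in for the server's node storage; not Chef's real API.
server = { "web1" => { "run_list" => ["role[base]"], "ohai_time" => 0 } }

deep_copy = ->(obj) { Marshal.load(Marshal.dump(obj)) }

# 1. chef-client starts its run and loads the whole node.
client_copy = deep_copy.call(server["web1"])

# 2. Mid-run, an operator edits the run list (Web UI or knife) and saves.
operator_copy = deep_copy.call(server["web1"])
operator_copy["run_list"] << "recipe[nginx]"
server["web1"] = operator_copy

# 3. The run finishes and the client saves its whole, now stale, copy.
client_copy["ohai_time"] = 1
server["web1"] = client_copy

puts server["web1"]["run_list"].inspect # prints ["role[base]"]
```

The operator's `recipe[nginx]` entry is gone because step 3 writes back the run list the client read in step 1.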

Someone logged a bug about this a couple of weeks ago:


but the problem appears to have existed for a while.  I'm a little surprised that it hasn't been reported as an issue before now!   I guess most people aren't in the habit of messing with run-lists that much ... at this stage, we're doing so mainly in automated tests for our provisioning scripts.

It seems weird to me that chef-client attempts to update the node's own "configuration" data, rather than just its "status".  Should there be a better separation between the two in Chef::Node?
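
One way to picture such a separation, as a sketch only: at the end of a run the client re-reads the node and overwrites just the keys it owns. `STATUS_KEYS` and `save_status_only` are hypothetical names, and the split into "status" keys is an assumption made for illustration; this is not how Chef::Node is actually structured.

```ruby
# Hypothetical split between client-owned status keys and operator-owned
# configuration; not Chef's API.
STATUS_KEYS = %w[ohai_time uptime].freeze

# Re-read the node and overwrite only the status keys, leaving run_list alone.
def save_status_only(server, name, client_copy)
  fresh  = server.fetch(name)
  status = client_copy.select { |key, _| STATUS_KEYS.include?(key) }
  server[name] = fresh.merge(status)
end

server = { "web1" => { "run_list" => ["role[base]"], "ohai_time" => 0 } }
client_copy = Marshal.load(Marshal.dump(server["web1"]))

# An operator edits the run list while the client is mid-run.
server["web1"]["run_list"] << "recipe[nginx]"

# End of run: the client reports fresh status only.
client_copy["ohai_time"] = 1
save_status_only(server, "web1", client_copy)

puts server["web1"].inspect
```

Here the run list keeps the operator's `recipe[nginx]` entry while `ohai_time` is updated. A small window remains between the re-read and the write, so this narrows the race rather than removing it.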

Can anyone suggest how I might (a) work around the problem, or (b) fix it, in Chef::Node or chef-client?

Is it likely that changes to other Node attributes would be similarly affected, i.e. reverted at the end of each chef-client run?

Mike Williams
