[chef] Re: RE: Re: Re: Automated check-ins or not...


  • From: Daniel DeLeo < >
  • To:
  • Subject: [chef] Re: RE: Re: Re: Automated check-ins or not...
  • Date: Mon, 13 Jan 2014 13:49:21 -0800


On Monday, January 13, 2014 at 1:16 PM, Phillip Roberts wrote:

The problem isn’t my coworker; the problem is a lack of understanding of the tool.

 

Chef is my baby, and I am perfectly fine with automated check-ins. However, just like any business, there are politics at play. There are fears due to a lack of understanding as well.

 

I am purposely asking for others’ use cases because I am interested in them and because they help me form my arguments as to why chef nodes should be checking in (running chef-client) automatically.

 

I am not asking anyone to tell me whether we should be using chef, or how we should be using chef; I am interested in how it is being used in other environments. I have seen plenty of other environments where I have implemented chef, but in all of those cases I implemented both chef and the policies that surround it, and this question (or argument) has never come up.

 

I appreciate the responses thus far.

 

Thanks,

 

Phillip Roberts | Sr. Linux Systems Administrator

There’s a joke that goes around Twitter every so often: “to err is human, to propagate your error to 1000 machines automatically is devops.” I think this joke does a good job of getting to the heart of your coworkers’ concerns: what prevents a potentially destructive mistake from getting applied to your whole infrastructure?

As you’ve implied, one option is to run chef-client manually on each machine to apply updates as desired. The pros of this approach:

* You can use why-run mode to get some indication of what’s going to change before you run chef-client for real (see the sketch after this list)
* The workflow is very simple: you don’t need to invest in a lot of testing or extra infrastructure, just upload cookbooks to the server and run them
* If you’re using chef for app deployments, you don’t need any additional logic or tooling for orchestration, just run chef on the boxes in the right order.
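
To make the why-run point concrete, a dry run is just an extra flag on the normal invocation; this is a minimal sketch and assumes you normally run chef-client with sudo:

    # Report what chef-client *would* change, without actually converging the node
    sudo chef-client --why-run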

The downsides:

* You have to manually check whether chef has run recently on all your machines. If you miss one, you could be missing an important security/bug/performance patch. This can get you into problems such as missing a patch on the passive node in a failover pair. When the cluster fails over, the service doesn’t work correctly on the now-active node. I’m sure you can imagine plenty of similar cases.
* Related to the above, you can get a different delta from starting state to desired state than you expected if a machine is a few cookbook iterations behind. This can cause chef-client to fail or to apply a change incorrectly based on the assumptions in your cookbook code.
* Your team has to be fairly disciplined about communicating when changes are made to the chef-server. Say Alice uploads her change, runs it on a trial node, and it works correctly. Now she starts a parallel SSH session to run chef-client on the remaining nodes (something like the knife ssh sketch after this list). In the meantime, Bob uploads a change to some “base” cookbook and it’s incompatible with Alice’s change. Alice’s chef-client runs fail or cause an outage on those systems.
* Humans spend a lot of time running chef-client.
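
For reference, the “parallel SSH session” in that Alice/Bob scenario is often just a knife ssh invocation like the one below; this is only a sketch, and the search query and SSH user are assumptions about your environment:

    # Converge every node matching the search query, in parallel over SSH
    # ('name:*' matches all registered nodes; narrow the query as needed)
    knife ssh 'name:*' 'sudo chef-client' -x deploy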

My view is that running chef-client in some periodic fashion is a good forcing function: it requires you to implement good workflow practices, whether that be cookbook testing with automated uploads to the chef-server from CI, partitioning your infrastructure so that you have sub-clusters running the “future” cookbook version before the majority of similar machines, testing cookbooks locally in Vagrant/Test Kitchen/whatever, etc. If you have this stuff in place, running chef-client manually (or via an orchestration tool) vs. on an interval won’t make a big difference.
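
In case it helps frame the discussion, “periodic” here usually just means running chef-client daemonized with an interval and some splay (or an equivalent cron entry); the numbers below are only illustrative:

    # Converge roughly every 30 minutes, with up to 5 minutes of random splay
    # so the whole fleet doesn't hit the chef-server at the same moment
    chef-client --daemonize --interval 1800 --splay 300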


-- 
Daniel DeLeo



