[chef] Re: RE: Re: Re: Re: Re: Re: RE: Re: Re: Automated check-ins or not...


  • From: Lamont Granquist
  • To:
  • Subject: [chef] Re: RE: Re: Re: Re: Re: Re: RE: Re: Re: Automated check-ins or not...
  • Date: Tue, 14 Jan 2014 11:13:07 -0800


One tool that you have with chef is --why-run mode.  You can use this to prove that most of the time the chef-client runs are doing nothing.  We've also got a reporting feature for HEC that reports on changed resources.  As long as your resources are properly idempotent then your daemonized chef-client runs should all be NOPs and no change should occur.  If neither of those options work for you, you can always write your own reporting handler and write code that walks the chef resource collection and extracts the information on what resources have changed.  That can go a long way towards addressing FUD about having a process on the box making scary changes.
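Chef report handlers are written in Ruby as `Chef::Handler` subclasses. To illustrate only the filtering idea, here is a minimal Python sketch. It assumes a hypothetical, simplified record for each resource in a run (type, name, and whether the run updated it) and reports either a no-op or the list of changed resources. `ResourceRecord`, `changed_resources` and `summarize_run` are illustrative names, not part of Chef.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    """Hypothetical, simplified view of one resource from a chef-client run."""
    resource_type: str  # e.g. "package", "template"
    name: str           # e.g. "/etc/ntp.conf"
    updated: bool       # True if this run actually changed the resource


def changed_resources(collection):
    """Return "type[name]" labels for the resources this run changed."""
    return [f"{r.resource_type}[{r.name}]" for r in collection if r.updated]


def summarize_run(collection):
    """One-line report: either a no-op, or which resources changed."""
    changed = changed_resources(collection)
    if not changed:
        return f"no-op: {len(collection)} resources checked, 0 changed"
    return f"{len(changed)} of {len(collection)} changed: " + ", ".join(changed)


if __name__ == "__main__":
    run = [
        ResourceRecord("package", "ntp", False),
        ResourceRecord("template", "/etc/ntp.conf", True),
        ResourceRecord("service", "ntpd", False),
    ]
    print(summarize_run(run))  # → 1 of 3 changed: template[/etc/ntp.conf]
```

If resources are properly idempotent, most scheduled runs should produce the "no-op" line, which is exactly the evidence you want to show nervous stakeholders.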

There are, then, only two ways that change happens outside of reviewed changes. One way is that someone changes a box manually and then the next chef-client run resets that change:

http://devopsreactions.tumblr.com/post/73295491766/changing-permissions-that-are-enforced-by-cfengine

The other way is that your promote-to-production workflow might be hard to follow and someone could make a mistake there, so you need to focus on getting that right, and on defining the technical and business process by which changes get into production.

On 1/14/14 7:42 AM, Phillip Roberts wrote:

I appreciate everyone’s responses to this thread. It has been a pretty good discussion, and I have gathered some great information from it.

You are both correct. However, I was most interested in the broader discussion of how others are handling it in their environments.

I wasn’t necessarily looking for “this is how you should do it”, “this is how you should handle your coworker” or anything like that, just a broader discussion of how each of you is using it in your environment. It helps me come up with fresh ideas for furthering our implementation here, and for increasing the maturity not only of myself but of my team as a whole.

Here, we have a young team that is in the process of migrating from being “SysOps” or the “Operations Team” to “Development Operations”, or DevOps. There is a bit of a culture clash: our engineering team has a ton of talent, but it is older talent, with older, more proven processes for how they do things that many of us would consider antiquated. They are very frightened by the idea of continuous integration, maybe even threatened by it. Our environment is evolving, and when I joined the team 6 months ago, I came with a deep desire to help them go from using an ad hoc Perl deployment script to using an automated workflow and true Infrastructure as Code.

Doing so means teaching people who have never written Ruby code or worked with Chef how to do so, and also teaching people who have never been around continuous integration to understand continuous integration and test-driven infrastructure.

I get looked at like I have a third eye when I say “write a test before you write any other code.”

It’s a steep learning curve; I know because I have been through it. I am working to increase the team’s knowledge and maturity, but there are going to be bumps along the way.

The suggestion of not running chef-client automatically on our nodes actually infuriated me at first. Instead of popping off a half-thought-out angry email to our CTO, I decided to take some time to think about it and to ask you all for help in thinking it through. Again, I really appreciate everyone’s involvement in this thread.

Phillip Roberts | Sr. Linux Systems Administrator

San Mateo | Ann Arbor | New York | London

O 734.922.7014 | C 614.423.9871 | www.MyBuys.com


From: Greg Zapp
Sent: Tuesday, January 14, 2014 2:20 AM
To:
Subject: [chef] Re: Re: Re: Re: Re: RE: Re: Re: Automated check-ins or not...

Well, Phillip did say "I am being slightly vague on purpose, because I am looking for full case examples from others using chef and how they are using it." ;)

-Greg

On Tue, Jan 14, 2014 at 8:01 PM, Lamont Granquist wrote:


Yeah, but he's talking about a more fundamental problem: his management/co-workers aren't okay with the basic idea of an automated job running that might change system config.

You're off on a completely different planet, where you've accepted the basic premise of "DevOps" (for lack of a better term) and it's a question not of "should we do it?" but of "how aggressive?", and that's influenced by how far along the road to continuous integration / continuous deployment you are. That would be like trying to explain quantum mechanics to a caveman.



On 1/13/14 5:28 PM, Greg Zapp wrote:

My cookbooks hook into our orchestration server via REST calls to pull down information about which sites should be configured, etc. During the POC build-out I had Chef run every minute, but most of my machines are Windows servers and Chef is very CPU-hungry there. We have modified our orchestration server to update the "pool's" last-modified time whenever any resource contained in that pool is modified. I wrapped Chef in a .NET app/service that first checks whether the pool has changed since the last successful Chef run. This is how we chose to mitigate Chef's CPU hunger while still allowing faster converge times.
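A minimal Python sketch of the "only converge when the pool changed" check (Greg's actual wrapper is a .NET service). It assumes the caller has already fetched the pool's last-modified time from the orchestration server as a timezone-aware datetime, and that the wrapper keeps the time of its last successful run in a local file. The file name, `maybe_converge` and the other helpers are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical file where the wrapper remembers its last successful run.
STATE_FILE = Path("last_successful_chef_run.txt")


def read_last_success(state_file=STATE_FILE):
    """Return the (timezone-aware) time of the last successful run, or None."""
    try:
        return datetime.fromisoformat(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def run_chef_client():
    """Run chef-client once in the foreground; True if it exited with status 0."""
    return subprocess.run(["chef-client"]).returncode == 0


def maybe_converge(pool_updated_at, state_file=STATE_FILE, runner=run_chef_client, now=None):
    """Run Chef only if the pool changed since the last successful run.

    pool_updated_at: timezone-aware datetime, assumed to come from the
    orchestration server's REST API (the fetch itself is not shown).
    """
    last = read_last_success(state_file)
    if last is not None and pool_updated_at <= last:
        return "skipped: pool unchanged"
    started = now if now is not None else datetime.now(timezone.utc)
    if runner():
        # Store the start time, so changes made during the run trigger another run.
        state_file.write_text(started.isoformat())
        return "converged"
    return "failed: will retry on the next check"
```

Recording the run's start time rather than its end time is a deliberate choice: a pool change that lands while chef-client is running is still newer than the stored timestamp, so the next check picks it up.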


-Greg

On Tue, Jan 14, 2014 at 1:56 PM, Lamont Granquist wrote:


Yeah, at places I've been where managers were afraid of config management (CFEngine at the time) running on a schedule, the result was an accretion of changes over time. Once enough changes had queued up that we had to run it on a server, the change window was scheduled, it was approved by our CRB board, and the appropriate offerings were burned to the gods of ITIL, the changes would often wind up causing outages, because so many changes hit the server at once and it was hard to determine the impact ahead of time. But the outages were all contained to change windows and were approved, so I guess that makes it okay.

A tactic that I've used in the past has been to run CM only once per day, with a 12-hour random splay, timed for 8pm-8am. Changes can be committed during the business day without taking effect immediately, and then they can be tested or pushed out manually. And if anything goes wrong, it'll start hitting servers at 8pm, and you have a longer window before it hits your entire infrastructure, which gives you more time to get monitoring alerts and stop the change from rolling out. If you just run Chef every 30 minutes with a 5-minute random splay, then it's likely that by the time your monitoring alerts you and you start taking action, the change has hit your entire infrastructure. By doing the "scheduled" runs only once per day you still keep the deltas between runs small, you give yourself time to stop your CM tool before a change fully rolls out, and you also reduce the load on your Chef server infrastructure (or on our HEC infrastructure).
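To make the trade-off concrete, the sketch below picks a random per-node run time inside an 8pm-8am window and then compares how much of the fleet has picked up a change one hour after it goes live. It uses a rough model that assumes run times are spread uniformly over the splay; the function names and that model are assumptions for illustration, not how chef-client itself schedules its splay.

```python
import random
from datetime import date, datetime, timedelta

WINDOW_START_HOUR = 20           # runs start at 8pm...
SPLAY = timedelta(hours=12)      # ...and are spread out until 8am


def next_nightly_run(day, rng=random):
    """Pick a random run time for this node in the 8pm-8am window starting on `day`."""
    start = datetime(day.year, day.month, day.day, WINDOW_START_HOUR)
    return start + timedelta(seconds=rng.uniform(0, SPLAY.total_seconds()))


def fraction_reached(minutes_elapsed, spread_minutes):
    """Rough share of the fleet that has run a change `minutes_elapsed` after it
    went live, assuming runs are spread uniformly over `spread_minutes`."""
    return min(1.0, max(0.0, minutes_elapsed / spread_minutes))


if __name__ == "__main__":
    print(next_nightly_run(date.today()))
    # One hour after the change goes live (e.g. when the first alert fires):
    print(f"nightly, 12h splay:         {fraction_reached(60, 12 * 60):.0%}")  # → 8%
    print(f"every 30 min + 5 min splay: {fraction_reached(60, 30 + 5):.0%}")   # → 100%
```

Under this simple model, an hour of reaction time limits a bad change to roughly one node in twelve with the nightly schedule, while with 30-minute runs every node has already picked it up.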

The other thing is that if you only run Chef on demand, once a week or once a month, then you're not getting the "self-repairing" and SOX/PCI-DSS "preventive control" features of configuration management. If you're running it nightly, then any junior SA or malicious attacker who logs into the server and manually changes the state of critical files will have those changes rolled back on the next run. That produces preventive controls that auditors really like. It also trains your junior SAs not to make with the typey-typey on the keyboard and to use the CM program instead. Otherwise they tend to fall back to the old behavior of making changes on the console, and then it's not their fault they did that; it's Chef's fault for rolling those changes back when it eventually runs, and the service crashes.



On 1/13/14 1:32 PM, David Petzel wrote:

We had quite a few discussions about this as well, and at the end of the day we opted for the ability to do both on-demand and scheduled runs. There were concerns that without a scheduled check-in, the amount of drift could become large over time on servers that don't routinely get deployments. With that drift comes a slew of unknown issues. By enforcing a scheduled run we could be sure that hand-modified configurations didn't stick around very long.

We've set up a report to notify us if a node has not checked in within the last day. This helps us catch cases where the scheduled run might be failing and other notification mechanisms might not be catching it (e.g. some nasty compile error super early in the run).
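A minimal sketch of that "hasn't checked in within a day" report, assuming you have already collected each node's last check-in time as epoch seconds (for example from each node's `ohai_time` attribute). The input format and `stale_nodes` are hypothetical; the point is only the comparison against a one-day threshold.

```python
import time

ONE_DAY = 24 * 60 * 60  # seconds


def stale_nodes(last_checkin, now=None, max_age=ONE_DAY):
    """Return sorted names of nodes that have not checked in within `max_age` seconds.

    last_checkin: dict of node name -> epoch seconds of the node's last recorded
    check-in, or None if the node has never completed a run.
    """
    now = time.time() if now is None else now
    return sorted(
        name for name, ts in last_checkin.items()
        if ts is None or now - ts > max_age
    )


if __name__ == "__main__":
    now = time.time()
    nodes = {"web1": now - 600, "web2": now - 2 * ONE_DAY, "db1": None}
    print(stale_nodes(nodes, now=now))  # → ['db1', 'web2']
```

Because it only looks at the last check-in time, this catches failures that never reach a normal end-of-run notification, such as a compile error early in the run.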


From there we extended an existing in-house tool that lets anyone with access request a Chef run without needing access to the servers.

On Mon, Jan 13, 2014 at 4:16 PM, Phillip Roberts wrote:

The problem isn’t my coworker, the problem is a lack of understanding of the tool.

Chef is my baby, and I am perfectly fine with automated check-ins; however, just like in any business, there are politics at play. There are also fears due to a lack of understanding.

I am purposely asking for others’ use cases because they will help me form my arguments as to why chef nodes should be checking in (running chef-client) automatically.

I am not asking anyone to tell me whether we should be using chef, or how we should be using chef; I am interested in how it is being used in other environments. I have seen plenty of other environments where I have implemented chef, but in all of them I implemented both chef and the policies that surround it, and this question, or this argument, has never come up.

I appreciate the responses thus far.

Thanks,

Phillip Roberts | Sr. Linux Systems Administrator


From: Christopher Armstrong
Sent: Monday, January 13, 2014 4:09 PM
To:
Subject: [chef] Re: Re: Automated check-ins or not...

Chef as a tool is used for orchestration, converging nodes to a desired state. If your coworker doesn't want nodes checking in automatically, then perhaps Chef isn't the ideal tool for you. What does your use case look like?

On Mon, Jan 13, 2014 at 1:05 PM, Ranjib Dey wrote:

By check-in, do you mean chef runs or chef registrations? I am aware of three different ways:

1) On demand: use Rundeck, mco or Capistrano-like tools to invoke a chef run. Pros: on demand :-), which helps if you deploy your application via chef; you can also eliminate the need for a validation certificate. Cons: requires additional tooling, special security considerations, etc.

2) As a service: specify a splay time and use the standard init scripts to run chef-client as a service. Pros: no additional configuration required, no dependency on any other tools. Cons: memory leaks and stale processes used to be a pain.

3) As a scheduled job: use cron or a rufus-like scheduler to run chef at a periodic interval. Pros: simple, less prone to memory leaks. Cons: the infrastructure has to be designed to be eventually consistent; on-demand application deployment cannot be done; and cron times on individual servers need extra thought, or else you'll storm the chef server.

I have used pretty much all three of these, and I think all of them have merits. Choose one depending on what you do, how you are doing it, and how comfortable you are with chef and those tools. Most of the issues with running chef as a service are now sorted (or workarounds are known).
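For the cron option, one common way to keep every node from hitting the chef server in the same minute is to derive a stable per-host offset from the hostname. A minimal Python sketch, assuming the interval divides 60 evenly and that chef-client is installed at `/usr/bin/chef-client` (adjust for your platform); `cron_minute_offset` and `crontab_line` are hypothetical helper names.

```python
import hashlib

# Assumed install path; adjust for your platform.
CHEF_CLIENT = "/usr/bin/chef-client"


def cron_minute_offset(hostname, interval_minutes=30):
    """Stable per-host offset in [0, interval_minutes), derived from the hostname.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and would give a different offset on every run.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % interval_minutes


def crontab_line(hostname, interval_minutes=30, command=CHEF_CLIENT):
    """Crontab entry running `command` every `interval_minutes` at a host-specific minute."""
    if interval_minutes <= 0 or 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must evenly divide 60")
    offset = cron_minute_offset(hostname, interval_minutes)
    minutes = ",".join(str(m) for m in range(offset, 60, interval_minutes))
    return f"{minutes} * * * * {command}"


if __name__ == "__main__":
    for host in ("web1.example.com", "web2.example.com", "db1.example.com"):
        print(crontab_line(host))
```

Hashing the hostname keeps each node's schedule the same every time the cron entry is regenerated, while spreading the fleet across the interval instead of stacking it on minute 0.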


best

ranjib

On Mon, Jan 13, 2014 at 12:52 PM, Phillip Roberts wrote:

I am interested in hearing what others are doing in terms of allowing nodes to automatically check in with chef or not. It was recently raised as a concern by someone in our company: he would prefer not to see nodes check in automatically with chef (I currently have a cron job that runs chef-client every X number of minutes).

I am just interested in hearing how others manage this; I am not convinced that manually running chef-client is a good solution.

I am being slightly vague on purpose, because I am looking for full case examples from others using chef and how they are using it.

Thanks,

Phillip Roberts | Sr. Linux Systems Administrator





