[chef] Re: Double converge, blocking on search, event-driven chef-client?


  • From: Peter Burkholder
  • Subject: [chef] Re: Double converge, blocking on search, event-driven chef-client?
  • Date: Wed, 1 Oct 2014 17:29:12 -0700

Hi,

You asked:
Chef Listers: How have you addressed this situation in your environment?

Chef can't converge a Nagios server often enough to keep up with a dynamic environment. I was in the same boat a couple of years ago.

The big win here is to abandon Nagios for Sensu. I can say more if you're willing to consider it.  http://sensuapp.org/ 

Even before that, it was a big help to move monitoring to the end of my run, so I didn't get alerts on down services while a node was being instantiated for the first time.

Cheers,

Peter

On Wed, Oct 1, 2014 at 12:48 PM, Justin Dossey wrote:
Hi all,

One thing I see come up regularly when deploying a set of recipes to a network of machines is that search-populated config data isn't necessarily available when it's needed.

Let's use a simple example: a Nagios server and an NRPE server.  In our hypothetical example, we want to use NRPE's ability to respond only to a configured list of IP addresses in nrpe.conf.  Based on the general principles behind converged infrastructure, the IP list would be populated from the results of a search.

With a brand-new network deployment, however, it is likely that NRPE will converge on one node before the Nagios server's entire run list converges on another node.  The NRPE server recipe will therefore get no results from its search until the Nagios server node has converged successfully once. On the first converge that takes place after the Nagios server's node data has been indexed in Solr, the NRPE server recipe will get the data it needs to write the NRPE configuration file.

This is what I mean by "double converge": it takes at least two converges to complete the NRPE server installation and configuration because the necessary data is not available during the first converge.
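
To make the two-converge behavior concrete, here is a minimal sketch in plain Ruby of how the NRPE address list might be rendered from search results. The helper name allowed_hosts_line and the 127.0.0.1 fallback are assumptions for illustration, not taken from any cookbook; allowed_hosts is the NRPE directive that restricts which addresses may connect.

```ruby
# Hypothetical helper: build the NRPE allowed_hosts line from the IPs
# returned by a search. On the first converge the list may be empty, so
# only the fallback (an assumed localhost default) is written.
def allowed_hosts_line(server_ips, fallback: ['127.0.0.1'])
  ips = (fallback + server_ips).uniq
  "allowed_hosts=#{ips.join(',')}"
end

# First converge: the Nagios node is not yet in the search index.
puts allowed_hosts_line([])

# Second converge: the search now returns the Nagios server's address.
puts allowed_hosts_line(['10.0.0.5'])
```

With a fallback like this, the first converge still produces a valid file, and the second converge fills in the Nagios address once the search has something to return.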

One way to reduce the number of converges is to poll search until a result comes back.  Something like this in the nrpe-server recipe code:

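results = []
loop do
  # Ask the Chef server for matching nodes.
  results = search(...)
  # Stop polling as soon as the search returns something.
  break unless results.empty?
  sleep(10)
end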

would cause the recipe to poll the Chef server every ten seconds until a response came back, or until the client token expired (usually around 15 minutes). 

Such behavior is not ideal (particularly because of the client token expiry issue) but besides supporting a second converge, it's the only way I have seen that will accomplish the desired result.
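
One way to soften the token expiry problem would be to bound the polling with a deadline, so the run gives up and leaves the rest to a later converge. The following is only a sketch in plain Ruby: poll_until_found, its keyword arguments, and the injectable sleeper and clock are all hypothetical names introduced here, and the role name in the usage comment is an assumption.

```ruby
# Hypothetical helper: call the block until it returns a non-empty
# result, or give up once another wait would pass the deadline.
# sleeper and clock are injectable only so the helper can be exercised
# without real waiting.
def poll_until_found(timeout: 600, interval: 10,
                     sleeper: ->(seconds) { sleep(seconds) },
                     clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
  deadline = clock.call + timeout
  loop do
    results = yield
    return results unless results.empty?
    # Give up rather than sleep past the deadline.
    return [] if clock.call + interval > deadline
    sleeper.call(interval)
  end
end

# In a recipe this might be used as follows (role name is an assumption):
#   nagios_servers = poll_until_found(timeout: 600) do
#     search(:node, 'role:nagios-server')
#   end
```

An empty return value means the deadline passed with no results, so the recipe can write a safe default and rely on a second converge instead of blocking until the token expires.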

One slightly wild idea kicking around would be to have some ability for the client to register for an event, and associate a set of resources with the event triggering.  In our hypothetical, that would allow the nrpe server's chef client to converge everything it could (perhaps, a set of other recipes in the run list) and be idle until the nagios server's chef client completes converging its run list.

Chef Listers: How have you addressed this situation in your environment?

--
Justin Dossey
Practice Owner
New Context Services, Inc



