[chef] Re: clients timing out... where to start?


  • From: Jesse Campbell
  • To: chef
  • Subject: [chef] Re: clients timing out... where to start?
  • Date: Sat, 19 Jan 2013 07:48:19 -0500

Okay, I found one thing that might be contributing:
every node in the environment was downloading a 1.6 MB data bag item on every chef run.
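
For a rough sense of scale, here is a back-of-envelope sketch. It assumes all 400 clients (the count from the quoted message below) fetch the item once per round of runs and that the fetches spread evenly over the 10-minute splay window; neither assumption has been checked against the server, and the function name is purely illustrative.

```python
def data_bag_load(clients, item_mb, window_s):
    """Return (total MB per round of runs, average MB/s over the window)."""
    total_mb = clients * item_mb
    return total_mb, total_mb / window_s


# Assumed inputs: 400 clients, 1.6 MB item, 10-minute splay window.
total_mb, mb_per_s = data_bag_load(clients=400, item_mb=1.6, window_s=10 * 60)
print(f"{total_mb:.0f} MB per round, about {mb_per_s:.2f} MB/s on average")
```

That is only the average; if the runs bunch up instead of spreading out, the peak would be higher, and the API server also has to serialize the item for every request.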


On Sat, Jan 19, 2013 at 7:21 AM, Jesse Campbell wrote:
I have about 400 clients connecting in what should be a staggered pattern (splay is set to 10 minutes), but every night at least half of them are getting errors like this:

chef-client[20246]: [2013-01-19T07:36:46+00:00] 1: *** Chef 10.16.2 ***
chef-client[20246]: [2013-01-19T07:41:47+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 1/5
chef-client[20246]: [2013-01-19T07:46:52+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 2/5
chef-client[20246]: [2013-01-19T07:48:44+00:00] 4: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
chef-client[20246]: [2013-01-19T07:48:44+00:00] 4: Errno::ECONNRESET: Connection reset by peer
chef-client[6790]: [2013-01-19T07:51:46+00:00] 1: *** Chef 10.16.2 ***
chef-client[6790]: [2013-01-19T07:53:34+00:00] 4: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
chef-client[6790]: [2013-01-19T07:53:34+00:00] 4: Errno::ECONNRESET: Connection reset by peer
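
The gaps between these log lines give a hint about how long each request hung before giving up. A minimal sketch that pulls the timestamps out of the first run (pid 20246) and prints the seconds between consecutive lines; the helper name and regex are illustrative, and it assumes every timestamp is in UTC (+00:00) as in the excerpt above:

```python
import re
from datetime import datetime

# The four lines logged by pid 20246, copied from the excerpt above.
LOG = """\
chef-client[20246]: [2013-01-19T07:36:46+00:00] 1: *** Chef 10.16.2 ***
chef-client[20246]: [2013-01-19T07:41:47+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 1/5
chef-client[20246]: [2013-01-19T07:46:52+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 2/5
chef-client[20246]: [2013-01-19T07:48:44+00:00] 4: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
"""

# Matches the bracketed timestamp, not the bracketed pid.
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\+00:00\]")


def gaps_seconds(log_text):
    """Seconds elapsed between each pair of consecutive log timestamps."""
    times = [
        datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        for m in STAMP.finditer(log_text)
    ]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


print(gaps_seconds(LOG))
```

The first two gaps come out at roughly five minutes each, which would be consistent with the client waiting out a long request timeout before each retry. That is an inference from the timestamps, not something the log states.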

I'm not sure what I should be looking at to diagnose the issue. Are there caps on what the Merb/Ruby API server can handle? Do I need to boost RAM or CPU? (Currently 8 GB RAM, dual-core Xeon.)
Maybe cluster the chef-server API? Maybe drop in the Chef 11 Erlang server?

thanks in advance!
-jesse



