[chef] clients timing out... where to start?


  • From: Jesse Campbell
  • To: chef
  • Subject: [chef] clients timing out... where to start?
  • Date: Sat, 19 Jan 2013 07:21:13 -0500

I have about 400 clients connecting in what should be a staggered pattern (splay is set to 10 minutes), but every night at least half of them are getting errors like this:

chef-client[20246]: [2013-01-19T07:36:46+00:00] 1: *** Chef 10.16.2 ***
chef-client[20246]: [2013-01-19T07:41:47+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 1/5
chef-client[20246]: [2013-01-19T07:46:52+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 2/5
chef-client[20246]: [2013-01-19T07:48:44+00:00] 4: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
chef-client[20246]: [2013-01-19T07:48:44+00:00] 4: Errno::ECONNRESET: Connection reset by peer
chef-client[6790]: [2013-01-19T07:51:46+00:00] 1: *** Chef 10.16.2 ***
chef-client[6790]: [2013-01-19T07:53:34+00:00] 4: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
chef-client[6790]: [2013-01-19T07:53:34+00:00] 4: Errno::ECONNRESET: Connection reset by peer
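A rough way to see how long each attempt hangs is to measure the gaps between consecutive log timestamps. The sketch below is only illustrative: `SAMPLE_LOG` holds four lines copied from the excerpt above, and `gaps_in_seconds` is a hypothetical helper name, not part of Chef.

```ruby
require "time"

# Four lines copied from the excerpt above (first chef-client run, PID 20246).
SAMPLE_LOG = <<~EOS
  chef-client[20246]: [2013-01-19T07:36:46+00:00] 1: *** Chef 10.16.2 ***
  chef-client[20246]: [2013-01-19T07:41:47+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 1/5
  chef-client[20246]: [2013-01-19T07:46:52+00:00] 3: Timeout connecting to chef-app01.ops.atl.setg:4000 for /nodes/nagios.ops, retry 2/5
  chef-client[20246]: [2013-01-19T07:48:44+00:00] 4: Errno::ECONNRESET: Connection reset by peer
EOS

# Hypothetical helper: whole seconds elapsed between consecutive
# ISO 8601 timestamps found in square brackets.
def gaps_in_seconds(text)
  stamps = text.scan(/\[(\d{4}-\d{2}-\d{2}T[\d:]+[+-]\d{2}:\d{2})\]/)
               .flatten
               .map { |s| Time.iso8601(s) }
  stamps.each_cons(2).map { |earlier, later| (later - earlier).to_i }
end

p gaps_in_seconds(SAMPLE_LOG)  # => [301, 305, 112]
```

The first two gaps are each just over 300 seconds, which would be consistent with a five-minute request timeout on the client side. That reading is an assumption drawn from the timestamps alone, not something the log states.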

I'm not sure what I should be looking at to diagnose this. Are there caps on what the Merb/Ruby API server can handle? Do I need to boost RAM or processor? (Currently 8 GB of RAM and a dual-core Xeon.)
Maybe cluster the chef-server API? Maybe drop in the Chef 11 Erlang-based server (erchef)?

thanks in advance!
-jesse


