[chef] Re: RE: Re: Re: chef client locked


  • From: Sascha Bates
  • To:
  • Subject: [chef] Re: RE: Re: Re: chef client locked
  • Date: Mon, 18 Mar 2013 09:32:49 -0700

I was looking at this but I found that there was already a discussion and a fix submitted: http://tickets.opscode.com/browse/CHEF-3367

What I'm really curious about is why we have two different methods of forking the process: daemon and Chef::Config[:client_fork] true/false.  If client_fork is set to false, which it is by default, the daemon takes care of forking and that's when we lose the pid and the client hangs.

I'm planning to push out client_fork true to all my clients this morning to take care of the problem.
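
For reference, a minimal sketch of that change in the client configuration file. The path /etc/chef/client.rb is an assumption (it is not stated in this thread), so adjust it for your install:

```ruby
# /etc/chef/client.rb  (assumed location, not confirmed in this thread)

# Sets Chef::Config[:client_fork] to true, so each chef-client run
# happens in a forked child process instead of in the daemon process.
client_fork true
```

This only shows the setting itself; how it gets distributed to the nodes (cookbook template, config management, manual edit) is left open.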

Sascha


On Mon, Mar 18, 2013 at 8:21 AM, Grégoire Seux wrote:

Hello again,

 

For the record, I have created a ticket and offered a fix (http://tickets.opscode.com/browse/CHEF-4010).

 

--

Grégoire

 

From: Grégoire Seux
Sent: Thursday, 14 March 2013 09:29
To:
Subject: RE: [chef] Re: Re: chef client locked

 

Thanks for both replies.

Indeed, I have reproduced this only when the chef server is not accessible.

It seems to happen quite often, but I don’t know whether it is due to high latency between nodes and server (~250 ms), an oversaturated connection, or chef server 11.

I’ll wait for the fix then.

 

--

Grégoire

 

This should be the result of loading the node from the server somehow failing.  I believe Sascha is working on a proper fix, but in the meantime this shouldn't happen if you have a connection to the server.

-- 

Paul Mooring

Systems Engineer and Customer Advocate

 

 

 

I can confirm this. I was debugging it earlier this week and have been looking for the time to write the code to submit a pull request instead of just submitting a bug report :/

 

On Wed, Mar 13, 2013 at 5:27 AM, Grégoire Seux wrote:

Hello,

Using chef 11 (11.4.0), I have noticed strange behavior when a run fails: the next run won't start because of the locking introduced by http://tickets.opscode.com/browse/CHEF-867.

The client log is:

...
ERROR: Errno::ETIMEDOUT: Error connecting to https://chef03-am5 /nodes/mem02-ty5 - Connection timed out - connect(2)
[2013-03-13T11:40:03+01:00] FATAL: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
[2013-03-13T11:40:03+01:00] ERROR: Sleeping for 1800 seconds before trying again
[2013-03-13T12:10:04+01:00] INFO: Chef client  is running, will wait for it to finish and then run.

I guess this is not the intended effect of the lock. Is this a bug?

Cheers,

--
Grégoire

 




