[chef] Re: Re: RE: Re: Re: chef client locked


  • From: Daniel DeLeo
  • Subject: [chef] Re: Re: RE: Re: Re: chef client locked
  • Date: Mon, 18 Mar 2013 10:26:20 -0700


On Monday, March 18, 2013 at 9:32 AM, Sascha Bates wrote:

I was looking at this but I found that there was already a discussion and a fix submitted: http://tickets.opscode.com/browse/CHEF-3367

What I'm really curious about is why we have two different methods of forking the process: daemon and Chef::Config[:client_fork] true/false.  If client_fork is set to false, which it is by default, the daemon takes care of forking and that's when we lose the pid and the client hangs.

"daemon" is for OG Unix process backgrounding: fork, set a new process group, replace stdin/stdout with a log file or /dev/null, etc.

"client_fork" is where each chef run forks a new process. It doesn't create a new process group, detach from the terminal or anything like that. The point of client_fork is that a "disposable" process is used for each run, so a cookbook cannot pollute state that is persisted between runs. This prevents memory leaks in cookbooks (or Chef itself, or some interaction between the two) from impacting the system since the process dies and returns its memory at the end of the run.
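
To illustrate the "disposable process" idea described above, here is a minimal sketch in plain Ruby. It is not Chef's implementation: `LEAKY_STATE` and `run_in_child` are hypothetical names made up for this example, and the "run" is only a loop that pollutes an in-memory array. It assumes a platform where `fork` is available (not Windows).

```ruby
# Hypothetical stand-in for state that a cookbook might pollute during a run.
LEAKY_STATE = []

# Perform one simulated "run" in a forked child and wait for it to finish.
# The child gets a copy of the parent's memory, so anything it
# appends to LEAKY_STATE disappears when the child exits.
def run_in_child
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    # Simulate a leaky run: grow process-wide state.
    10_000.times { |i| LEAKY_STATE << "junk-#{i}" }
    # Report how much state the child accumulated.
    writer.puts LEAKY_STATE.size
    writer.close
    # Exit immediately without running at_exit handlers inherited from the parent.
    exit!(0)
  end

  writer.close
  child_count = reader.read.to_i
  reader.close

  # Reap the child so it does not linger as a zombie.
  _, status = Process.wait2(pid)
  { child_count: child_count, exit_status: status.exitstatus }
end

result = run_in_child
```

After the call, `result[:child_count]` reflects what the child accumulated, while `LEAKY_STATE` in the long-lived parent is still empty. Nothing here detaches from the terminal or creates a new process group, which is the difference from "daemon".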

I'll see about getting both of these tickets looked at during today's code review.

-- 
Daniel DeLeo
 

I'm planning to push out client_fork true to all my clients this morning to take care of the problem.
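
For reference, the setting mentioned above would look something like this in the client configuration file. This is a fragment, not a complete config, and the file path in the comment is an assumption about a typical install.

```ruby
# Fragment of client.rb (commonly /etc/chef/client.rb; path assumed).
# Fork a fresh child process for every chef run.
client_fork true
```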

Sascha


On Mon, Mar 18, 2013 at 8:21 AM, Grégoire Seux wrote:

Hello again,

 

For the record, I have created a ticket and offered a fix (http://tickets.opscode.com/browse/CHEF-4010).

 

--

Grégoire

 

From: Grégoire Seux
Sent: Thursday, March 14, 2013 09:29
Subject: RE: [chef] Re: Re: chef client locked

 

Thanks for both replies.

Indeed I have reproduced this only in the case where chef server is not accessible.

It seems to happen quite often, but I don’t know whether it is due to high latency between the nodes and the server (~250 ms), an oversaturated connection, or Chef Server 11.

I’ll wait for the fix then.

 

--

Grégoire

 

This should be the result of loading the node from the server somehow failing. I believe Sascha is working on a proper fix, but in the meantime this shouldn't happen if you have a connection to the server.

-- 

Paul Mooring

Systems Engineer and Customer Advocate

 

 

 

I can confirm this. I was debugging it earlier this week and have been looking for the time to write the code to submit a pull request instead of just submitting a bug report :/

 

On Wed, Mar 13, 2013 at 5:27 AM, Grégoire Seux wrote:

Hello,

Using Chef 11 (11.4.0), I have noticed strange behavior when a run fails: the next run won't start because of the locking introduced by http://tickets.opscode.com/browse/CHEF-867.

The client log is:

...
ERROR: Errno::ETIMEDOUT: Error connecting to https://chef03-am5 /nodes/mem02-ty5 - Connection timed out - connect(2)
[2013-03-13T11:40:03+01:00] FATAL: Stacktrace dumped to /var/cache/chef/chef-stacktrace.out
[2013-03-13T11:40:03+01:00] ERROR: Sleeping for 1800 seconds before trying again
[2013-03-13T12:10:04+01:00] INFO: Chef client  is running, will wait for it to finish and then run.

I guess this is not the expected effect of the lock. Is this a bug?

Cheers,

--
Grégoire

 





