- From: Daniel DeLeo
- To:
- Subject: [chef] Re: Of Ohai plugins and chef server crashes
- Date: Fri, 27 Apr 2012 08:57:17 -0700
On Friday, April 27, 2012 at 8:39 AM, Chris wrote:
> Hi Chefs,
>
> I have a bit of a mystery (at least to me) on my hands. One of my
> environments has hit the Solr maxFieldLength issue
> (http://tickets.opscode.com/browse/CHEF-2346), and following the advice
> in the ticket hasn't worked, since it just led to a server crash. So
> after ignoring the problem for the last couple of months, the pain of not
> having half of my production nodes available in search became
> unbearable. The idea this time was to attack the size of the node
> object itself. Since this environment was linked to a larger Active
> Directory domain, the 'etc' hash that Ohai creates was pretty big, so I
> decided to remove it by disabling the passwd plugin. To do this I
> added a small bit of code to the client.rb.erb template:
>
> <% if node.attribute?("ohai") &&
>      node["ohai"].attribute?("disabled_plugins") -%>
> Ohai::Config[:disabled_plugins] = [<%=
>   node["ohai"]["disabled_plugins"].join(",") %>]
> <% end -%>
>
> and this to the environment:
>
> "ohai": {
>   "disabled_plugins": [
>     "\"passwd\""
>   ]
> },
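
The escaped quotes in `"\"passwd\""` are needed because the quoted template joins the raw strings straight into Ruby source. As a minimal sketch of an alternative (not from the original message, and not a diagnosis of the server crash), the quoting can be left to `Array#inspect` so the environment only has to hold plain `"passwd"`. The `render_ohai_config` helper and the plain Hash standing in for the Chef node object are hypothetical, and the `trim_mode:` keyword assumes Ruby 2.6 or later.

```ruby
require "erb"

# Hypothetical helper: renders the client.rb fragment from a plain Hash
# that stands in for the node attributes (an assumption for illustration;
# a real Chef template would read from the node object instead).
def render_ohai_config(attrs)
  template = <<~'EOT'
    <% plugins = attrs.dig("ohai", "disabled_plugins") -%>
    <% if plugins && !plugins.empty? -%>
    Ohai::Config[:disabled_plugins] = <%= plugins.map(&:to_s).inspect %>
    <% end -%>
  EOT
  # trim_mode "-" drops the newline after each "-%>" tag, as in the
  # quoted template.
  ERB.new(template, trim_mode: "-").result(binding)
end

rendered = render_ohai_config({ "ohai" => { "disabled_plugins" => ["passwd"] } })
puts rendered
```

With this shape the environment attribute would be `"disabled_plugins": ["passwd"]`, and nothing is rendered when the attribute is missing or empty.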
>
> I added the disabled plugin to one of the environments in our lab and
> everything went pretty smoothly. I saw a slight increase in the server
> CPU load, but still within tolerance. After letting that burn in for
> 24 hours I got the go-ahead to apply this to the one production
> environment that wasn't getting indexed. My nodes are on the default
> 30-minute interval, so about an hour after making the change the CPU
> usage for the chef-server process went to 100% and it started to
> consume all the available memory and eventually stopped responding to
> clients. Restarting the process didn't help, as it would immediately hit
> 100% usage and quickly consume all the memory/swap that was regained.
>
> I rolled the changes back and spent the next couple of hours babysitting
> the server, and eventually ended up restarting chef-client on all the
> nodes in that environment. My question is, why would disabling an Ohai
> plugin do this? Or did it, since disabling it in 'test' didn't have the
> same result? The test chef-server and the production chef-server share
> the same couch/rabbit/solr/expander, and all the clients pass
> through a proxy which decides which server to send them to, so I'm
> pretty certain they're configured the same. The only real difference
> is that production has about 60 more clients.
>
> Any thoughts/suggestions on what this could be? Or, more likely, what I
> screwed up?

It's hard to imagine how these could be related. Did you get any more info
about what chef server was doing? Was there a stack trace when you killed it?
Did the CPU spike happen during a particular request, or during the startup
routines?

> Thanks
>
> -- chris
>
> --
> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> permitted by applicable law.

--
Dan DeLeo