- From: Chris
- To:
- Subject: [chef] Re: Re: Of Ohai plugins and chef server crashes
- Date: Fri, 27 Apr 2012 09:23:01 -0700
Unfortunately the server was set to :warn. But this is the last thing
in the log prior to the restart:
https://gist.github.com/2510531
Looks like it can't connect to Solr? Solr is remote and set to
solr_url "http://servername:8983/solr" in server.rb. The /solr was
added recently after a 0.10.4 to 0.10.8 upgrade.
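
A quick way to rule the Solr connection in or out is to hit it from the
chef-server host with the same URL that server.rb uses. This is only a
rough sketch: the helper names are made up for illustration, and the
/admin/ping path is an assumption about how this Solr install is set up.

```ruby
require "net/http"
require "uri"

# Build a ping URL from the same value used for solr_url in server.rb.
# ASSUMPTION: the Solr install exposes a ping handler at /admin/ping.
def solr_ping_uri(solr_url)
  URI.parse(solr_url.chomp("/") + "/admin/ping")
end

# Returns true if Solr answers with HTTP 200, false on a connection
# failure, DNS failure, or timeout.
def solr_reachable?(solr_url, timeout: 5)
  uri = solr_ping_uri(solr_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.get(uri.request_uri).code == "200"
rescue SystemCallError, SocketError, Timeout::Error
  false
end

# Example (run on the chef-server host):
#   puts solr_reachable?("http://servername:8983/solr")
```

If this returns false from the chef-server box but true from the Solr box
itself, the problem is more likely the network path or the URL than Solr.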
On Fri, Apr 27, 2012 at 8:57 AM, Daniel DeLeo wrote:
>
>
>
On Friday, April 27, 2012 at 8:39 AM, Chris wrote:
>
>
> Hi Chefs,
>
> I have a bit of a mystery (at least to me) on my hands. One of my
> environments has hit the Solr maxFieldLength issue
> (http://tickets.opscode.com/browse/CHEF-2346), and following the advice
> in the ticket hasn't worked since it just led to a server crash. So
> after ignoring the problem for the last couple of months, the pain of
> not having half of my production nodes available in search became
> unbearable. The idea this time was to attack the size of the node
> object itself. Since this environment was linked to a larger Active
> Directory domain, the 'etc' hash that Ohai creates was pretty big, so I
> decided to remove it by disabling the passwd plugin. To do this I
> added a small bit of code to the client.rb.erb template:
>
>
>
> <% if node.attribute?("ohai") &&
> node["ohai"].attribute?("disabled_plugins") -%>
>
> Ohai::Config[:disabled_plugins] = [<%=
> node["ohai"]["disabled_plugins"].join(",") %>]
> <% end -%>
>
>
>
> and this to the environment:
>
>
>
> "ohai": {
>   "disabled_plugins": [
>     "\"passwd\""
>   ]
> },
>
>
>
>
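
To make it clearer what that template and environment attribute produce
together, here is a small standalone sketch that renders a simplified copy
of the template outside of Chef. A plain Hash stands in for the node
object (so key? replaces node.attribute?), and the names TEMPLATE,
render_client_rb and quoted_list are only for illustration.

```ruby
require "erb"

# Simplified copy of the client.rb.erb snippet.
# ASSUMPTION: a plain Hash stands in for the Chef node, so Hash#key?
# is used where the real template calls node.attribute?.
TEMPLATE = <<~'TEMPLATE_END'
  <% if node.key?("ohai") && node["ohai"].key?("disabled_plugins") -%>
  Ohai::Config[:disabled_plugins] = [<%= node["ohai"]["disabled_plugins"].join(",") %>]
  <% end -%>
TEMPLATE_END

# Render the snippet with "-" trim mode so that -%> swallows the newline,
# as in the original template.
def render_client_rb(node)
  ERB.new(TEMPLATE, trim_mode: "-").result_with_hash(node: node)
end

# Alternative to embedding escaped quotes in the attribute: quote each
# plain plugin name at render time.
def quoted_list(plugins)
  plugins.map(&:inspect).join(", ")
end

node = { "ohai" => { "disabled_plugins" => ["\"passwd\""] } }
puts render_client_rb(node)
```

The escaped quotes in the environment JSON are what make join(",") emit a
valid Ruby string literal in client.rb. Something like quoted_list would
allow storing plain "passwd" in the attribute instead.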
>
> I added the disabled plugin to one of the environments in our lab and
> everything went pretty smoothly. I saw a slight increase in the server
> CPU load, but still within tolerance. After letting that burn in for
> 24 hours I got the go-ahead to apply this to the one production
> environment that wasn't getting indexed. My nodes are on the default
> 30 minute interval, so about an hour after making the change the CPU
> usage for the chef-server process went to 100% and it started to
> consume all the available memory and eventually stopped responding to
> clients. Restarting the process didn't help, as it would immediately
> hit 100% usage and quickly consume all the memory/swap that was
> regained.
>
>
>
> I rolled the changes back and spent the next couple of hours
> babysitting the server, and eventually ended up restarting chef-client
> on all the nodes in that environment. My question is, why would
> disabling an Ohai plugin do this? Or did it? Disabling it in 'test'
> didn't have the same result. The test chef-server and the production
> chef-server share the same couch/rabbit/solr/expander, and all the
> clients pass through a proxy which decides which server to send them
> to, so I'm pretty certain they're configured the same. The only real
> difference is that production has about 60 more clients.
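
One way to compare a lab node with a production node is to measure which
top-level attributes dominate the serialized node data. The sketch below
is only an illustration: attribute_sizes is a made-up helper, and it
assumes the node data has been dumped to a JSON file first (for example
from ohai output).

```ruby
require "json"

# Serialized size in bytes of each top-level attribute, largest first.
# Returns an array of [key, bytes] pairs.
def attribute_sizes(node_data)
  node_data
    .map { |key, value| [key, JSON.generate(value).bytesize] }
    .sort_by { |_key, size| -size }
end

# Example, assuming a hypothetical node.json dump of one node's data:
#   attribute_sizes(JSON.parse(File.read("node.json"))).first(10).each do |key, size|
#     puts format("%-20s %d bytes", key, size)
#   end
```

If 'etc' is near the top on the production nodes but not on the lab
nodes, that would at least confirm the two environments differ in the way
that matters here.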
>
>
>
> Any thoughts/suggestions on what this could be? Or more likely, what I
> screwed up?
>
>
It's hard to imagine how these could be related. Did you get any more info
about what chef server was doing? Was there a stack trace when you killed
it? Did the CPU spike happen during a particular request, or during the
startup routines?
>
>
>
>
> Thanks
>
> -- chris
> --
> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> permitted by applicable law.
>
>
>
>
--
Dan DeLeo
>
>
>
--
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.