chef - [chef] Re: Re: Re: Re: Re: Re: Re: Chef setup has become unstable

General discussion about Chef

From: Sascha Bates < >
To: AJ Christensen < >
Cc:
Subject: [chef] Re: Re: Re: Re: Re: Re: Re: Chef setup has become unstable
Date: Mon, 18 Jun 2012 17:19:20 -0500

For real? Guys, I find this kind of hilarious. I didn't actually write more than about 3 lines of original code; I mostly just rearranged everything. I thought this had to be the lamest contribution in the history of contributions.

I am so thrilled that this will make a difference for folks.

Sascha

On 6/18/12 4:50 PM, AJ Christensen wrote:

I noticed this thread yesterday, but wanted to reiterate:

Great work, Sascha! Many bugs have been caused or exacerbated by the
exceedingly large amount of data that some of the win32 plugins for ohai
generate!

Thank you!

--AJ

On 19 June 2012 05:10, Jeremiah Snapp < > wrote:

Great work Sascha!

Jeremiah

On Sun, Jun 17, 2012 at 11:11 AM, Sascha Bates < > wrote:

In regards to that thread, I (woohoo!) finally got around to submitting a
pull request that splits the Windows ohai kernel plugin into separate
plugins so it will be easier to cut down on unneeded ohai data for Windows.

On 6/17/12 3:56 AM, Jeremiah Snapp wrote:

I'm just adding my two cents to the great suggestions from MC and Sascha.

As KC suggested, you may want to stop your nodes from converging at the
same time, to reduce the number of concurrent requests to the server.

Given the large amount of Windows ohai data, you may want to look at a
chef thread from May 29 with the subject "Knife search note returning a
node".  It mentions disabling a Windows ohai plugin to reduce the amount
of content.

Refer to http://wiki.opscode.com/display/chef/Disabling+Ohai+Plugins
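
A minimal sketch of what disabling a plugin could look like in the node's client.rb, assuming the Ohai::Config[:disabled_plugins] setting used by Ohai releases of this era. The plugin name below is only an illustration, not the plugin discussed in the May 29 thread; substitute whichever plugin produces the data you do not need.

```ruby
# client.rb (fragment)
# Assumption: "passwd" is an illustrative plugin name only.
# Replace it with the plugin whose attributes you want to drop.
Ohai::Config[:disabled_plugins] = ["passwd"]
```

A disabled plugin's attributes no longer appear in the node data, so check that no recipe or search depends on them before rolling this out.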

On Jun 17, 2012 4:31 AM, "Madhurranjan Mohaan" < > wrote:

Thanks yet again for the response!

@KC - Yes, we're running just one thread and the nodes are converging
every hour. I'll spike out the unicorn + nginx setup on a new box with
CentOS 6.2, see how that behaves, and then probably move these out to
that setup. Thanks for the tip!

@Sascha - It's mostly a mix of Windows 2003 32-bit and Windows 2003
64-bit servers. I ain't sure if the sheer amount of ohai data is causing
this. Any other parameters I should consider?

Ranjan

On Sun, Jun 17, 2012 at 3:26 AM, Sascha Bates < > wrote:

Could it be the number of Windows servers and the astonishing amount of
ohai data collected for Windows?  My understanding is that Windows ohai has
an awful lot of data. I haven't worked with it in a few months so my memory
is fading a bit and I was chef-solo anyway. 120 Windows nodes might produce
a lot of data.

On 6/16/12 3:47 PM, KC Braunschweig wrote:

On Sat, Jun 16, 2012 at 12:41 PM, Madhurranjan Mohaan < > wrote:

Do you think we should scale out? If yes, what services do you think we
should run on different servers? Also, on my end, I am trying to see if all
Regarding the instability, I can tell you I had issues on RHEL 5.7
because the versions of couchdb and erlang were old. Newer packages
probably would have fixed it, but I upgraded to RHEL 6.1 which also
had newer versions and things were happier. Doesn't sound exactly like
your instability, but worth considering.

Regarding the performance issues, I hope that Josh was joking. 160
nodes is nothing. Are they converging every 30 minutes? Do you have a
reasonable splay? Are your recipes very search heavy? It could be a
lot of things, but I'd start with considering the concurrency on the
server API. Are you running a single Thin process for the API server?
If so, consider running multiple processes with proxy balancer or some
such in front of them. Alternatively switch the server to run in
unicorn with nginx in front of it. I've been happy with unicorn so
far.
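
To make the splay question concrete, here is a rough back-of-the-envelope sketch, not anything Chef itself ships. It assumes each chef-client run keeps the server busy for about 60 seconds (a made-up figure) and that the splay spreads start times evenly; expected_overlap is a hypothetical helper name. In client.rb the relevant settings are interval and splay.

```ruby
# Rough estimate of how many chef-client runs overlap on the server.
# Assumptions (not from the thread): runs last run_seconds each, and
# splay spreads the start times evenly across splay_seconds.
def expected_overlap(nodes:, run_seconds:, splay_seconds:)
  # With no useful splay, the worst case is every node starting together.
  return nodes if splay_seconds <= run_seconds

  # Otherwise only the runs started within the last run_seconds overlap.
  (nodes * run_seconds.to_f / splay_seconds).ceil
end

puts expected_overlap(nodes: 160, run_seconds: 60, splay_seconds: 0)
puts expected_overlap(nodes: 160, run_seconds: 60, splay_seconds: 600)
```

Under these assumptions, a ten-minute splay takes the worst case from 160 simultaneous runs down to roughly 16, which is a much easier load for a small pool of API workers.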

I don't think you should be there yet, but 4gb is probably not gonna
be enough forever. Eventually solr will want more heap and you'll need
memory as you add api server workers and couch will take whatever's
left. Which leads back to either adding memory or Josh's point of
splitting components on different servers. That's eventually though,
I'd hope you could get at least a couple hundred nodes with your
current VM and 1000+ with 8gb without too much trouble.

To give you an example, I have a preprod server with about 1000 nodes:
RHEL 6.1 VM
8gb
4 virtual cores
unicorn - 8 api workers, 2 webui workers
solr - 2gb heap
chef 0.10.4

KC

On Sat, Jun 16, 2012 at 7:25 PM, Joshua Timberman < > wrote:

Are you running all the chef server services on one machine? What is the
hardware spec of it? 160 nodes is quite a few. Sounds like you may need to
start scaling out the server and run services on separate systems.