[chef] Re: Re: Re: Re: Re: Re: Chef setup has become unstable


Chronological Thread 
  • From: Sascha Bates
  • To: Jeremiah Snapp
  • Cc: KC Braunschweig
  • Subject: [chef] Re: Re: Re: Re: Re: Re: Chef setup has become unstable
  • Date: Sun, 17 Jun 2012 10:11:33 -0500

Regarding that thread: I (woohoo!) finally got around to submitting a pull request that splits the Windows ohai kernel plugin into separate plugins, so it will be easier to cut down on unneeded ohai data for Windows.

On 6/17/12 3:56 AM, Jeremiah Snapp wrote:

I'm just adding my two cents to the great suggestions from MC and Sascha.

As KC suggested, you should consider preventing your nodes from all converging at the same time, to reduce the number of concurrent requests to the server.
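For illustration, staggering usually comes down to two settings in client.rb; the numbers below are just an example, not values from your setup:

    # /etc/chef/client.rb -- illustrative values only
    interval 1800   # converge every 30 minutes
    splay 300       # plus a random 0-300 second delay before each run

The splay means each node waits a random slice of that window before its run, so 100+ nodes don't all hit the server in the same second.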

Regarding the large amount of Windows ohai data, you may want to look at the chef thread from May 29 with the subject "Knife search not returning a node". It mentions disabling a Windows ohai plugin to reduce the amount of data stored per node.

Refer to http://wiki.opscode.com/display/chef/Disabling+Ohai+Plugins
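A minimal sketch of what that page describes, in /etc/chef/client.rb (the plugin names here are examples; disable whichever ones are actually noisy on your Windows nodes):

    # /etc/chef/client.rb -- example plugin names, adjust as needed
    Ohai::Config[:disabled_plugins] = ["kernel", "passwd"]

A disabled plugin never runs, so its attributes stop being saved to and indexed by the server.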

On Jun 17, 2012 4:31 AM, Madhurranjan Mohaan wrote:
Thanks yet again for the response!

@KC - Yes, we're running just one thread and the nodes are converging every hour. I'll spike out the unicorn + nginx setup on a new box with CentOS 6.2, see how that behaves, and then probably move these nodes over to that setup. Thanks for the tip!

@Sascha - It's mostly a mix of Windows 2003 32-bit and Windows 2003 64-bit servers. I'm not sure whether the sheer amount of ohai data is causing this. Any other parameters I should consider?

Ranjan

On Sun, Jun 17, 2012 at 3:26 AM, Sascha Bates wrote:
Could it be the number of Windows servers and the astonishing amount of ohai data collected for Windows? My understanding is that Windows ohai collects an awful lot of data. I haven't worked with it in a few months, so my memory is fading a bit, and I was using chef-solo anyway. 120 Windows nodes might produce a lot of data.


On 6/16/12 3:47 PM, KC Braunschweig wrote:
On Sat, Jun 16, 2012 at 12:41 PM, Madhurranjan Mohaan wrote:
Do you think we should scale out? If yes, what services do you think we
should run on different servers? Also, on my end, I am trying to see if all

Regarding the instability, I can tell you I had issues on RHEL 5.7
because the versions of couchdb and erlang were old. Newer packages
probably would have fixed it, but I upgraded to RHEL 6.1 which also
had newer versions and things were happier. Doesn't sound exactly like
your instability, but worth considering.

Regarding the performance issues, I hope that Josh was joking. 160
nodes is nothing. Are they converging every 30 minutes? Do you have a
reasonable splay? Are your recipes very search heavy? It could be a
lot of things, but I'd start with considering the concurrency on the
server API. Are you running a single Thin process for the API server?
If so, consider running multiple processes with a proxy balancer or some
such in front of them. Alternatively, switch the server to run under
unicorn with nginx in front of it. I've been happy with unicorn so
far.
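As a rough sketch, a unicorn config for the API server could look like the following; the worker count, port, and timeout are assumptions to tune for your box, not recommendations from this thread:

    # unicorn.rb -- illustrative sketch only
    worker_processes 8           # number of API requests served concurrently
    listen "127.0.0.1:4000"      # nginx proxies requests to this address
    timeout 120                  # some chef-server requests are slow
    preload_app true             # load the app once, then fork workers

nginx then sits in front as a reverse proxy, spreading requests across the workers.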

I don't think you should be there yet, but 4GB is probably not gonna
be enough forever. Eventually solr will want more heap, you'll need
memory as you add API server workers, and couch will take whatever's
left. That leads back to either adding memory or Josh's point of
splitting components onto different servers. That's eventually, though;
I'd hope you could get at least a couple hundred nodes with your
current VM, and 1000+ with 8GB, without too much trouble.

To give you an example, I have a preprod server with about 1000 nodes:
RHEL 6.1 VM
8GB RAM
4 virtual cores
unicorn - 8 API workers, 2 webui workers
solr - 2GB heap
chef 0.10.4

KC

On Sat, Jun 16, 2012 at 7:25 PM, Joshua Timberman wrote:
Are you running all the chef server services on one machine? What is the
hardware spec of it? 160 nodes is quite a few. Sounds like you may need to
start scaling out the server and run services on separate systems.



