- From: Sean OMeara
- To:
- Subject: [chef] Re: Re: Chef Server Hardware Reqs (was Re: Chef stability?)
- Date: Thu, 18 Nov 2010 04:01:54 -0500
thou shalt provide enough ram
-?
On Wed, Nov 17, 2010 at 8:03 PM, Leinartas, Michael wrote:
>
FWIW I was running chef-server 0.9.8 (and friends - rabbitmq, couchdb, solr)
>
along with hosting a yum repo and an openvpn endpoint on a rackspace 512MB
>
instance and having similar problems with chef-solr dying quite often once I
>
reached 25 nodes or so. Updating to a 1GB instance solved it and I'm up to
>
50 nodes without trouble so far.
>
>
>
________________________________
>
From: Allan Carroll
>
Date: Wed, 17 Nov 2010 18:27:35 -0600
>
Subject: [chef] Re: Chef Server Hardware Reqs (was Re: Chef stability?)
>
>
That's likely the same problem I'm having. I've been trying to run my Chef
>
server off of a machine with 700MB of RAM (EC2 micro instance).
>
>
This raises the larger question: what size of machine is recommended for
>
running Chef? Seems like a pretty beefy system with all the parts running.
>
>
-Allan
>
>
On Nov 17, 2010, at 4:52 PM, Blake Barnett wrote:
>
>
I found that solr would crash reliably if the machine had a shortage of
>
memory. If I increased the RAM allocated to the VM to ~2GB, it behaved much
>
more reliably.
>
>
-Blake
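
To make the memory point concrete, here is a minimal sketch (not something posted in the thread) that reads Linux /proc/meminfo-style text and checks total RAM against a threshold. The function names and the 900 MB default are assumptions for illustration only; the thread reports trouble at 512MB and improvement at 1GB to ~2GB, but gives no official minimum.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict.

    Values are the leading integer on each line (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def memory_headroom_ok(info, min_total_mb=900):
    """Return True if MemTotal is at least min_total_mb.

    The 900 MB default is a hypothetical threshold: MemTotal usually
    reads a little below the nominal instance size, so a "1GB" machine
    should still pass.
    """
    return info.get("MemTotal", 0) // 1024 >= min_total_mb
```

On a Linux host this could be called as `memory_headroom_ok(parse_meminfo(open("/proc/meminfo").read()))`; a False result on the Chef server would be consistent with the solr crashes described above.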
>
>
On Nov 18, 2010, at 4:37 AM, Allan Carroll wrote:
>
>
Whew. That makes it seem tractable. Thanks for helping zero in on this.
>
>
>
Here's what I dug up:
>
>
>
solr-indexer.log has no real clues.
>
>
Lots of these:
>
>
INFO: Indexing node 37192f37-447a-41c7-8480-c048c878743e from chef status
>
error Connection refused - connect(2)}
>
>
and lots of these:
>
>
INFO: Indexing cookbook_version 2bd0feeb-3e32-4bb2-867c-41e0cfa12806 from
>
chef status ok}
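
For anyone wanting to quantify how often the indexer is failing, a rough sketch (not from the thread) that tallies "status ok" versus "status error" entries might look like this. The regular expression is inferred only from the two sample lines quoted above, so treat the pattern and the function name as assumptions rather than a description of the real solr-indexer.log format.

```python
import re
from collections import Counter

# Pattern guessed from the two sample lines in this thread:
#   "Indexing <type> <uuid> from chef status <ok|error> ..."
# \s+ before "chef" tolerates the line wrapping seen in the email.
LINE_RE = re.compile(r"Indexing (\w+) \S+ from\s+chef status (ok|error)")


def summarize(log_text):
    """Count indexing results per (object type, status) pair."""
    counts = Counter()
    for obj_type, status in LINE_RE.findall(log_text):
        counts[(obj_type, status)] += 1
    return counts
```

A high ratio of `("node", "error")` entries would line up with the "Connection refused" symptom, i.e. the indexer being unable to reach Solr.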
>
>
solr.log also doesn't seem to have anything interesting, but here's the last
>
set of output before it went away last time:
>
>
https://gist.github.com/703913
>
>
Here's a typical failure from the server log:
>
>
https://gist.github.com/703912
>
>
>
On Nov 17, 2010, at 11:16 AM, Adam Jacob wrote:
>
>
On Wed, Nov 17, 2010 at 10:09 AM,
>
<
>
>
wrote:
>
>
I've been working the past few days on tweaking my chef scripts to go into
>
production on EC2 and struggling to get anything I feel good about trusting.
>
Chef looks like a great tool with a strong community. I'm hoping that there's
>
some Chef way of looking at the world I haven't been exposed to that you can
>
all enlighten me on.
>
>
I'm running Ubuntu 10.10 on EC2 with the version of chef from the Opscode
>
Lucid repo (0.9.8).
>
>
A few things going on:
>
>
I can't seem to keep chef-solr or chef-solr-indexer from crashing. I keep
>
having to restart them for some reason. I'm using It makes everything feel
>
really flakey, but I'm not convinced that's the only thing I'm running into.
>
>
This is going to be the source of several problems - can you send us a
>
gist of what you get in the logs when these crash?
>
>
Sometimes the webui (and knife) show the status of all the nodes and
>
sometimes it refuses, saying that I have no nodes (even though the node list
>
shows there are some there). The error in the logs is only the same 500
>
internal server error: connection refused that I see for lots of things.
>
>
Those pages both use search - if you are seeing consistent failures of
>
Solr, that's the source of these issues.
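
Since "connection refused" usually means nothing is listening on the target port, one quick way to confirm that Solr is down is a plain TCP connect. This is only a sketch: the helper name is made up, and port 8983 (the stock Solr default) is an assumption about where chef-solr listens on a given install.

```python
import socket


def port_accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For example, `port_accepts_connections("localhost", 8983)` returning False while the webui shows 500 errors would point at the search service rather than at the Chef server itself.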
>
>
Running chef-client by hand on a machine causes a different result than
>
letting the timer driven version work. Like it forces the client to
>
reevaluate all the data bags and search results and actually apply them.
>
>
In what way? The code paths here are identical for the most part. If
>
you're using data bags and search in the recipes, and you are seeing
>
failures of Solr, I would wager that these differences are actually
>
just a representation of the search service not being stable for you.
>
>
Sometimes the clients get new data/nodes and update everything fine,
>
sometimes they don't.
>
>
Again, if it's data that comes from search, that's your issue.
>
>
Yesterday I started 8 boxes to bring a whole cluster up. On a few of them,
>
Chef just randomly stopped working. Running chef-client by hand finished
>
building the box correctly. One of them built part of a configuration file
>
using data from a node that I had deleted off the Chef server a few hours
>
earlier and then could never get out of that state. Deleting the
>
configuration file and rerunning client fixed it.
>
>
All the symptoms you talk about sound search related - so we should
>
focus there. :)
>
>
Anyway, all of these small, but annoying, little glitches give me a really
>
bad feeling about trusting Chef to manage my production infrastructure. Of
>
the tools I've looked at, it's the most promising.
>
>
Sorry to hear that, but it's been quite stable for us (and for lots of
>
other folks). We'll get you fixed up.
>
>
I'd really like to, given the promise of such powerful ability when it works,
>
the time that I've put into it, and the time it will save. Is anyone using
>
Chef at a large scale? Does it take handholding and massaging along the way,
>
and is that just the price for cutting-edge technology that will be solved as
>
the code matures?
>
>
There are people using Chef at the scale of many thousands of systems,
>
and Opscode manages a production multi-tenant infrastructure that is
>
also quite significant, using many of the same components that are in
>
the open source Chef.
>
>
Happy to help - hook us up with the logs.
>
>
Best,
>
Adam
>
>
--
>
Opscode, Inc.
>
Adam Jacob, CTO
>
T: (206) 508-7449 E:
>